eric6/DebugClients/Python/coverage/phystokens.py

Tue, 02 Mar 2021 17:17:09 +0100

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Tue, 02 Mar 2021 17:17:09 +0100
changeset 8143
2c730d5fd177
parent 7427
362cd1b6f81a
permissions
-rw-r--r--

Changed the use of PyQt enums because the way they were used previously is deprecated since two years and replaced some deprecated Qt stuff.

4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
1 # Licensed under the Apache License: http://www.apache.org/licenses/LICENSE-2.0
7427
362cd1b6f81a coverage: updated coverage.py to 5.0.3.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
2 # For details: https://github.com/nedbat/coveragepy/blob/master/NOTICE.txt
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
3
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 """Better tokenizing for coverage.py."""
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
6 import codecs
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
7 import keyword
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
8 import re
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
9 import sys
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
10 import token
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
11 import tokenize
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
12
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
13 from coverage import env
6219
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
14 from coverage.backward import iternext, unicode_class
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
15 from coverage.misc import contract
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
16
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
18 def phys_tokens(toks):
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
19 """Return all physical tokens, even line continuations.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
20
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
21 tokenize.generate_tokens() doesn't return a token for the backslash that
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
22 continues lines. This wrapper provides those tokens so that we can
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23 re-create a faithful representation of the original source.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
25 Returns the same values as generate_tokens()
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
26
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27 """
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 last_line = None
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 last_lineno = -1
7427
362cd1b6f81a coverage: updated coverage.py to 5.0.3.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
30 last_ttext = None
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
31 for ttype, ttext, (slineno, scol), (elineno, ecol), ltext in toks:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
32 if last_lineno != elineno:
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
33 if last_line and last_line.endswith("\\\n"):
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
34 # We are at the beginning of a new line, and the last line
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
35 # ended with a backslash. We probably have to inject a
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36 # backslash token into the stream. Unfortunately, there's more
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37 # to figure out. This code::
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
38 #
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 # usage = """\
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 # HEY THERE
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
41 # """
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42 #
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
43 # triggers this condition, but the token text is::
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
44 #
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
45 # '"""\\\nHEY THERE\n"""'
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 #
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 # so we need to figure out if the backslash is already in the
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48 # string token or not.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
49 inject_backslash = True
7427
362cd1b6f81a coverage: updated coverage.py to 5.0.3.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
50 if last_ttext.endswith("\\"):
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51 inject_backslash = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 elif ttype == token.STRING:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53 if "\n" in ttext and ttext.split('\n', 1)[0][-1] == '\\':
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
54 # It's a multi-line string and the first line ends with
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
55 # a backslash, so we don't need to inject another.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
56 inject_backslash = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 if inject_backslash:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 # Figure out what column the backslash is in.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59 ccol = len(last_line.split("\n")[-2]) - 1
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
60 # Yield the token, with a fake token type.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 yield (
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62 99999, "\\\n",
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
63 (slineno, ccol), (slineno, ccol+2),
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64 last_line
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
65 )
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
66 last_line = ltext
7427
362cd1b6f81a coverage: updated coverage.py to 5.0.3.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
67 if ttype not in (tokenize.NEWLINE, tokenize.NL):
362cd1b6f81a coverage: updated coverage.py to 5.0.3.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
68 last_ttext = ttext
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
69 yield ttype, ttext, (slineno, scol), (elineno, ecol), ltext
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 last_lineno = elineno
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
73 @contract(source='unicode')
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
74 def source_token_lines(source):
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75 """Generate a series of lines, one for each line in `source`.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77 Each line is a list of pairs, each pair is a token::
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
78
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79 [('key', 'def'), ('ws', ' '), ('nam', 'hello'), ('op', '('), ... ]
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 Each pair has a token class, and the token text.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
82
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83 If you concatenate all the token texts, and then join them with newlines,
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84 you should have your original `source` back, with two differences:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85 trailing whitespace is not preserved, and a final line with no newline
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
86 is indistinguishable from a final line with a newline.
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
88 """
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
89
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
90 ws_tokens = set([token.INDENT, token.DEDENT, token.NEWLINE, tokenize.NL])
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
91 line = []
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
92 col = 0
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
93
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
94 source = source.expandtabs(8).replace('\r\n', '\n')
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
95 tokgen = generate_tokens(source)
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
96
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
97 for ttype, ttext, (_, scol), (_, ecol), _ in phys_tokens(tokgen):
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
98 mark_start = True
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 for part in re.split('(\n)', ttext):
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100 if part == '\n':
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
101 yield line
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 line = []
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103 col = 0
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
104 mark_end = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
105 elif part == '':
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106 mark_end = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
107 elif ttype in ws_tokens:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 mark_end = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109 else:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
110 if mark_start and scol > col:
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
111 line.append(("ws", u" " * (scol - col)))
29
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
112 mark_start = False
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 tok_class = tokenize.tok_name.get(ttype, 'xx').lower()[:3]
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
114 if ttype == token.NAME and keyword.iskeyword(ttext):
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
115 tok_class = "key"
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116 line.append((tok_class, part))
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
117 mark_end = True
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 scol = 0
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119 if mark_end:
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
120 col = ecol
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121
391dc0bc4ae5 Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122 if line:
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
123 yield line
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
124
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
125
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
126 class CachedTokenizer(object):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
127 """A one-element cache around tokenize.generate_tokens.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
128
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
129 When reporting, coverage.py tokenizes files twice, once to find the
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
130 structure of the file, and once to syntax-color it. Tokenizing is
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
131 expensive, and easily cached.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
132
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
133 This is a one-element cache so that our twice-in-a-row tokenizing doesn't
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
134 actually tokenize twice.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
135
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
136 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
137 def __init__(self):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
138 self.last_text = None
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
139 self.last_tokens = None
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
140
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
141 @contract(text='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
142 def generate_tokens(self, text):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
143 """A stand-in for `tokenize.generate_tokens`."""
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
144 if text != self.last_text:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
145 self.last_text = text
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
146 readline = iternext(text.splitlines(True))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
147 self.last_tokens = list(tokenize.generate_tokens(readline))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
148 return self.last_tokens
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
149
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
150 # Create our generate_tokens cache as a callable replacement function.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
151 generate_tokens = CachedTokenizer().generate_tokens
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
152
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
153
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
154 COOKIE_RE = re.compile(r"^[ \t]*#.*coding[:=][ \t]*([-\w.]+)", flags=re.MULTILINE)
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
155
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
156 @contract(source='bytes')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
157 def _source_encoding_py2(source):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
158 """Determine the encoding for `source`, according to PEP 263.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
159
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
160 `source` is a byte string, the text of the program.
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
161
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
162 Returns a string, the name of the encoding.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
163
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
164 """
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
165 assert isinstance(source, bytes)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
166
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
167 # Do this so the detect_encode code we copied will work.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
168 readline = iternext(source.splitlines(True))
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
169
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
170 # This is mostly code adapted from Py3.2's tokenize module.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
171
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
172 def _get_normal_name(orig_enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
173 """Imitates get_normal_name in tokenizer.c."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
174 # Only care about the first 12 characters.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
175 enc = orig_enc[:12].lower().replace("_", "-")
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
176 if re.match(r"^utf-8($|-)", enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
177 return "utf-8"
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
178 if re.match(r"^(latin-1|iso-8859-1|iso-latin-1)($|-)", enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
179 return "iso-8859-1"
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
180 return orig_enc
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
181
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
182 # From detect_encode():
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
183 # It detects the encoding from the presence of a UTF-8 BOM or an encoding
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
184 # cookie as specified in PEP-0263. If both a BOM and a cookie are present,
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
185 # but disagree, a SyntaxError will be raised. If the encoding cookie is an
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
186 # invalid charset, raise a SyntaxError. Note that if a UTF-8 BOM is found,
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
187 # 'utf-8-sig' is returned.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
188
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
189 # If no encoding is specified, then the default will be returned.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
190 default = 'ascii'
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
191
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
192 bom_found = False
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
193 encoding = None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
194
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
195 def read_or_stop():
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
196 """Get the next source line, or ''."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
197 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
198 return readline()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
199 except StopIteration:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
200 return ''
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
201
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
202 def find_cookie(line):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
203 """Find an encoding cookie in `line`."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
204 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
205 line_string = line.decode('ascii')
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
206 except UnicodeDecodeError:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
207 return None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
208
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
209 matches = COOKIE_RE.findall(line_string)
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
210 if not matches:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
211 return None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
212 encoding = _get_normal_name(matches[0])
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
213 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
214 codec = codecs.lookup(encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
215 except LookupError:
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
216 # This behavior mimics the Python interpreter
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
217 raise SyntaxError("unknown encoding: " + encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
218
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
219 if bom_found:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
220 # codecs in 2.3 were raw tuples of functions, assume the best.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
221 codec_name = getattr(codec, 'name', encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
222 if codec_name != 'utf-8':
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
223 # This behavior mimics the Python interpreter
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
224 raise SyntaxError('encoding problem: utf-8')
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
225 encoding += '-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
226 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
227
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
228 first = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
229 if first.startswith(codecs.BOM_UTF8):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
230 bom_found = True
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
231 first = first[3:]
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
232 default = 'utf-8-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
233 if not first:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
234 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
235
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
236 encoding = find_cookie(first)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
237 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
238 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
239
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
240 second = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
241 if not second:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
242 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
243
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
244 encoding = find_cookie(second)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
245 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
246 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
247
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
248 return default
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
249
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
250
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
251 @contract(source='bytes')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
252 def _source_encoding_py3(source):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
253 """Determine the encoding for `source`, according to PEP 263.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
254
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
255 `source` is a byte string: the text of the program.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
256
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
257 Returns a string, the name of the encoding.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
258
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
259 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
260 readline = iternext(source.splitlines(True))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
261 return tokenize.detect_encoding(readline)[0]
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
262
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
263
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
264 if env.PY3:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
265 source_encoding = _source_encoding_py3
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
266 else:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
267 source_encoding = _source_encoding_py2
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
268
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
269
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
270 @contract(source='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
271 def compile_unicode(source, filename, mode):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
272 """Just like the `compile` builtin, but works on any Unicode string.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
273
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
274 Python 2's compile() builtin has a stupid restriction: if the source string
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
275 is Unicode, then it may not have a encoding declaration in it. Why not?
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
276 Who knows! It also decodes to utf8, and then tries to interpret those utf8
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
277 bytes according to the encoding declaration. Why? Who knows!
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
278
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
279 This function neuters the coding declaration, and compiles it.
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
280
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
281 """
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
282 source = neuter_encoding_declaration(source)
6219
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
283 if env.PY2 and isinstance(filename, unicode_class):
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
284 filename = filename.encode(sys.getfilesystemencoding(), "replace")
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
285 code = compile(source, filename, mode)
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
286 return code
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
287
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
288
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
289 @contract(source='unicode', returns='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
290 def neuter_encoding_declaration(source):
5051
3586ebd9fac8 Updated coverage.py to version 4.1.0.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
291 """Return `source`, with any encoding declaration neutered."""
6219
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
292 if COOKIE_RE.search(source):
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
293 source_lines = source.splitlines(True)
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
294 for lineno in range(min(2, len(source_lines))):
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
295 source_lines[lineno] = COOKIE_RE.sub("# (deleted declaration)", source_lines[lineno])
d6c795b5ce33 Third Party, coverage: updated coverage.py to 4.5.1.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5178
diff changeset
296 source = "".join(source_lines)
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3495
diff changeset
297 return source

eric ide

mercurial