DebugClients/Python/coverage/phystokens.py

Sat, 10 Oct 2015 12:44:52 +0200

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Sat, 10 Oct 2015 12:44:52 +0200
changeset 4491
0d8612e24fef
parent 4489
d0d6e4ad31bd
child 5051
3586ebd9fac8
permissions
-rw-r--r--

Merged with coverage.py update.

4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
1 # Licensed under the Apache License: http://www.apache.org/licenses/LICENSE-2.0
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
2 # For details: https://bitbucket.org/ned/coveragepy/src/default/NOTICE.txt
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
3
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 """Better tokenizing for coverage.py."""
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
6 import codecs
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
7 import keyword
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
8 import re
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
9 import token
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
10 import tokenize
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
11
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
12 from coverage import env
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
13 from coverage.backward import iternext
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
14 from coverage.misc import contract
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
15
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
16
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17 def phys_tokens(toks):
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
18 """Return all physical tokens, even line continuations.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
19
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
20 tokenize.generate_tokens() doesn't return a token for the backslash that
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
21 continues lines. This wrapper provides those tokens so that we can
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
22 re-create a faithful representation of the original source.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24 Returns the same values as generate_tokens()
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
25
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
26 """
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27 last_line = None
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 last_lineno = -1
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 last_ttype = None
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
30 for ttype, ttext, (slineno, scol), (elineno, ecol), ltext in toks:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
31 if last_lineno != elineno:
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
32 if last_line and last_line.endswith("\\\n"):
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
33 # We are at the beginning of a new line, and the last line
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
34 # ended with a backslash. We probably have to inject a
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
35 # backslash token into the stream. Unfortunately, there's more
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36 # to figure out. This code::
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37 #
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
38 # usage = """\
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 # HEY THERE
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 # """
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
41 #
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42 # triggers this condition, but the token text is::
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
43 #
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
44 # '"""\\\nHEY THERE\n"""'
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
45 #
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 # so we need to figure out if the backslash is already in the
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 # string token or not.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48 inject_backslash = True
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
49 if last_ttype == tokenize.COMMENT:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
50 # Comments like this \
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51 # should never result in a new token.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 inject_backslash = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53 elif ttype == token.STRING:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
54 if "\n" in ttext and ttext.split('\n', 1)[0][-1] == '\\':
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
55 # It's a multi-line string and the first line ends with
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
56 # a backslash, so we don't need to inject another.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 inject_backslash = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 if inject_backslash:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59 # Figure out what column the backslash is in.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
60 ccol = len(last_line.split("\n")[-2]) - 1
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 # Yield the token, with a fake token type.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62 yield (
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
63 99999, "\\\n",
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64 (slineno, ccol), (slineno, ccol+2),
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
65 last_line
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
66 )
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
67 last_line = ltext
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
68 last_ttype = ttype
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
69 yield ttype, ttext, (slineno, scol), (elineno, ecol), ltext
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 last_lineno = elineno
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
73 @contract(source='unicode')
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
74 def source_token_lines(source):
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75 """Generate a series of lines, one for each line in `source`.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77 Each line is a list of pairs, each pair is a token::
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
78
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79 [('key', 'def'), ('ws', ' '), ('nam', 'hello'), ('op', '('), ... ]
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 Each pair has a token class, and the token text.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
82
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83 If you concatenate all the token texts, and then join them with newlines,
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84 you should have your original `source` back, with two differences:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85 trailing whitespace is not preserved, and a final line with no newline
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
86 is indistinguishable from a final line with a newline.
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
88 """
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
89
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
90 ws_tokens = set([token.INDENT, token.DEDENT, token.NEWLINE, tokenize.NL])
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
91 line = []
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
92 col = 0
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
93
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
94 # The \f is because of http://bugs.python.org/issue19035
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
95 source = source.expandtabs(8).replace('\r\n', '\n').replace('\f', ' ')
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
96 tokgen = generate_tokens(source)
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
97
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
98 for ttype, ttext, (_, scol), (_, ecol), _ in phys_tokens(tokgen):
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 mark_start = True
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100 for part in re.split('(\n)', ttext):
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
101 if part == '\n':
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 yield line
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103 line = []
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
104 col = 0
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
105 mark_end = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106 elif part == '':
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
107 mark_end = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 elif ttype in ws_tokens:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109 mark_end = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
110 else:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
111 if mark_start and scol > col:
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
112 line.append(("ws", u" " * (scol - col)))
31
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 mark_start = False
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
114 tok_class = tokenize.tok_name.get(ttype, 'xx').lower()[:3]
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
115 if ttype == token.NAME and keyword.iskeyword(ttext):
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116 tok_class = "key"
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
117 line.append((tok_class, part))
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 mark_end = True
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119 scol = 0
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
120 if mark_end:
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121 col = ecol
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122
744cd0b4b8cd Updated coverage.py to version 3.2.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
123 if line:
790
2c0ea0163ef4 Marked the Python2 debugger client files with the new eflag: marker.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 32
diff changeset
124 yield line
2c0ea0163ef4 Marked the Python2 debugger client files with the new eflag: marker.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 32
diff changeset
125
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
126
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
127 class CachedTokenizer(object):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
128 """A one-element cache around tokenize.generate_tokens.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
129
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
130 When reporting, coverage.py tokenizes files twice, once to find the
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
131 structure of the file, and once to syntax-color it. Tokenizing is
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
132 expensive, and easily cached.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
133
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
134 This is a one-element cache so that our twice-in-a-row tokenizing doesn't
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
135 actually tokenize twice.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
136
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
137 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
138 def __init__(self):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
139 self.last_text = None
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
140 self.last_tokens = None
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
141
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
142 @contract(text='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
143 def generate_tokens(self, text):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
144 """A stand-in for `tokenize.generate_tokens`."""
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
145 if text != self.last_text:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
146 self.last_text = text
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
147 readline = iternext(text.splitlines(True))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
148 self.last_tokens = list(tokenize.generate_tokens(readline))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
149 return self.last_tokens
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
150
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
151 # Create our generate_tokens cache as a callable replacement function.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
152 generate_tokens = CachedTokenizer().generate_tokens
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
153
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
154
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
155 COOKIE_RE = re.compile(r"^\s*#.*coding[:=]\s*([-\w.]+)", flags=re.MULTILINE)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
156
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
157 @contract(source='bytes')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
158 def _source_encoding_py2(source):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
159 """Determine the encoding for `source`, according to PEP 263.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
160
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
161 `source` is a byte string, the text of the program.
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
162
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
163 Returns a string, the name of the encoding.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
164
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
165 """
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
166 assert isinstance(source, bytes)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
167
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
168 # Do this so the detect_encode code we copied will work.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
169 readline = iternext(source.splitlines(True))
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
170
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
171 # This is mostly code adapted from Py3.2's tokenize module.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
172
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
173 def _get_normal_name(orig_enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
174 """Imitates get_normal_name in tokenizer.c."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
175 # Only care about the first 12 characters.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
176 enc = orig_enc[:12].lower().replace("_", "-")
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
177 if re.match(r"^utf-8($|-)", enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
178 return "utf-8"
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
179 if re.match(r"^(latin-1|iso-8859-1|iso-latin-1)($|-)", enc):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
180 return "iso-8859-1"
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
181 return orig_enc
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
182
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
183 # From detect_encode():
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
184 # It detects the encoding from the presence of a UTF-8 BOM or an encoding
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
185 # cookie as specified in PEP-0263. If both a BOM and a cookie are present,
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
186 # but disagree, a SyntaxError will be raised. If the encoding cookie is an
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
187 # invalid charset, raise a SyntaxError. Note that if a UTF-8 BOM is found,
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
188 # 'utf-8-sig' is returned.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
189
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
190 # If no encoding is specified, then the default will be returned.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
191 default = 'ascii'
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
192
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
193 bom_found = False
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
194 encoding = None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
195
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
196 def read_or_stop():
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
197 """Get the next source line, or ''."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
198 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
199 return readline()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
200 except StopIteration:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
201 return ''
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
202
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
203 def find_cookie(line):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
204 """Find an encoding cookie in `line`."""
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
205 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
206 line_string = line.decode('ascii')
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
207 except UnicodeDecodeError:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
208 return None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
209
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
210 matches = COOKIE_RE.findall(line_string)
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
211 if not matches:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
212 return None
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
213 encoding = _get_normal_name(matches[0])
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
214 try:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
215 codec = codecs.lookup(encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
216 except LookupError:
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
217 # This behavior mimics the Python interpreter
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
218 raise SyntaxError("unknown encoding: " + encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
219
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
220 if bom_found:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
221 # codecs in 2.3 were raw tuples of functions, assume the best.
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
222 codec_name = getattr(codec, 'name', encoding)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
223 if codec_name != 'utf-8':
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
224 # This behavior mimics the Python interpreter
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
225 raise SyntaxError('encoding problem: utf-8')
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
226 encoding += '-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
227 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
228
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
229 first = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
230 if first.startswith(codecs.BOM_UTF8):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
231 bom_found = True
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
232 first = first[3:]
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
233 default = 'utf-8-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
234 if not first:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
235 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
236
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
237 encoding = find_cookie(first)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
238 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
239 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
240
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
241 second = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
242 if not second:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
243 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
244
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
245 encoding = find_cookie(second)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
246 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
247 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
248
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 790
diff changeset
249 return default
3499
f2d4b02c7e88 Modified the Python2 coverage files to include the Python2 eflags line and fixed an issue in both variants.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 3495
diff changeset
250
4489
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
251
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
252 @contract(source='bytes')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
253 def _source_encoding_py3(source):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
254 """Determine the encoding for `source`, according to PEP 263.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
255
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
256 `source` is a byte string: the text of the program.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
257
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
258 Returns a string, the name of the encoding.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
259
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
260 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
261 readline = iternext(source.splitlines(True))
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
262 return tokenize.detect_encoding(readline)[0]
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
263
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
264
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
265 if env.PY3:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
266 source_encoding = _source_encoding_py3
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
267 else:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
268 source_encoding = _source_encoding_py2
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
269
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
270
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
271 @contract(source='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
272 def compile_unicode(source, filename, mode):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
273 """Just like the `compile` builtin, but works on any Unicode string.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
274
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
275 Python 2's compile() builtin has a stupid restriction: if the source string
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
276 is Unicode, then it may not have a encoding declaration in it. Why not?
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
277 Who knows!
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
278
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
279 This function catches that exception, neuters the coding declaration, and
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
280 compiles it anyway.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
281
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
282 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
283 try:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
284 code = compile(source, filename, mode)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
285 except SyntaxError as synerr:
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
286 if "coding declaration in unicode string" not in synerr.args[0].lower():
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
287 raise
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
288 source = neuter_encoding_declaration(source)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
289 code = compile(source, filename, mode)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
290
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
291 return code
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
292
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
293
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
294 @contract(source='unicode', returns='unicode')
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
295 def neuter_encoding_declaration(source):
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
296 """Return `source`, with any encoding declaration neutered.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
297
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
298 This function will only ever be called on `source` that has an encoding
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
299 declaration, so some edge cases can be ignored.
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
300
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
301 """
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
302 source = COOKIE_RE.sub("# (deleted declaration)", source)
d0d6e4ad31bd Updated coverage to 4.0 (breaks with Python 3.2 support).
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 3499
diff changeset
303 return source
4491
0d8612e24fef Merged with coverage.py update.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
304
0d8612e24fef Merged with coverage.py update.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
305 #
0d8612e24fef Merged with coverage.py update.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 4489
diff changeset
306 # eflag: FileType = Python2

eric ide

mercurial