eric6/DebugClients/Python/coverage/phystokens.py

author: T.Rzepka <Tobias.Rzepka@gmail.com>
date: Sat, 27 Apr 2019 22:06:38 +0200
branch: Variables Viewer
changeset: 6978:720247f98e1f
parent: 6942:2602857055c5
child: 7427:362cd1b6f81a
permissions: -rw-r--r--

Improved determination of expandable items, including removing 'other' as a selectable type.
Variables of unknown type are now counted as instances.

# Licensed under the Apache License: http://www.apache.org/licenses/LICENSE-2.0
# For details: https://bitbucket.org/ned/coveragepy/src/default/NOTICE.txt

"""Better tokenizing for coverage.py."""

import codecs
import keyword
import re
import sys
import token
import tokenize

from coverage import env
from coverage.backward import iternext, unicode_class
from coverage.misc import contract
def phys_tokens(toks):
    """Return all physical tokens, even line continuations.

    tokenize.generate_tokens() doesn't return a token for the backslash that
    continues lines. This wrapper provides those tokens so that we can
    re-create a faithful representation of the original source.

    Returns the same values as generate_tokens()

    """
    last_line = None
    last_lineno = -1
    last_ttype = None
    for ttype, ttext, (slineno, scol), (elineno, ecol), ltext in toks:
        if last_lineno != elineno:
            if last_line and last_line.endswith("\\\n"):
                # We are at the beginning of a new line, and the last line
                # ended with a backslash. We probably have to inject a
                # backslash token into the stream. Unfortunately, there's more
                # to figure out. This code::
                #
                #   usage = """\
                #   HEY THERE
                #   """
                #
                # triggers this condition, but the token text is::
                #
                #   '"""\\\nHEY THERE\n"""'
                #
                # so we need to figure out if the backslash is already in the
                # string token or not.
                inject_backslash = True
                if last_ttype == tokenize.COMMENT:
                    # Comments like this \
                    # should never result in a new token.
                    inject_backslash = False
                elif ttype == token.STRING:
                    if "\n" in ttext and ttext.split('\n', 1)[0][-1] == '\\':
                        # It's a multi-line string and the first line ends with
                        # a backslash, so we don't need to inject another.
                        inject_backslash = False
                if inject_backslash:
                    # Figure out what column the backslash is in.
                    ccol = len(last_line.split("\n")[-2]) - 1
                    # Yield the token, with a fake token type.
                    yield (
                        99999, "\\\n",
                        (slineno, ccol), (slineno, ccol+2),
                        last_line
                        )
            last_line = ltext
            last_ttype = ttype
        yield ttype, ttext, (slineno, scol), (elineno, ecol), ltext
        last_lineno = elineno
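To see the gap that `phys_tokens` fills, the sketch below runs only the standard library tokenizer (no coverage.py imports) on the two cases discussed in its comments: an explicit line continuation, whose backslash appears in no token at all, and a string whose opening triple quote is followed by a backslash, where the backslash is already part of the STRING token. The helper names `token_texts` and `has_backslash_token` are hypothetical, and `io.StringIO(...).readline` stands in for the `iternext` adapter this module uses.

```python
import io
import tokenize


def token_texts(source):
    """Return the text of every token the stdlib tokenizer yields for `source`."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)]


def has_backslash_token(source):
    """Report whether any token's text contains a backslash."""
    return any("\\" in text for text in token_texts(source))


# Explicit line continuation: the backslash belongs to no token, so joining
# the token texts back together cannot reproduce it.
continued = "x = 1 + \\\n    2\n"

# Backslash right after an opening triple quote: it is already inside the
# STRING token, so no extra backslash token is needed.
usage = 'usage = """\\\nHEY THERE\n"""\n'

if __name__ == "__main__":
    print(has_backslash_token(continued))  # False
    print(has_backslash_token(usage))      # True
```

Because the first case produces no backslash token, `phys_tokens` injects one with the fake token type 99999; the second case is the one its `token.STRING` check deliberately skips.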
@contract(source='unicode')
def source_token_lines(source):
    """Generate a series of lines, one for each line in `source`.

    Each line is a list of pairs, each pair is a token::

        [('key', 'def'), ('ws', ' '), ('nam', 'hello'), ('op', '('), ... ]

    Each pair has a token class, and the token text.

    If you concatenate all the token texts, and then join them with newlines,
    you should have your original `source` back, with two differences:
    trailing whitespace is not preserved, and a final line with no newline
    is indistinguishable from a final line with a newline.

    """

    ws_tokens = set([token.INDENT, token.DEDENT, token.NEWLINE, tokenize.NL])
    line = []
    col = 0

    source = source.expandtabs(8).replace('\r\n', '\n')
    tokgen = generate_tokens(source)

    for ttype, ttext, (_, scol), (_, ecol), _ in phys_tokens(tokgen):
        mark_start = True
        for part in re.split('(\n)', ttext):
            if part == '\n':
                yield line
                line = []
                col = 0
                mark_end = False
            elif part == '':
                mark_end = False
            elif ttype in ws_tokens:
                mark_end = False
            else:
                if mark_start and scol > col:
                    line.append(("ws", u" " * (scol - col)))
                    mark_start = False
                tok_class = tokenize.tok_name.get(ttype, 'xx').lower()[:3]
                if ttype == token.NAME and keyword.iskeyword(ttext):
                    tok_class = "key"
                line.append((tok_class, part))
                mark_end = True
            scol = 0
        if mark_end:
            col = ecol

    if line:
        yield line
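The classification rule inside `source_token_lines` (the token-type name, lowercased and cut to three characters, with keywords relabeled `key`) can be tried on its own. The `classify` helper below is a hypothetical, simplified sketch using only the standard library: unlike the real function it drops whitespace tokens instead of emitting `("ws", ...)` padding, does not split tokens that span several lines, and does not pass through `phys_tokens`.

```python
import io
import keyword
import token
import tokenize

# Token types that carry layout rather than visible text.
SKIPPED = {token.INDENT, token.DEDENT, token.NEWLINE, tokenize.NL, token.ENDMARKER}


def classify(source):
    """Pair each visible token with a three-letter class (simplified sketch).

    Unlike source_token_lines(), this drops whitespace instead of emitting
    ("ws", ...) padding and does not split tokens that span several lines.
    """
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        tok_class = tokenize.tok_name.get(tok.type, "xx").lower()[:3]
        if tok.type == token.NAME and keyword.iskeyword(tok.string):
            tok_class = "key"
        pairs.append((tok_class, tok.string))
    return pairs


if __name__ == "__main__":
    print(classify("def hello(): pass\n"))
    # [('key', 'def'), ('nam', 'hello'), ('op', '('), ('op', ')'), ('op', ':'), ('key', 'pass')]
```

The three-letter classes (`nam`, `op`, `num`, `str`, `com`, plus the special `key`) are what the HTML report can use as CSS class names, which is why the full token-type name is truncated rather than kept.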
class CachedTokenizer(object):
    """A one-element cache around tokenize.generate_tokens.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it. Tokenizing is
    expensive, and easily cached.

    This is a one-element cache so that our twice-in-a-row tokenizing doesn't
    actually tokenize twice.

    """
    def __init__(self):
        self.last_text = None
        self.last_tokens = None

    @contract(text='unicode')
    def generate_tokens(self, text):
        """A stand-in for `tokenize.generate_tokens`."""
        if text != self.last_text:
            self.last_text = text
            readline = iternext(text.splitlines(True))
            self.last_tokens = list(tokenize.generate_tokens(readline))
        return self.last_tokens

# Create our generate_tokens cache as a callable replacement function.
generate_tokens = CachedTokenizer().generate_tokens
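The caching idea in `CachedTokenizer` does not depend on the rest of coverage.py. The stdlib-only sketch below follows the same one-element policy and adds a hypothetical `tokenize_calls` counter so that the cache hit is observable. It assumes `io.StringIO(text).readline` as an equivalent line source in place of `iternext(text.splitlines(True))`, and it drops the `@contract` decorator.

```python
import io
import tokenize


class OneElementTokenCache:
    """Hypothetical stdlib-only analogue of CachedTokenizer."""

    def __init__(self):
        self.last_text = None
        self.last_tokens = None
        self.tokenize_calls = 0  # demo instrumentation, not in the original

    def generate_tokens(self, text):
        """Tokenize `text`, reusing the previous result if the text is unchanged."""
        if text != self.last_text:
            self.last_text = text
            readline = io.StringIO(text).readline
            self.last_tokens = list(tokenize.generate_tokens(readline))
            self.tokenize_calls += 1
        return self.last_tokens


cache = OneElementTokenCache()
first = cache.generate_tokens("a = 1\n")
second = cache.generate_tokens("a = 1\n")  # same text: cached list returned
third = cache.generate_tokens("b = 2\n")   # new text replaces the single entry

if __name__ == "__main__":
    print(first is second, cache.tokenize_calls)  # True 2
```

One entry suffices because the report code tokenizes the same file twice in a row; any different text simply evicts it. Callers share the cached list, so they should treat it as read-only.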
COOKIE_RE = re.compile(r"^[ \t]*#.*coding[:=][ \t]*([-\w.]+)", flags=re.MULTILINE)
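`COOKIE_RE` looks for a PEP 263 encoding declaration following `#` on a line. The sketch below applies the same pattern to a few illustrative sample lines and then, for comparison, calls Python 3's `tokenize.detect_encoding`, which performs the full detection (UTF-8 BOM, a cookie in the first two lines, and name normalization such as `latin-1` to `iso-8859-1`). The `samples` list and the `detect` helper are hypothetical. Note that the Python 3 default is `utf-8`, whereas the Python 2 routine below defaults to `ascii`.

```python
import io
import re
import tokenize

# Same pattern as COOKIE_RE in the listing.
COOKIE_RE = re.compile(r"^[ \t]*#.*coding[:=][ \t]*([-\w.]+)", flags=re.MULTILINE)

samples = [
    "# -*- coding: latin-1 -*-\n",        # Emacs-style cookie
    "# vim: set fileencoding=utf-8 :\n",  # Vim-style cookie
    "#!/usr/bin/env python\n",            # no cookie at all
]


def detect(source_bytes):
    """Return the encoding Python 3's tokenizer would use for `source_bytes`."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


if __name__ == "__main__":
    for line in samples:
        print(repr(line), COOKIE_RE.findall(line))
    print(detect(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1
    print(detect(b"\xef\xbb\xbfx = 1\n"))                 # utf-8-sig
    print(detect(b"x = 1\n"))                             # utf-8
```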
@contract(source='bytes')
def _source_encoding_py2(source):
    """Determine the encoding for `source`, according to PEP 263.

    `source` is a byte string, the text of the program.

    Returns a string, the name of the encoding.

    """
    assert isinstance(source, bytes)

    # Do this so the detect_encode code we copied will work.
    readline = iternext(source.splitlines(True))

    # This is mostly code adapted from Py3.2's tokenize module.

    def _get_normal_name(orig_enc):
        """Imitates get_normal_name in tokenizer.c."""
        # Only care about the first 12 characters.
        enc = orig_enc[:12].lower().replace("_", "-")
        if re.match(r"^utf-8($|-)", enc):
            return "utf-8"
        if re.match(r"^(latin-1|iso-8859-1|iso-latin-1)($|-)", enc):
            return "iso-8859-1"
        return orig_enc

    # From detect_encode():
    # It detects the encoding from the presence of a UTF-8 BOM or an encoding
    # cookie as specified in PEP-0263. If both a BOM and a cookie are present,
    # but disagree, a SyntaxError will be raised. If the encoding cookie is an
    # invalid charset, raise a SyntaxError. Note that if a UTF-8 BOM is found,
    # 'utf-8-sig' is returned.

    # If no encoding is specified, then the default will be returned.
    default = 'ascii'

    bom_found = False
    encoding = None

    def read_or_stop():
        """Get the next source line, or ''."""
        try:
            return readline()
        except StopIteration:
            return ''

    def find_cookie(line):
        """Find an encoding cookie in `line`."""
        try:
            line_string = line.decode('ascii')
        except UnicodeDecodeError:
            return None

        matches = COOKIE_RE.findall(line_string)
        if not matches:
            return None
        encoding = _get_normal_name(matches[0])
        try:
            codec = codecs.lookup(encoding)
        except LookupError:
            # This behavior mimics the Python interpreter
            raise SyntaxError("unknown encoding: " + encoding)

        if bom_found:
            # codecs in 2.3 were raw tuples of functions, assume the best.
            codec_name = getattr(codec, 'name', encoding)
            if codec_name != 'utf-8':
                # This behavior mimics the Python interpreter
3495
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
225 raise SyntaxError('encoding problem: utf-8')
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
226 encoding += '-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
227 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
228
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
229 first = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
230 if first.startswith(codecs.BOM_UTF8):
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
231 bom_found = True
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
232 first = first[3:]
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
233 default = 'utf-8-sig'
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
234 if not first:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
235 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
236
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
237 encoding = find_cookie(first)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
238 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
239 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
240
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
241 second = read_or_stop()
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
242 if not second:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
243 return default
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
244
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
245 encoding = find_cookie(second)
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
246 if encoding:
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
247 return encoding
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
248
fac17a82b431 updated coverage to 3.7.1
T.Rzepka <Tobias.Rzepka@gmail.com>
parents: 29
diff changeset
249 return default
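The nested helpers above implement PEP 263 detection by hand for Python 2. As a rough cross-check, here is a minimal self-contained sketch of the same steps on Python 3 bytes: strip a UTF-8 BOM, look for a cookie on the first two lines only, reject a BOM paired with a non-UTF-8 cookie, and otherwise fall back to a default (ASCII, as on Python 2). `sketch_detect_encoding` and the `COOKIE` pattern are hypothetical; the module's own `COOKIE_RE` is defined outside this excerpt, and unlike the code above, the sketch returns Python's canonical codec name rather than the cookie text normalized by `_get_normal_name`.

```python
import codecs
import re

# Assumption: a PEP 263-style cookie pattern applied to raw bytes. The module's
# real COOKIE_RE is defined outside this excerpt and may differ in detail.
COOKIE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def sketch_detect_encoding(source, default="ascii"):
    """Return the declared encoding of `source` (bytes), following PEP 263.

    Hypothetical helper: checks for a UTF-8 BOM, then looks for a coding
    cookie on the first two lines only, falling back to `default`.
    """
    lines = source.splitlines(True)[:2]  # a cookie is only honored on lines 1-2
    bom_found = bool(lines) and lines[0].startswith(codecs.BOM_UTF8)
    if bom_found:
        lines[0] = lines[0][len(codecs.BOM_UTF8):]  # strip the 3-byte BOM
        default = "utf-8-sig"
    for line in lines:
        match = COOKIE.match(line)
        if not match:
            continue
        declared = match.group(1).decode("ascii")
        try:
            name = codecs.lookup(declared).name  # Python's canonical codec name
        except LookupError:
            raise SyntaxError("unknown encoding: " + declared)
        if bom_found:
            if name != "utf-8":
                # A BOM plus a conflicting cookie is an error, as in the interpreter.
                raise SyntaxError("encoding problem: utf-8")
            return "utf-8-sig"
        return name
    return default
```

A cookie on line 3 or later is ignored, which is why only two lines are ever examined.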


@contract(source='bytes')
def _source_encoding_py3(source):
    """Determine the encoding for `source`, according to PEP 263.

    `source` is a byte string: the text of the program.

    Returns a string, the name of the encoding.

    """
    readline = iternext(source.splitlines(True))
    return tokenize.detect_encoding(readline)[0]
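On Python 3 the standard library does the work: `tokenize.detect_encoding` takes a `readline`-style callable over byte lines and returns an `(encoding, lines_read)` tuple, which is why the function above indexes `[0]`. The sketch below assumes `iternext` (a helper defined outside this excerpt) simply returns the iterator's next-method; `sketch_source_encoding_py3` is a hypothetical name.

```python
import tokenize


def sketch_source_encoding_py3(source):
    """Return the PEP 263 encoding of `source` (bytes) via the standard tokenizer."""
    # Stand-in for the module's iternext(): a callable yielding successive lines.
    # detect_encoding treats StopIteration as end of input.
    readline = iter(source.splitlines(True)).__next__
    # detect_encoding returns (encoding, lines_read); only the name is needed.
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding
```

Note that on Python 3 the fallback is `utf-8` rather than ASCII, and a `latin-1` cookie is reported under its normalized name `iso-8859-1`.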


if env.PY3:
    source_encoding = _source_encoding_py3
else:
    source_encoding = _source_encoding_py2


@contract(source='unicode')
def compile_unicode(source, filename, mode):
    """Just like the `compile` builtin, but works on any Unicode string.

    Python 2's compile() builtin has a stupid restriction: if the source string
    is Unicode, then it may not have an encoding declaration in it. Why not?
    Who knows! It also encodes to utf8, and then tries to interpret those utf8
    bytes according to the encoding declaration. Why? Who knows!

    This function neuters the encoding declaration, and compiles the result.

    """
    source = neuter_encoding_declaration(source)
    if env.PY2 and isinstance(filename, unicode_class):
        filename = filename.encode(sys.getfilesystemencoding(), "replace")
    code = compile(source, filename, mode)
    return code
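On Python 3 the restriction described above is gone: `compile()` accepts a `str` that still carries a coding cookie and simply ignores the declaration, since the text is already decoded, and it accepts a `str` filename, so the `env.PY2` branch is skipped. Because `neuter_encoding_declaration` rewrites the cookie in place instead of deleting the line, line numbers in the compiled code match the original file either way. A small demonstration, with `compile_and_load` as a hypothetical helper:

```python
def compile_and_load(source, filename="<demo>"):
    """Compile a str `source` on Python 3 and execute it in a fresh namespace."""
    # The source is already text, so compile() ignores any coding cookie in it;
    # no neutering is needed for this call to succeed on Python 3.
    code = compile(source, filename, "exec")
    namespace = {}
    exec(code, namespace)
    return code, namespace


# The latin-1 cookie is ignored: the literal below is already a decoded str.
demo = "# -*- coding: latin-1 -*-\n\ndef greet():\n    return 'h\u00e9llo'\n"
code, ns = compile_and_load(demo)
```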


@contract(source='unicode', returns='unicode')
def neuter_encoding_declaration(source):
    """Return `source`, with any encoding declaration neutered."""
    if COOKIE_RE.search(source):
        source_lines = source.splitlines(True)
        for lineno in range(min(2, len(source_lines))):
            source_lines[lineno] = COOKIE_RE.sub("# (deleted declaration)", source_lines[lineno])
        source = "".join(source_lines)
    return source
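The neutering keeps every line and its line ending, so the line count (and therefore line numbers in tracebacks and coverage reports) is unchanged, and it only touches lines 1 and 2 because PEP 263 only honors a cookie there. A minimal sketch of the same approach, assuming a PEP 263-style `COOKIE_RE` (the real one is defined outside this excerpt); `neuter_sketch` is a hypothetical name.

```python
import re

# Assumption: a line-anchored cookie regex in the style of PEP 263; the
# module's real COOKIE_RE is defined outside this excerpt.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", flags=re.MULTILINE)


def neuter_sketch(source):
    """Replace a coding cookie on line 1 or 2 of `source` (str) with a comment."""
    if not COOKIE_RE.search(source):
        return source  # fast path: no cookie anywhere
    lines = source.splitlines(True)  # keep line endings so line count is unchanged
    for lineno in range(min(2, len(lines))):
        lines[lineno] = COOKIE_RE.sub("# (deleted declaration)", lines[lineno])
    return "".join(lines)


src = "# -*- coding: latin-1 -*-\nx = 1\n# coding: utf-8\n"
out = neuter_sketch(src)
```

Only the matched prefix is replaced, so trailing text such as ` -*-` survives, and the cookie on line 3 is left alone.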

eric ide

mercurial