Wed, 13 Jul 2022 15:34:50 +0200
Revisions <no_multi_processing, Variables Viewer, with_python2> closed.
5714
90c57b50600f
Updated chardet to 3.0.2.
Detlev Offenbach <detlev@die-offenbachs.de>
"""
All of the Enums that are used throughout the chardet package.

:author: Dan Blanchard (dan.blanchard@gmail.com)
"""


class InputState(object):
    """
    This enum represents the different states a universal detector can be in.
    """
    PURE_ASCII = 0
    ESC_ASCII = 1
    HIGH_BYTE = 2


class LanguageFilter(object):
    """
    This enum represents the different language filters we can apply to a
    ``UniversalDetector``.
    """
    CHINESE_SIMPLIFIED = 0x01
    CHINESE_TRADITIONAL = 0x02
    JAPANESE = 0x04
    KOREAN = 0x08
    NON_CJK = 0x10
    ALL = 0x1F
    CHINESE = CHINESE_SIMPLIFIED | CHINESE_TRADITIONAL
    CJK = CHINESE | JAPANESE | KOREAN


class ProbingState(object):
    """
    This enum represents the different states a prober can be in.
    """
    DETECTING = 0
    FOUND_IT = 1
    NOT_ME = 2


class MachineState(object):
    """
    This enum represents the different states a state machine can be in.
    """
    START = 0
    ERROR = 1
    ITS_ME = 2


class SequenceLikelihood(object):
    """
    This enum represents the likelihood of a character following the previous one.
    """
    NEGATIVE = 0
    UNLIKELY = 1
    LIKELY = 2
    POSITIVE = 3

    @classmethod
    def get_num_categories(cls):
        """:returns: The number of likelihood categories in the enum."""
        return 4


class CharacterCategory(object):
    """
    This enum represents the different categories language models for
    ``SingleByteCharsetProber`` put characters into.

    Anything less than CONTROL is considered a letter.
    """
    UNDEFINED = 255
    LINE_BREAK = 254
    SYMBOL = 253
    DIGIT = 252
    CONTROL = 251