ThirdParty/CharDet/chardet/chardistribution.py

Mon, 28 Dec 2009 16:03:33 +0000

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Mon, 28 Dec 2009 16:03:33 +0000
changeset 0
de9c2efb9d02
child 12
1d8dd9706f46
permissions
-rw-r--r--

Started porting eric4 to Python3

0
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
1 ######################## BEGIN LICENSE BLOCK ########################
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
2 # The Original Code is Mozilla Communicator client code.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
3 #
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 # The Initial Developer of the Original Code is
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5 # Netscape Communications Corporation.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
6 # Portions created by the Initial Developer are Copyright (C) 1998
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
7 # the Initial Developer. All Rights Reserved.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
8 #
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
9 # Contributor(s):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
10 # Mark Pilgrim - port to Python
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
11 #
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
12 # This library is free software; you can redistribute it and/or
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
13 # modify it under the terms of the GNU Lesser General Public
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
14 # License as published by the Free Software Foundation; either
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
15 # version 2.1 of the License, or (at your option) any later version.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
16 #
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17 # This library is distributed in the hope that it will be useful,
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
18 # but WITHOUT ANY WARRANTY; without even the implied warranty of
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
19 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
20 # Lesser General Public License for more details.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
21 #
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
22 # You should have received a copy of the GNU Lesser General Public
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23 # License along with this library; if not, write to the Free Software
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24 # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
25 # 02110-1301 USA
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
26 ######################### END LICENSE BLOCK #########################
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 import constants
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 from euctwfreq import EUCTWCharToFreqOrder, EUCTW_TABLE_SIZE, EUCTW_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
30 from euckrfreq import EUCKRCharToFreqOrder, EUCKR_TABLE_SIZE, EUCKR_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
31 from gb2312freq import GB2312CharToFreqOrder, GB2312_TABLE_SIZE, GB2312_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
32 from big5freq import Big5CharToFreqOrder, BIG5_TABLE_SIZE, BIG5_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
33 from jisfreq import JISCharToFreqOrder, JIS_TABLE_SIZE, JIS_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
34
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
35 ENOUGH_DATA_THRESHOLD = 1024
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36 SURE_YES = 0.99
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37 SURE_NO = 0.01
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
38
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 class CharDistributionAnalysis:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
41 self._mCharToFreqOrder = None # Mapping table to get frequency order from char order (get from GetOrder())
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42 self._mTableSize = None # Size of above table
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
43 self._mTypicalDistributionRatio = None # This is a constant value which varies from language to language, used in calculating confidence. See http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html for further detail.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
44 self.reset()
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
45
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 def reset(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 """reset analyser, clear any state"""
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48 self._mDone = constants.False # If this flag is set to constants.True, detection is done and conclusion has been made
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
49 self._mTotalChars = 0 # Total characters encountered
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
50 self._mFreqChars = 0 # The number of characters whose frequency order is less than 512
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 def feed(self, aStr, aCharLen):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53 """feed a character with known length"""
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
54 if aCharLen == 2:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
55 # we only care about 2-bytes character in our distribution analysis
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
56 order = self.get_order(aStr)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 order = -1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59 if order >= 0:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
60 self._mTotalChars += 1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 # order is valid
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62 if order < self._mTableSize:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
63 if 512 > self._mCharToFreqOrder[order]:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64 self._mFreqChars += 1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
65
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
66 def get_confidence(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
67 """return confidence based on existing data"""
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
68 # if we didn't receive any character in our consideration range, return negative answer
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
69 if self._mTotalChars <= 0:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 return SURE_NO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72 if self._mTotalChars != self._mFreqChars:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
73 r = self._mFreqChars / ((self._mTotalChars - self._mFreqChars) * self._mTypicalDistributionRatio)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
74 if r < SURE_YES:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75 return r
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77 # normalize confidence (we don't want to be 100% sure)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
78 return SURE_YES
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80 def got_enough_data(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 # It is not necessary to receive all data to draw conclusion. For charset detection,
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
82 # certain amount of data is enough
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83 return self._mTotalChars > ENOUGH_DATA_THRESHOLD
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
86 # We do not handle characters based on the original encoding string, but
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87 # convert this encoding string to a number, here called order.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
88 # This allows multiple encodings of a language to share one frequency table.
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
89 return -1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
90
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
91 class EUCTWDistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
92 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
93 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
94 self._mCharToFreqOrder = EUCTWCharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
95 self._mTableSize = EUCTW_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
96 self._mTypicalDistributionRatio = EUCTW_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
97
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
98 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 # for euc-TW encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100 # first byte range: 0xc4 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
101 # second byte range: 0xa1 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103 if aStr[0] >= '\xC4':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
104 return 94 * (ord(aStr[0]) - 0xC4) + ord(aStr[1]) - 0xA1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
105 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106 return -1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
107
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 class EUCKRDistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
110 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
111 self._mCharToFreqOrder = EUCKRCharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
112 self._mTableSize = EUCKR_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 self._mTypicalDistributionRatio = EUCKR_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
114
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
115 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116 # for euc-KR encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
117 # first byte range: 0xb0 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 # second byte range: 0xa1 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
120 if aStr[0] >= '\xB0':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121 return 94 * (ord(aStr[0]) - 0xB0) + ord(aStr[1]) - 0xA1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
123 return -1;
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
124
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
125 class GB2312DistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
126 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
127 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
128 self._mCharToFreqOrder = GB2312CharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
129 self._mTableSize = GB2312_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
130 self._mTypicalDistributionRatio = GB2312_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
131
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
132 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
133 # for GB2312 encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
134 # first byte range: 0xb0 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
135 # second byte range: 0xa1 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
136 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
137 if (aStr[0] >= '\xB0') and (aStr[1] >= '\xA1'):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
138 return 94 * (ord(aStr[0]) - 0xB0) + ord(aStr[1]) - 0xA1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
139 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
140 return -1;
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
141
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
142 class Big5DistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
143 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
144 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
145 self._mCharToFreqOrder = Big5CharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
146 self._mTableSize = BIG5_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
147 self._mTypicalDistributionRatio = BIG5_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
148
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
149 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
150 # for big5 encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
151 # first byte range: 0xa4 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
152 # second byte range: 0x40 -- 0x7e , 0xa1 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
153 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
154 if aStr[0] >= '\xA4':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
155 if aStr[1] >= '\xA1':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
156 return 157 * (ord(aStr[0]) - 0xA4) + ord(aStr[1]) - 0xA1 + 63
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
157 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
158 return 157 * (ord(aStr[0]) - 0xA4) + ord(aStr[1]) - 0x40
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
159 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
160 return -1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
161
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
162 class SJISDistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
163 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
164 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
165 self._mCharToFreqOrder = JISCharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
166 self._mTableSize = JIS_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
167 self._mTypicalDistributionRatio = JIS_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
168
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
169 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
170 # for sjis encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
171 # first byte range: 0x81 -- 0x9f , 0xe0 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
172 # second byte range: 0x40 -- 0x7e, 0x81 -- oxfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
173 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
174 if (aStr[0] >= '\x81') and (aStr[0] <= '\x9F'):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
175 order = 188 * (ord(aStr[0]) - 0x81)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
176 elif (aStr[0] >= '\xE0') and (aStr[0] <= '\xEF'):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
177 order = 188 * (ord(aStr[0]) - 0xE0 + 31)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
178 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
179 return -1;
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
180 order = order + ord(aStr[1]) - 0x40
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
181 if aStr[1] > '\x7F':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
182 order =- 1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
183 return order
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
184
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
185 class EUCJPDistributionAnalysis(CharDistributionAnalysis):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
186 def __init__(self):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
187 CharDistributionAnalysis.__init__(self)
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
188 self._mCharToFreqOrder = JISCharToFreqOrder
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
189 self._mTableSize = JIS_TABLE_SIZE
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
190 self._mTypicalDistributionRatio = JIS_TYPICAL_DISTRIBUTION_RATIO
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
191
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
192 def get_order(self, aStr):
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
193 # for euc-JP encoding, we are interested
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
194 # first byte range: 0xa0 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
195 # second byte range: 0xa1 -- 0xfe
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
196 # no validation needed here. State machine has done that
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
197 if aStr[0] >= '\xA0':
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
198 return 94 * (ord(aStr[0]) - 0xA1) + ord(aStr[1]) - 0xa1
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
199 else:
de9c2efb9d02 Started porting eric4 to Python3
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
200 return -1

eric ide

mercurial