eric6/E5Network/E5TldExtractor.py

Sat, 26 Sep 2020 10:58:18 +0200

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Sat, 26 Sep 2020 10:58:18 +0200
changeset 7717
f32d7965a17e
parent 7716
313e09453306
child 7775
4a1db75550bd
permissions
-rw-r--r--

Changed the code to not rely on the Qt Resource system anymore (no .qrc files and no use of pyrcc5 anymore).

4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
1 # -*- coding: utf-8 -*-
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
2
7360
9190402e4505 Updated copyright for 2020.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7253
diff changeset
3 # Copyright (c) 2016 - 2020 Detlev Offenbach <detlev@die-offenbachs.de>
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
6 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
7 Module implementing the TLD Extractor.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
8 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
9
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
10 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
11 # This is a Python port of the TLDExtractor of Qupzilla
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
12 # Copyright (C) 2014 Razi Alavizadeh <s.r.alavizadeh@gmail.com>
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
13 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
14
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
15 import collections
7717
f32d7965a17e Changed the code to not rely on the Qt Resource system anymore (no .qrc files and no use of pyrcc5 anymore).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7716
diff changeset
16 import os
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
18 from PyQt5.QtCore import QObject, QUrl, QFile, QFileInfo, QRegExp, qWarning
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
19
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
20 from E5Gui import E5MessageBox
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
21
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
22
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23 class E5TldHostParts(object):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
25 Class implementing the host parts helper.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
26 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27 def __init__(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 Constructor
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
30 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
31 self.host = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
32 self.tld = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
33 self.domain = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
34 self.registrableDomain = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
35 self.subdomain = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
38 class E5TldExtractor(QObject):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 Class implementing the TLD Extractor.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
41
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42 Note: The module function instance() should be used to get a reference
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
43 to a global object to avoid overhead.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
44 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
45 def __init__(self, withPrivate=False, parent=None):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 Constructor
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
49 @param withPrivate flag indicating to load private TLDs as well
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
50 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51 @param parent reference to the parent object
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 @type QObject
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
54 super(E5TldExtractor, self).__init__(parent)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
55
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
56 self.__withPrivate = withPrivate
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 self.__dataFileName = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 self.__dataSearchPaths = []
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
60 self.__tldDict = collections.defaultdict(list)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 # dict with list of str as values
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
63 self.setDataSearchPaths()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
65 def isDataLoaded(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
66 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
67 Public method to check, if the TLD data ia already loaded.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
68
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
69 @return flag indicating data is loaded
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72 return bool(self.__tldDict)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
73
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
74 def tld(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76 Public method to get the top level domain for a host.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
78 @param host host name to get TLD for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80 @return TLD for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
82 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83 if not host or host.startswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84 return ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
86 cleanHost = self.__normalizedHost(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
88 tldPart = cleanHost[cleanHost.rfind(".") + 1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
89 cleanHost = bytes(QUrl.toAce(cleanHost)).decode("utf-8")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
90
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
91 self.__loadData()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
92
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
93 if tldPart not in self.__tldDict:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
94 return tldPart
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
95
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
96 tldRules = self.__tldDict[tldPart][:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
97
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
98 if tldPart not in tldRules:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 tldRules.append(tldPart)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
101 maxLabelCount = 0
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 isWildcardTLD = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
104 for rule in tldRules:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
105 labelCount = rule.count(".") + 1
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
107 if rule.startswith("!"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 rule = rule[1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
110 rule = bytes(QUrl.toAce(rule)).decode("utf-8")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
111
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
112 # matches with exception TLD
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 if cleanHost.endswith(rule):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
114 tldPart = rule[rule.find(".") + 1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
115 break
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
117 if rule.startswith("*"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 rule = rule[1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
120 if rule.startswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121 rule = rule[1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
123 isWildcardTLD = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
124 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
125 isWildcardTLD = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
126
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
127 rule = bytes(QUrl.toAce(rule)).decode("utf-8")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
128 testRule = "." + rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
129 testUrl = "." + cleanHost
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
130
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
131 if labelCount > maxLabelCount and testUrl.endswith(testRule):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
132 tldPart = rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
133 maxLabelCount = labelCount
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
134
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
135 if isWildcardTLD:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
136 temp = cleanHost
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
137 temp = temp[:temp.rfind(tldPart)]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
138
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
139 if temp.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
140 temp = temp[:-1]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
141
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
142 temp = temp[temp.rfind(".") + 1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
143
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
144 if temp:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
145 tldPart = temp + "." + rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
146 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
147 tldPart = rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
148
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
149 temp = self.__normalizedHost(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
150 tldPart = ".".join(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
151 temp.split(".")[temp.count(".") - tldPart.count("."):])
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
152
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
153 return tldPart
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
154
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
155 def domain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
156 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
157 Public method to get the domain for a host.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
158
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
159 @param host host name to get the domain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
160 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
161 @return domain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
162 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
163 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
164 tldPart = self.tld(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
165
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
166 return self.__domainHelper(host, tldPart)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
167
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
168 def registrableDomain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
169 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
170 Public method to get the registrable domain for a host.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
171
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
172 @param host host name to get the registrable domain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
173 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
174 @return registrable domain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
175 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
176 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
177 tldPart = self.tld(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
178
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
179 return self.__registrableDomainHelper(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
180 self.__domainHelper(host, tldPart), tldPart)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
181
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
182 def subdomain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
183 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
184 Public method to get the subdomain for a host.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
185
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
186 @param host host name to get the subdomain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
187 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
188 @return subdomain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
189 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
190 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
191 return self.__subdomainHelper(host, self.registrableDomain(host))
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
192
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
193 def splitParts(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
194 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
195 Public method to split a host address into its parts.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
196
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
197 @param host host address to be split
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
198 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
199 @return splitted host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
200 @rtype E5TldHostParts
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
201 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
202 hostParts = E5TldHostParts()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
203 hostParts.host = host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
204 hostParts.tld = self.tld(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
205 hostParts.domain = self.__domainHelper(host, hostParts.tld)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
206 hostParts.registrableDomain = self.__registrableDomainHelper(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
207 hostParts.domain, hostParts.tld)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
208 hostParts.subdomain = self.__subdomainHelper(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
209 host, hostParts.registrableDomain)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
210
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
211 return hostParts
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
212
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
213 def dataSearchPaths(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
214 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
215 Public method to get the search paths for the TLD data file.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
216
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
217 @return search paths for the TLD data file
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
218 @rtype list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
219 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
220 return self.__dataSearchPaths[:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
221
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
222 def setDataSearchPaths(self, searchPaths=None):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
223 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
224 Public method to set the search paths for the TLD data file.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
225
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
226 @param searchPaths search paths for the TLD data file or None,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
227 if the default search paths shall be set
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
228 @type list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
229 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
230 if searchPaths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
231 self.__dataSearchPaths = searchPaths[:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
232 self.__dataSearchPaths.extend(self.__defaultDataSearchPaths())
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
233 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
234 self.__dataSearchPaths = self.__defaultDataSearchPaths()[:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
235
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
236 # remove duplicates
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
237 paths = []
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
238 for p in self.__dataSearchPaths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
239 if p not in paths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
240 paths.append(p)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
241 self.__dataSearchPaths = paths
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
242
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
243 def __defaultDataSearchPaths(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
244 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
245 Private method to get the default search paths for the TLD data file.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
246
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
247 @return default search paths for the TLD data file
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
248 @rtype list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
249 """
7717
f32d7965a17e Changed the code to not rely on the Qt Resource system anymore (no .qrc files and no use of pyrcc5 anymore).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7716
diff changeset
250 return [os.path.join(os.path.dirname(__file__), "data")]
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
251
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
252 def getTldDownloadUrl(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
253 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
254 Public method to get the TLD data file download URL.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
255
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
256 @return download URL
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
257 @rtype QUrl
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
258 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
259 return QUrl(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
260 "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/"
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
261 "effective_tld_names.dat?raw=1")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
262
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
263 def __loadData(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
264 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
265 Private method to load the TLD data.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
266 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
267 if self.isDataLoaded():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
268 return
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
269
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
270 dataFileName = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
271 parsedDataFileExist = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
272
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
273 for path in self.__dataSearchPaths:
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
274 dataFileName = (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
275 QFileInfo(path + "/effective_tld_names.dat").absoluteFilePath()
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
276 )
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
277 if QFileInfo(dataFileName).exists():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
278 parsedDataFileExist = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
279 break
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
280
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
281 if not parsedDataFileExist:
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
282 tldDataFileDownloadLink = (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
283 "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/"
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
284 "effective_tld_names.dat?raw=1"
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
285 )
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
286 E5MessageBox.information(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
287 None,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
288 self.tr("TLD Data File not found"),
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
289 self.tr("""<p>The file 'effective_tld_names.dat' was not"""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
290 """ found!<br/>You can download it from """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
291 """'<a href="{0}"><b>here</b></a>' to one of the"""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
292 """ following paths:</p><ul>{1}</ul>""").format(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
293 tldDataFileDownloadLink,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
294 "".join(["<li>{0}</li>".format(p)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
295 for p in self.__dataSearchPaths]))
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
296 )
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
297 return
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
298
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
299 self.__dataFileName = dataFileName
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
300 if not self.__parseData(dataFileName,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
301 loadPrivateDomains=self.__withPrivate):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
302 qWarning(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
303 "E5TldExtractor: There are some parse errors for file: {0}"
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
304 .format(dataFileName))
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
305
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
306 def __parseData(self, dataFile, loadPrivateDomains=False):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
307 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
308 Private method to parse TLD data.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
309
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
310 @param dataFile name of the file containing the TLD data
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
311 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
312 @param loadPrivateDomains flag indicating to load private domains
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
313 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
314 @return flag indicating success
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
315 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
316 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
317 # start with a fresh dictionary
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
318 self.__tldDict = collections.defaultdict(list)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
319
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
320 file = QFile(dataFile)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
321
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
322 if not file.open(QFile.ReadOnly | QFile.Text):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
323 return False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
324
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
325 seekToEndOfPrivateDomains = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
326
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
327 while not file.atEnd():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
328 line = bytes(file.readLine()).decode("utf-8").strip()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
329 if not line:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
330 continue
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
331
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
332 if line.startswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
333 line = line[1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
334
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
335 if line.startswith("//"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
336 if "===END PRIVATE DOMAINS===" in line:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
337 seekToEndOfPrivateDomains = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
338
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
339 if (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
340 not loadPrivateDomains and
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
341 "===BEGIN PRIVATE DOMAINS===" in line
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
342 ):
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
343 seekToEndOfPrivateDomains = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
344
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
345 continue
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
346
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
347 if seekToEndOfPrivateDomains:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
348 continue
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
349
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
350 # only data up to the first whitespace is used
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
351 line = line.split(None, 1)[0]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
352
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
353 if "." not in line:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
354 self.__tldDict[line].append(line)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
355 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
356 key = line[line.rfind(".") + 1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
357 self.__tldDict[key].append(line)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
358
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
359 return self.isDataLoaded()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
360
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
361 def __domainHelper(self, host, tldPart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
362 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
363 Private method to get the domain name without TLD.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
364
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
365 @param host host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
366 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
367 @param tldPart TLD part of the host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
368 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
369 @return domain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
370 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
371 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
372 if not host or not tldPart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
373 return ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
374
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
375 temp = self.__normalizedHost(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
376 temp = temp[:temp.rfind(tldPart)]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
377
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
378 if temp.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
379 temp = temp[:-1]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
380
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
381 return temp[temp.rfind(".") + 1:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
382
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
383 def __registrableDomainHelper(self, domainPart, tldPart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
384 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
385 Private method to get the registrable domain (i.e. domain plus TLD).
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
386
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
387 @param domainPart domain part of a host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
388 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
389 @param tldPart TLD part of a host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
390 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
391 @return registrable domain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
392 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
393 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
394 if not tldPart or not domainPart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
395 return ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
396 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
397 return "{0}.{1}".format(domainPart, tldPart)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
398
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
399 def __subdomainHelper(self, host, registrablePart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
400 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
401 Private method to get the subdomain of a host address (i.e. domain part
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
402 without the registrable domain name).
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
403
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
404 @param host host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
405 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
406 @param registrablePart registrable domain part of the host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
407 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
408 @return subdomain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
409 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
410 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
411 if not host or not registrablePart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
412 return ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
413
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
414 subdomain = self.__normalizedHost(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
415
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
416 subdomain = subdomain[:subdomain.rfind(registrablePart)]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
417
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
418 if subdomain.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
419 subdomain = subdomain[:-1]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
420
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
421 return subdomain
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
422
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
423 def __normalizedHost(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
424 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
425 Private method to get the normalized host for a host address.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
426
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
427 @param host host address to be normalized
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
428 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
429 @return normalized host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
430 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
431 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
432 return host.lower()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
433
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
434 #################################################################
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
435 ## Methods below are for testing purposes
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
436 #################################################################
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
437
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
438 def test(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
439 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
440 Public method to execute the tests.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
441
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
442 @return flag indicating the test result
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
443 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
444 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
445 self.__withPrivate = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
446 self.__loadData()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
447 if not self.__tldDict:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
448 return False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
449
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
450 testDataFileName = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
451 testDataFileExist = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
452
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
453 for path in self.__dataSearchPaths:
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
454 testDataFileName = (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
455 QFileInfo(path + "/test_psl.txt").absoluteFilePath()
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
456 )
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
457 if QFileInfo(testDataFileName).exists():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
458 testDataFileExist = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
459 break
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
460
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
461 if not testDataFileExist:
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
462 testFileDownloadLink = (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
463 "http://mxr.mozilla.org/mozilla-central/source/netwerk/test/"
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
464 "unit/data/test_psl.txt?raw=1"
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
465 )
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
466 E5MessageBox.information(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
467 None,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
468 self.tr("TLD Data File not found"),
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
469 self.tr("""<p>The file 'test_psl.txt' was not found!"""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
470 """<br/>You can download it from '<a href="{0}">"""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
471 """<b>here</b></a>' to one of the following"""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
472 """ paths:</p><ul>{1}</ul>""").format(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
473 testFileDownloadLink,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
474 "".join(["<li>{0}</li>".format(p)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
475 for p in self.__dataSearchPaths]))
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
476 )
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
477 return False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
478
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
479 file = QFile(testDataFileName)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
480
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
481 if not file.open(QFile.ReadOnly | QFile.Text):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
482 return False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
483
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
484 testRegExp = QRegExp(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
485 "checkPublicSuffix\\(('([^']+)'|null), ('([^']+)'|null)\\);")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
486 allTestSuccess = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
487
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
488 while not file.atEnd():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
489 line = bytes(file.readLine()).decode("utf-8").strip()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
490 if not line or line.startswith("//"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
491 continue
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
492
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
493 if testRegExp.indexIn(line) == -1:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
494 allTestSuccess = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
495 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
496 hostName = testRegExp.cap(2)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
497 registrableName = testRegExp.cap(4)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
498
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
499 if not self.__checkPublicSuffix(hostName, registrableName):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
500 allTestSuccess = False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
501
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
502 if allTestSuccess:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
503 qWarning("E5TldExtractor: Test passed successfully.")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
504 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
505 qWarning("E5TldExtractor: Test finished with some errors!")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
506
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
507 # reset the TLD dictionary
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
508 self.__tldDict = collections.defaultdict(list)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
509
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
510 return allTestSuccess
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
511
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
512 def __checkPublicSuffix(self, host, registrableName):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
513 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
514 Private method to test a host name against a registrable name.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
515
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
516 @param host host name to test
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
517 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
518 @param registrableName registrable domain name to test against
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
519 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
520 @return flag indicating the check result
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
521 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
522 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
523 regName = self.registrableDomain(host)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
524 if regName != registrableName:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
525 qWarning(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
526 "E5TldExtractor Test Error: hostName: {0}\n"
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
527 " Correct registrableName: {1}\n"
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
528 " Calculated registrableName: {2}".format(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
529 host, registrableName, regName))
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
530 return False
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
531
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
532 return True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
533
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
534
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
535 _TLDExtractor = None
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
536
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
537
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
538 def instance(withPrivate=False):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
539 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
540 Global function to get a reference to the TLD extractor and create it, if
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
541 it hasn't been yet.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
542
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
543 @param withPrivate flag indicating to load private TLDs as well
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
544 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
545 @return reference to the zoom manager object
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
546 @rtype E5TldExtractor
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
547 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
548 global _TLDExtractor
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
549
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
550 if _TLDExtractor is None:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
551 _TLDExtractor = E5TldExtractor(withPrivate=withPrivate)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
552
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
553 return _TLDExtractor

eric ide

mercurial