src/eric7/EricNetwork/EricTldExtractor.py

Sat, 23 Dec 2023 15:48:12 +0100

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Sat, 23 Dec 2023 15:48:12 +0100
branch
eric7
changeset 10439
21c28b0f9e41
parent 9653
e67609152c5e
child 10488
1ddd2cb3aebe
permissions
-rw-r--r--

Updated copyright for 2024.

4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
1 # -*- coding: utf-8 -*-
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
2
10439
21c28b0f9e41 Updated copyright for 2024.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9653
diff changeset
3 # Copyright (c) 2016 - 2024 Detlev Offenbach <detlev@die-offenbachs.de>
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
6 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
7 Module implementing the TLD Extractor.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
8 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
9
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
10 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
11 # This is a Python port of the TLDExtractor of Qupzilla
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
12 # Copyright (C) 2014 Razi Alavizadeh <s.r.alavizadeh@gmail.com>
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
13 #
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
14
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
15 import collections
7717
f32d7965a17e Changed the code to not rely on the Qt Resource system anymore (no .qrc files and no use of pyrcc5 anymore).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7716
diff changeset
16 import os
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17
9500
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
18 from dataclasses import dataclass
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
19
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
20 from PyQt6.QtCore import QObject, QUrl, qWarning
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
21
9413
80c06d472826 Changed the eric7 import statements to include the package name (i.e. eric7) in order to not fiddle with sys.path.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9221
diff changeset
22 from eric7.EricWidgets import EricMessageBox
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24
9500
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
25 @dataclass
8354
12ebd3934fef Renamed 'E5Utilities' to 'EricUtilities' and 'E5Network' to 'EricNetwork'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8318
diff changeset
26 class EricTldHostParts:
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 Class implementing the host parts helper.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 """
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
30
9500
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
31 host: str = ""
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
32 tld: str = ""
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
33 domain: str = ""
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
34 registrableDomain: str = ""
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
35 subdomain: str = ""
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37
8354
12ebd3934fef Renamed 'E5Utilities' to 'EricUtilities' and 'E5Network' to 'EricNetwork'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8318
diff changeset
38 class EricTldExtractor(QObject):
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 Class implementing the TLD Extractor.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
41
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42 Note: The module function instance() should be used to get a reference
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
43 to a global object to avoid overhead.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
44 """
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
45
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 def __init__(self, withPrivate=False, parent=None):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48 Constructor
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
49
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
50 @param withPrivate flag indicating to load private TLDs as well
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 @param parent reference to the parent object
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53 @type QObject
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
54 """
8218
7c09585bd960 Applied some more code simplifications suggested by the new Simplify checker (super(Foo, self) => super()).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8207
diff changeset
55 super().__init__(parent)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
56
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 self.__withPrivate = withPrivate
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 self.__dataFileName = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59 self.__dataSearchPaths = []
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
60
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 self.__tldDict = collections.defaultdict(list)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62 # dict with list of str as values
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
63
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64 self.setDataSearchPaths()
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
65
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
66 def isDataLoaded(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
67 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
68 Public method to check, if the TLD data ia already loaded.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
69
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 @return flag indicating data is loaded
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
73 return bool(self.__tldDict)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
74
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75 def tld(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77 Public method to get the top level domain for a host.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
78
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79 @param host host name to get TLD for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 @return TLD for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
82 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84 if not host or host.startswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85 return ""
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
86
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87 cleanHost = self.__normalizedHost(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
88
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
89 tldPart = cleanHost[cleanHost.rfind(".") + 1 :]
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
90 cleanHost = bytes(QUrl.toAce(cleanHost)).decode("utf-8")
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
91
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
92 self.__loadData()
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
93
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
94 if tldPart not in self.__tldDict:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
95 return tldPart
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
96
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
97 tldRules = self.__tldDict[tldPart][:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
98
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 if tldPart not in tldRules:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100 tldRules.append(tldPart)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
101
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 maxLabelCount = 0
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103 isWildcardTLD = False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
104
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
105 for rule in tldRules:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106 labelCount = rule.count(".") + 1
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
107
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 if rule.startswith("!"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109 rule = rule[1:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
110
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
111 rule = bytes(QUrl.toAce(rule)).decode("utf-8")
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
112
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 # matches with exception TLD
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
114 if cleanHost.endswith(rule):
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
115 tldPart = rule[rule.find(".") + 1 :]
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116 break
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
117
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 if rule.startswith("*"):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119 rule = rule[1:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
120
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121 if rule.startswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122 rule = rule[1:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
123
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
124 isWildcardTLD = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
125 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
126 isWildcardTLD = False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
127
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
128 rule = bytes(QUrl.toAce(rule)).decode("utf-8")
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
129 testRule = "." + rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
130 testUrl = "." + cleanHost
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
131
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
132 if labelCount > maxLabelCount and testUrl.endswith(testRule):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
133 tldPart = rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
134 maxLabelCount = labelCount
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
135
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
136 if isWildcardTLD:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
137 temp = cleanHost
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
138 temp = temp[: temp.rfind(tldPart)]
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
139
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
140 if temp.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
141 temp = temp[:-1]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
142
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
143 temp = temp[temp.rfind(".") + 1 :]
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
144
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
145 if temp:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
146 tldPart = temp + "." + rule
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
147 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
148 tldPart = rule
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
149
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
150 temp = self.__normalizedHost(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
151 tldPart = ".".join(temp.split(".")[temp.count(".") - tldPart.count(".") :])
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
152
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
153 return tldPart
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
154
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
155 def domain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
156 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
157 Public method to get the domain for a host.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
158
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
159 @param host host name to get the domain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
160 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
161 @return domain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
162 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
163 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
164 tldPart = self.tld(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
165
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
166 return self.__domainHelper(host, tldPart)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
167
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
168 def registrableDomain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
169 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
170 Public method to get the registrable domain for a host.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
171
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
172 @param host host name to get the registrable domain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
173 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
174 @return registrable domain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
175 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
176 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
177 tldPart = self.tld(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
178
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
179 return self.__registrableDomainHelper(
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
180 self.__domainHelper(host, tldPart), tldPart
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
181 )
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
182
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
183 def subdomain(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
184 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
185 Public method to get the subdomain for a host.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
186
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
187 @param host host name to get the subdomain for
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
188 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
189 @return subdomain for host
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
190 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
191 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
192 return self.__subdomainHelper(host, self.registrableDomain(host))
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
193
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
194 def splitParts(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
195 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
196 Public method to split a host address into its parts.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
197
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
198 @param host host address to be split
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
199 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
200 @return splitted host address
8354
12ebd3934fef Renamed 'E5Utilities' to 'EricUtilities' and 'E5Network' to 'EricNetwork'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8318
diff changeset
201 @rtype EricTldHostParts
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
202 """
9500
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
203 tld = self.tld(host)
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
204 domain = self.__domainHelper(host, tld)
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
205 registrableDomain = self.__registrableDomainHelper(domain, tld)
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
206 subdomain = self.__subdomainHelper(host, registrableDomain)
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
207
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
208 hostParts = EricTldHostParts(
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
209 host=host,
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
210 tld=tld,
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
211 domain=domain,
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
212 registrableDomain=registrableDomain,
5771348ded12 Corrected some code style issues and changed some suppressed code style issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9413
diff changeset
213 subdomain=subdomain,
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
214 )
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
215
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
216 return hostParts
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
217
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
218 def dataSearchPaths(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
219 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
220 Public method to get the search paths for the TLD data file.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
221
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
222 @return search paths for the TLD data file
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
223 @rtype list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
224 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
225 return self.__dataSearchPaths[:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
226
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
227 def setDataSearchPaths(self, searchPaths=None):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
228 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
229 Public method to set the search paths for the TLD data file.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
230
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
231 @param searchPaths search paths for the TLD data file or None,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
232 if the default search paths shall be set
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
233 @type list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
234 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
235 if searchPaths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
236 self.__dataSearchPaths = searchPaths[:]
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
237 self.__dataSearchPaths.extend(self.__defaultDataSearchPaths())
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
238 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
239 self.__dataSearchPaths = self.__defaultDataSearchPaths()[:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
240
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
241 # remove duplicates
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
242 paths = []
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
243 for p in self.__dataSearchPaths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
244 if p not in paths:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
245 paths.append(p)
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
246 self.__dataSearchPaths = paths
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
247
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
248 def __defaultDataSearchPaths(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
249 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
250 Private method to get the default search paths for the TLD data file.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
251
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
252 @return default search paths for the TLD data file
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
253 @rtype list of str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
254 """
7717
f32d7965a17e Changed the code to not rely on the Qt Resource system anymore (no .qrc files and no use of pyrcc5 anymore).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7716
diff changeset
255 return [os.path.join(os.path.dirname(__file__), "data")]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
256
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
257 def getTldDownloadUrl(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
258 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
259 Public method to get the TLD data file download URL.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
260
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
261 @return download URL
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
262 @rtype QUrl
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
263 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
264 return QUrl(
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
265 "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/"
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
266 "effective_tld_names.dat?raw=1"
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
267 )
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
268
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
269 def __loadData(self):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
270 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
271 Private method to load the TLD data.
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
272 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
273 if self.isDataLoaded():
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
274 return
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
275
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
276 dataFileName = ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
277 parsedDataFileExist = False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
278
9152
8a68afaf1ba2 Started replacing the use of "QFileInfo()" with Python equivalents.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8881
diff changeset
279 for searchPath in self.__dataSearchPaths:
8a68afaf1ba2 Started replacing the use of "QFileInfo()" with Python equivalents.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8881
diff changeset
280 dataFileName = os.path.abspath(
8a68afaf1ba2 Started replacing the use of "QFileInfo()" with Python equivalents.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8881
diff changeset
281 os.path.join(searchPath, "effective_tld_names.dat")
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
282 )
9152
8a68afaf1ba2 Started replacing the use of "QFileInfo()" with Python equivalents.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8881
diff changeset
283 if os.path.exists(dataFileName):
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
284 parsedDataFileExist = True
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
285 break
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
286
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
287 if not parsedDataFileExist:
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
288 tldDataFileDownloadLink = (
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
289 "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/"
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
290 "effective_tld_names.dat?raw=1"
7253
50dbe65a1334 Continued to resolve code style issue M841.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7229
diff changeset
291 )
8356
68ec9c3d4de5 Renamed the modules and classes of the E5Gui package to have the prefix 'Eric' instead of 'E5'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8354
diff changeset
292 EricMessageBox.information(
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
293 None,
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
294 self.tr("TLD Data File not found"),
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
295 self.tr(
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
296 """<p>The file 'effective_tld_names.dat' was not"""
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
297 """ found!<br/>You can download it from """
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
298 """'<a href="{0}"><b>here</b></a>' to one of the"""
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
299 """ following paths:</p><ul>{1}</ul>"""
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
300 ).format(
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
301 tldDataFileDownloadLink,
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
302 "".join(["<li>{0}</li>".format(p) for p in self.__dataSearchPaths]),
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
303 ),
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
304 )
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
305 return
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
306
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
307 self.__dataFileName = dataFileName
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
308 if not self.__parseData(dataFileName, loadPrivateDomains=self.__withPrivate):
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
309 qWarning(
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
310 "EricTldExtractor: There are some parse errors for file: {0}".format(
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
311 dataFileName
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
312 )
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
313 )
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
314
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
315 def __parseData(self, dataFile, loadPrivateDomains=False):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
316 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
317 Private method to parse TLD data.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
318
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
319 @param dataFile name of the file containing the TLD data
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
320 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
321 @param loadPrivateDomains flag indicating to load private domains
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
322 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
323 @return flag indicating success
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
324 @rtype bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
325 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
326 # start with a fresh dictionary
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
327 self.__tldDict = collections.defaultdict(list)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
328
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
329 seekToEndOfPrivateDomains = False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
330
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
331 try:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
332 with open(dataFile, "r", encoding="utf-8") as f:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
333 for line in f.readlines():
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
334 if not line:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
335 continue
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
336
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
337 if line.startswith("."):
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
338 line = line[1:]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
339
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
340 if line.startswith("//"):
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
341 if "===END PRIVATE DOMAINS===" in line:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
342 seekToEndOfPrivateDomains = False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
343
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
344 if (
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
345 not loadPrivateDomains
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
346 and "===BEGIN PRIVATE DOMAINS===" in line
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
347 ):
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
348 seekToEndOfPrivateDomains = True
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
349
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
350 continue
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
351
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
352 if seekToEndOfPrivateDomains:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
353 continue
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
354
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
355 # only data up to the first whitespace is used
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
356 line = line.split(None, 1)[0]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
357
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
358 if "." not in line:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
359 self.__tldDict[line].append(line)
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
360 else:
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
361 key = line[line.rfind(".") + 1 :]
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
362 self.__tldDict[key].append(line)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
363
9162
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
364 return self.isDataLoaded()
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
365 except OSError:
8b75b1668583 Changed some use of QFile() to a more pythonic solution and fixed a few issues.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9152
diff changeset
366 return False
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
367
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
368 def __domainHelper(self, host, tldPart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
369 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
370 Private method to get the domain name without TLD.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
371
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
372 @param host host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
373 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
374 @param tldPart TLD part of the host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
375 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
376 @return domain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
377 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
378 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
379 if not host or not tldPart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
380 return ""
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
381
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
382 temp = self.__normalizedHost(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
383 temp = temp[: temp.rfind(tldPart)]
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
384
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
385 if temp.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
386 temp = temp[:-1]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
387
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
388 return temp[temp.rfind(".") + 1 :]
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
389
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
390 def __registrableDomainHelper(self, domainPart, tldPart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
391 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
392 Private method to get the registrable domain (i.e. domain plus TLD).
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
393
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
394 @param domainPart domain part of a host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
395 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
396 @param tldPart TLD part of a host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
397 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
398 @return registrable domain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
399 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
400 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
401 if not tldPart or not domainPart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
402 return ""
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
403 else:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
404 return "{0}.{1}".format(domainPart, tldPart)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
405
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
406 def __subdomainHelper(self, host, registrablePart):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
407 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
408 Private method to get the subdomain of a host address (i.e. domain part
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
409 without the registrable domain name).
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
410
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
411 @param host host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
412 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
413 @param registrablePart registrable domain part of the host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
414 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
415 @return subdomain name
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
416 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
417 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
418 if not host or not registrablePart:
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
419 return ""
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
420
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
421 subdomain = self.__normalizedHost(host)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
422
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
423 subdomain = subdomain[: subdomain.rfind(registrablePart)]
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
424
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
425 if subdomain.endswith("."):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
426 subdomain = subdomain[:-1]
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
427
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
428 return subdomain
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
429
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
430 def __normalizedHost(self, host):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
431 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
432 Private method to get the normalized host for a host address.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
433
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
434 @param host host address to be normalized
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
435 @type str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
436 @return normalized host address
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
437 @rtype str
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
438 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
439 return host.lower()
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
440
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
441
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
442 _TLDExtractor = None
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
443
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
444
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
445 def instance(withPrivate=False):
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
446 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
447 Global function to get a reference to the TLD extractor and create it, if
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
448 it hasn't been yet.
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
449
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
450 @param withPrivate flag indicating to load private TLDs as well
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
451 @type bool
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
452 @return reference to the zoom manager object
8354
12ebd3934fef Renamed 'E5Utilities' to 'EricUtilities' and 'E5Network' to 'EricNetwork'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8318
diff changeset
453 @rtype EricTldExtractor
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
454 """
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
455 global _TLDExtractor
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
456
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
457 if _TLDExtractor is None:
8354
12ebd3934fef Renamed 'E5Utilities' to 'EricUtilities' and 'E5Network' to 'EricNetwork'.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8318
diff changeset
458 _TLDExtractor = EricTldExtractor(withPrivate=withPrivate)
9221
bf71ee032bb4 Reformatted the source code using the 'Black' utility.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 9209
diff changeset
459
4971
0f21662c0d2d Added a class to extract all top level domains from a file provided by Mozilla.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
460 return _TLDExtractor

eric ide

mercurial