eric6/WebBrowser/SafeBrowsing/SafeBrowsingUrl.py

Wed, 14 Apr 2021 19:59:16 +0200

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Wed, 14 Apr 2021 19:59:16 +0200
changeset 8240
93b8a353c4bf
parent 8207
d359172d11be
permissions
-rw-r--r--

Applied some more code simplifications suggested by the new Simplify checker (Y105: use contextlib.suppress) (batch 1).
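For context, the Y105 simplification named in this changeset description swaps a try/except/pass block for `contextlib.suppress`. The following is a minimal sketch of that pattern, using the decimal host conversion from `canonical()` as the example. The two helper function names are hypothetical and do not exist in the module, and the "old" form is an assumption about what the code looked like before the change.

```python
import contextlib
import socket
import struct


def decimal_host_old(host):
    """Assumed pre-change style: swallow any error with try/except/pass."""
    if host.isdigit():
        try:
            host = socket.inet_ntoa(struct.pack("!I", int(host)))
        except Exception:
            pass
    return host


def decimal_host_new(host):
    """Post-change style: same behavior expressed with contextlib.suppress."""
    if host.isdigit():
        # Any exception raised inside the block is silently discarded,
        # leaving 'host' unchanged.
        with contextlib.suppress(Exception):
            host = socket.inet_ntoa(struct.pack("!I", int(host)))
    return host
```

Both helpers return the input unchanged when the number does not fit into an unsigned 32-bit integer, because the `struct.error` is suppressed.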

5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
1 # -*- coding: utf-8 -*-
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
2
7923
91e843545d9a Updated copyright for 2021.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7781
diff changeset
3 # Copyright (c) 2017 - 2021 Detlev Offenbach <detlev@die-offenbachs.de>
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
4 #
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
5
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
6 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
7 Module implementing a URL representation suitable for Google Safe Browsing.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
8 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
9
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
10 import re
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
11 import posixpath
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
12 import socket
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
13 import struct
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
14 import hashlib
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
15 import urllib.parse
8240
93b8a353c4bf Applied some more code simplifications suggested by the new Simplify checker (Y105: use contextlib.suppress) (batch 1).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8207
diff changeset
16 import contextlib
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
17
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
18 import Preferences
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
19
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
20
8207
d359172d11be Applied some more code simplifications suggested by the new Simplify checker.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7988
diff changeset
21 class SafeBrowsingUrl:
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
22 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
23 Class implementing a URL representation suitable for Google Safe Browsing.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
24 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
25 #
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
26 # Modeled after the URL class of the gglsbl package.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
27 # https://github.com/afilipovich/gglsbl
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
28 #
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
29 def __init__(self, url):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
30 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
31 Constructor
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
32
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
33 @param url URL to be embedded
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
34 @type str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
35 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
36 self.__url = url
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
37
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
38 def hashes(self):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
39 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
40 Public method to get the hashes of all possible permutations of the URL
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
41 in canonical form.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
42
7988
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
43 @yield URL hashes
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
44 @ytype bytes
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
45 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
46 for variant in self.permutations(self.canonical()):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
47 urlHash = self.digest(variant)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
48 yield urlHash
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
49
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
50 def canonical(self):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
51 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
52 Public method to convert the URL to the canonical form.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
53
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
54 @return canonical form of the URL
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
55 @rtype str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
56 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
57 def fullUnescape(u):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
58 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
59 Method to recursively unescape a URL.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
60
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
61 @param u URL string to unescape
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
62 @type str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
63 @return unescaped URL string
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
64 @rtype str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
65 """
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
66 uu = urllib.parse.unquote(u)
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
67 if uu == u:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
68 return uu
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
69 else:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
70 return fullUnescape(uu)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
71
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
72 def quote(s):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
73 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
74 Method to quote a string.
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
75
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
76 @param s string to be quoted
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
77 @type str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
78 @return quoted string
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
79 @rtype str
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
80 """
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
81 safeChars = '!"$&\'()*+,-./:;<=>?@[\\]^_`{|}~'
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
82 return urllib.parse.quote(s, safe=safeChars)
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
83
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
84 url = self.__url.strip()
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
85 url = url.replace('\n', '').replace('\r', '').replace('\t', '')
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
86 url = url.split('#', 1)[0]
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
87 if url.startswith('//'):
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
88 url = Preferences.getWebBrowser("DefaultScheme")[:-2] + url
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
89 if len(url.split('://')) <= 1:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
90 url = Preferences.getWebBrowser("DefaultScheme") + url
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
91 url = quote(fullUnescape(url))
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
92 urlParts = urllib.parse.urlsplit(url)
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
93 if not urlParts[0]:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
94 url = Preferences.getWebBrowser("DefaultScheme") + url
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
95 urlParts = urllib.parse.urlsplit(url)
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
96 protocol = urlParts.scheme
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
97 host = fullUnescape(urlParts.hostname)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
98 path = fullUnescape(urlParts.path)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
99 query = urlParts.query
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
100 if not query and '?' not in url:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
101 query = None
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
102 if not path:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
103 path = '/'
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
104 path = posixpath.normpath(path).replace('//', '/')
5829
d3448873ced3 Finished coding the safe browsing module of the new web browser.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5817
diff changeset
105 if path[-1] != '/':
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
106 path += '/'
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
107 port = urlParts.port
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
108 host = host.strip('.')
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
109 host = re.sub(r'\.+', '.', host).lower()
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
110 if host.isdigit():
8240
93b8a353c4bf Applied some more code simplifications suggested by the new Simplify checker (Y105: use contextlib.suppress) (batch 1).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8207
diff changeset
111 with contextlib.suppress(Exception):
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
112 host = socket.inet_ntoa(struct.pack("!I", int(host)))
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
113 if host.startswith('0x') and '.' not in host:
8240
93b8a353c4bf Applied some more code simplifications suggested by the new Simplify checker (Y105: use contextlib.suppress) (batch 1).
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 8207
diff changeset
114 with contextlib.suppress(Exception):
5808
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
115 host = socket.inet_ntoa(struct.pack("!I", int(host, 16)))
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
116 quotedPath = quote(path)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
117 quotedHost = quote(host)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
118 if port is not None:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
119 quotedHost = '{0}:{1}'.format(quotedHost, port)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
120 canonicalUrl = '{0}://{1}{2}'.format(protocol, quotedHost, quotedPath)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
121 if query is not None:
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
122 canonicalUrl = '{0}?{1}'.format(canonicalUrl, query)
7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents:
diff changeset
123 return canonicalUrl
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
124
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
125 @staticmethod
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
126 def permutations(url):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
127 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
128 Static method to determine all permutations of host name and path
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
129 which can be applied to blacklisted URLs.
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
130
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
131 @param url URL string to be permuted
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
132 @type str
7988
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
133 @yield permuted URL strings
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
134 @ytype str
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
135 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
136 def hostPermutations(host):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
137 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
138 Method to generate the permutations of the host name.
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
139
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
140 @param host host name
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
141 @type str
7988
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
142 @yield permuted host names
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
143 @ytype str
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
144 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
145 if re.match(r'\d+\.\d+\.\d+\.\d+', host):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
146 yield host
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
147 return
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
148 parts = host.split('.')
5811
5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5809
diff changeset
149 partsLen = min(len(parts), 5)
5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5809
diff changeset
150 if partsLen > 4:
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
151 yield host
5811
5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5809
diff changeset
152 for i in range(partsLen - 1):
5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5809
diff changeset
153 yield '.'.join(parts[i - partsLen:])
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
154
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
155 def pathPermutations(path):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
156 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
157 Method to generate the permutations of the path.
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
158
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
159 @param path path to be processed
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
160 @type str
7988
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
161 @yield permuted paths
c4c17121eff8 Updated source code documentation with the new tags.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 7923
diff changeset
162 @ytype str
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
163 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
164 yield path
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
165 query = None
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
166 if '?' in path:
5811
5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5809
diff changeset
167 path, query = path.split('?', 1)
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
168 if query is not None:
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
169 yield path
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
170 pathParts = path.split('/')[0:-1]
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
171 curPath = ''
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
172 for i in range(min(4, len(pathParts))):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
173 curPath = curPath + pathParts[i] + '/'
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
174 yield curPath
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
175
7192
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
176 protocol, addressStr = urllib.parse.splittype(url)
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
177 host, path = urllib.parse.splithost(addressStr)
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
178 user, host = urllib.parse.splituser(host)
a22eee00b052 Started removing runtime support for Python2 and PyQt4.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 6942
diff changeset
179 host, port = urllib.parse.splitport(host)
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
180 host = host.strip('/')
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
181 seenPermutations = set()
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
182 for h in hostPermutations(host):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
183 for p in pathPermutations(path):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
184 u = '{0}{1}'.format(h, p)
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
185 if u not in seenPermutations:
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
186 yield u
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
187 seenPermutations.add(u)
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
188
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
189 @staticmethod
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
190 def digest(url):
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
191 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
192 Static method to calculate the SHA256 digest of a URL string.
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
193
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
194 @param url URL string
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
195 @type str
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
196 @return SHA256 digest of the URL string
5817
a5f6c9128500 Started implementing the SafeBrowsingCache class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5811
diff changeset
197 @rtype bytes
5809
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
198 """
5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
Detlev Offenbach <detlev@die-offenbachs.de>
parents: 5808
diff changeset
199 return hashlib.sha256(url.encode('utf-8')).digest()
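
The host and path permutation logic of `permutations()` and the hashing done by `digest()` can be tried in isolation. The sketch below is a standalone approximation that mirrors the nested helpers of the listing; it does not import the eric `Preferences` module and takes an already split host and path instead of a full URL. The function names `host_suffixes`, `path_prefixes`, `variants` and `digest` are hypothetical names chosen for this illustration.

```python
import hashlib
import re


def host_suffixes(host):
    """Yield host variants: the full host plus up to four trailing suffixes."""
    if re.match(r'\d+\.\d+\.\d+\.\d+', host):
        # IP addresses are used as is.
        yield host
        return
    parts = host.split('.')
    parts_len = min(len(parts), 5)
    if parts_len > 4:
        yield host
    for i in range(parts_len - 1):
        yield '.'.join(parts[i - parts_len:])


def path_prefixes(path):
    """Yield path variants: with query, without query, then leading prefixes."""
    yield path
    query = None
    if '?' in path:
        path, query = path.split('?', 1)
    if query is not None:
        yield path
    path_parts = path.split('/')[0:-1]
    cur_path = ''
    for i in range(min(4, len(path_parts))):
        cur_path = cur_path + path_parts[i] + '/'
        yield cur_path


def variants(host, path):
    """Yield each unique host/path combination once."""
    seen = set()
    for h in host_suffixes(host):
        for p in path_prefixes(path):
            u = '{0}{1}'.format(h, p)
            if u not in seen:
                seen.add(u)
                yield u


def digest(url):
    """Return the SHA256 digest of a URL string as bytes."""
    return hashlib.sha256(url.encode('utf-8')).digest()


if __name__ == "__main__":
    for variant in variants("a.b.c", "/1/2.html?param=1"):
        print(variant, digest(variant).hex()[:8])
```

The `seen` set matters for short paths: for the path `/`, `path_prefixes` yields the same string twice, and the deduplication keeps each host/path combination from being hashed more than once.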
