eric6/WebBrowser/SafeBrowsing/SafeBrowsingUrl.py

author: Detlev Offenbach <detlev@die-offenbachs.de>
date: Sat, 31 Aug 2019 12:58:11 +0200
branch: without_py2_and_pyqt4
changeset: 7192:a22eee00b052
parent: 6942:2602857055c5
child: 7229:53054eb5b15a
permissions: -rw-r--r--

Started removing runtime support for Python2 and PyQt4.

# -*- coding: utf-8 -*-

# Copyright (c) 2017 - 2019 Detlev Offenbach <detlev@die-offenbachs.de>
#

"""
Module implementing a URL representation suitable for Google Safe Browsing.
"""

from __future__ import unicode_literals

import re
import posixpath
import socket
import struct
import hashlib
import urllib.parse

import Preferences


class SafeBrowsingUrl(object):
    """
    Class implementing a URL representation suitable for Google Safe Browsing.
    """
    #
    # Modeled after the URL class of the gglsbl package.
    # https://github.com/afilipovich/gglsbl
    #
    def __init__(self, url):
        """
        Constructor

        @param url URL to be embedded
        @type str
        """
        self.__url = url

    def hashes(self):
        """
        Public method to get the hashes of all possible permutations of the URL
        in canonical form.

        @return generator for the URL hashes
        @rtype generator of bytes
        """
        for variant in self.permutations(self.canonical()):
            urlHash = self.digest(variant)
            yield urlHash

    def canonical(self):
        """
        Public method to convert the URL to the canonical form.

        @return canonical form of the URL
        @rtype str
        """
        def fullUnescape(u):
            """
            Method to recursively unescape a URL.

            @param u URL string to unescape
            @type str
            @return unescaped URL string
            @rtype str
            """
            uu = urllib.parse.unquote(u)
            if uu == u:
                return uu
            else:
                return fullUnescape(uu)

        def quote(s):
            """
            Method to quote a string.

            @param s string to be quoted
            @type str
            @return quoted string
            @rtype str
            """
            safeChars = '!"$&\'()*+,-./:;<=>?@[\\]^_`{|}~'
            return urllib.parse.quote(s, safe=safeChars)

        url = self.__url.strip()
        url = url.replace('\n', '').replace('\r', '').replace('\t', '')
        url = url.split('#', 1)[0]
        if url.startswith('//'):
            # Scheme-relative URL: prepend the default scheme including the
            # colon. The "DefaultScheme" setting is expected to end with
            # '://', so only the two slashes are cut off here.
            url = Preferences.getWebBrowser("DefaultScheme")[:-2] + url
        if len(url.split('://')) <= 1:
            url = Preferences.getWebBrowser("DefaultScheme") + url
        url = quote(fullUnescape(url))
        urlParts = urllib.parse.urlsplit(url)
        if not urlParts[0]:
            url = Preferences.getWebBrowser("DefaultScheme") + url
            urlParts = urllib.parse.urlsplit(url)
        protocol = urlParts.scheme
        host = fullUnescape(urlParts.hostname)
        path = fullUnescape(urlParts.path)
        query = urlParts.query
        if not query and '?' not in url:
            query = None
        if not path:
            path = '/'
        path = posixpath.normpath(path).replace('//', '/')
        if path[-1] != '/':
            path += '/'
        port = urlParts.port
        host = host.strip('.')
        host = re.sub(r'\.+', '.', host).lower()
        if host.isdigit():
            # host given as a single decimal integer
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host)))
            except Exception:
                pass
        if host.startswith('0x') and '.' not in host:
            # host given as a single hexadecimal integer
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host, 16)))
            except Exception:
                pass
        quotedPath = quote(path)
        quotedHost = quote(host)
        if port is not None:
            quotedHost = '{0}:{1}'.format(quotedHost, port)
        canonicalUrl = '{0}://{1}{2}'.format(protocol, quotedHost, quotedPath)
        if query is not None:
            canonicalUrl = '{0}?{1}'.format(canonicalUrl, query)
        return canonicalUrl

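The two less obvious steps of the canonicalization, repeated unescaping and the conversion of an integer host to a dotted quad, can be tried in isolation. The following is only a minimal sketch under stated assumptions: `full_unescape` and `int_host_to_dotted` are hypothetical stand-alone helpers written for illustration, they are not part of the eric sources, and the sketch does not use the `Preferences` module.

```python
import socket
import struct
import urllib.parse


def full_unescape(text):
    """Unquote repeatedly until the string no longer changes."""
    while True:
        unquoted = urllib.parse.unquote(text)
        if unquoted == text:
            return unquoted
        text = unquoted


def int_host_to_dotted(host):
    """Convert a decimal or 0x-prefixed integer host to a dotted quad."""
    base = 16 if host.lower().startswith("0x") else 10
    # "!I" packs the value as an unsigned 32-bit integer in network order
    return socket.inet_ntoa(struct.pack("!I", int(host, base)))


print(full_unescape("http://host/%2525"))   # http://host/%
print(int_host_to_dotted("3279880203"))     # 195.127.0.11
print(int_host_to_dotted("0xC0A80001"))     # 192.168.0.1
```

The loop form avoids recursion but is otherwise meant to behave like the nested `fullUnescape()` function above.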
    @staticmethod
    def permutations(url):
        """
        Static method to determine all permutations of host name and path
        which can be applied to blacklisted URLs.

        @param url URL string to be permuted
        @type str
        @return generator of permuted URL strings
        @rtype generator of str
        """
        def hostPermutations(host):
            """
            Method to generate the permutations of the host name.

            @param host host name
            @type str
            @return generator of permuted host names
            @rtype generator of str
            """
            if re.match(r'\d+\.\d+\.\d+\.\d+', host):
                # IP addresses are not split into suffixes
                yield host
                return
            parts = host.split('.')
            partsLen = min(len(parts), 5)
            if partsLen > 4:
                yield host
            for i in range(partsLen - 1):
                yield '.'.join(parts[i - partsLen:])

        def pathPermutations(path):
            """
            Method to generate the permutations of the path.

            @param path path to be processed
            @type str
            @return generator of permuted paths
            @rtype generator of str
            """
            yield path
            query = None
            if '?' in path:
                path, query = path.split('?', 1)
            if query is not None:
                yield path
            pathParts = path.split('/')[0:-1]
            curPath = ''
            for i in range(min(4, len(pathParts))):
                curPath = curPath + pathParts[i] + '/'
                yield curPath

        # NOTE: the split*() helpers of urllib.parse are undocumented and
        # deprecated since Python 3.8.
        protocol, addressStr = urllib.parse.splittype(url)
        host, path = urllib.parse.splithost(addressStr)
        user, host = urllib.parse.splituser(host)
        host, port = urllib.parse.splitport(host)
        host = host.strip('/')
        seenPermutations = set()
        for h in hostPermutations(host):
            for p in pathPermutations(path):
                u = '{0}{1}'.format(h, p)
                if u not in seenPermutations:
                    yield u
                    seenPermutations.add(u)

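To see which host suffix and path prefix combinations such a generator produces, the logic can be reproduced outside the class. This is a hedged sketch, not the module's code: the function names are hypothetical, and it uses `urllib.parse.urlsplit()` in place of the deprecated `split*()` helpers, so it assumes an already canonical URL without user information or port.

```python
import re
import urllib.parse


def host_permutations(host):
    """Yield the host and its suffixes (IP addresses are yielded as is)."""
    if re.match(r"\d+\.\d+\.\d+\.\d+", host):
        yield host
        return
    parts = host.split(".")
    parts_len = min(len(parts), 5)
    if parts_len > 4:
        yield host
    for i in range(parts_len - 1):
        yield ".".join(parts[i - parts_len:])


def path_permutations(path):
    """Yield the full path, the path without query and up to 4 prefixes."""
    yield path
    if "?" in path:
        path = path.split("?", 1)[0]
        yield path
    cur_path = ""
    for part in path.split("/")[:-1][:4]:
        cur_path += part + "/"
        yield cur_path


def permutations(url):
    """Yield the unique host/path combinations of a URL."""
    parts = urllib.parse.urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    seen = set()
    for h in host_permutations(parts.hostname):
        for p in path_permutations(path):
            if h + p not in seen:
                seen.add(h + p)
                yield h + p


for expression in permutations("http://a.b.c/1/2.html?param=1"):
    print(expression)
```

For the sample URL this prints eight expressions, starting with `a.b.c/1/2.html?param=1` and ending with `b.c/1/`.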
    @staticmethod
    def digest(url):
        """
        Static method to calculate the SHA256 digest of a URL string.

        @param url URL string
        @type str
        @return SHA256 digest of the URL string
        @rtype bytes
        """
        return hashlib.sha256(url.encode('utf-8')).digest()
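The digest step is a plain SHA256 over the UTF-8 encoded expression. A minimal stand-alone sketch follows; the function name is hypothetical, and the 4-byte prefix at the end is only an illustrative assumption about how a caller might shorten the hash for a lookup, since this module itself returns the full digest.

```python
import hashlib


def digest(expression):
    """Return the raw 32-byte SHA256 digest of a host/path expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


full_hash = digest("a.b.c/1/")
prefix = full_hash[:4]   # assumption: a short prefix for a local lookup

print(len(full_hash))    # 32
print(len(prefix))       # 4
print(full_hash.hex())
```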
