Fri, 04 Aug 2017 18:38:45 +0200
Finished coding the safe browsing module of the new web browser.
Changesets contributing to this file (revision, node, description, parent):
5808 7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
5809 5b53c17b7d93 Done implementing the SafeBrowsingUrl class. (parent 5808)
5811 5358a3c7995f Done implementing the SafeBrowsingAPIClient class. (parent 5809)
5817 a5f6c9128500 Started implementing the SafeBrowsingCache class. (parent 5811)
5829 d3448873ced3 Finished coding the safe browsing module of the new web browser. (parent 5817)
Author of all changesets: Detlev Offenbach <detlev@die-offenbachs.de>

# -*- coding: utf-8 -*-

# Copyright (c) 2017 Detlev Offenbach <detlev@die-offenbachs.de>
#

"""
Module implementing a URL representation suitable for Google Safe Browsing.
"""

from __future__ import unicode_literals

try:
    import urlparse     # Py2
    import urllib       # Py2
except ImportError:
    import urllib.parse as urllib
    from urllib import parse as urlparse

import re
import posixpath
import socket
import struct
import hashlib

import Preferences


class SafeBrowsingUrl(object):
    """
    Class implementing a URL representation suitable for Google Safe Browsing.
    """
    #
    # Modeled after the URL class of the gglsbl package.
    # https://github.com/afilipovich/gglsbl
    #
    def __init__(self, url):
        """
        Constructor

        @param url URL to be embedded
        @type str
        """
        self.__url = url

    def hashes(self):
        """
        Public method to get the hashes of all possible permutations of the URL
        in canonical form.

        @return generator for the URL hashes
        @rtype generator of bytes
        """
        for variant in self.permutations(self.canonical()):
            urlHash = self.digest(variant)
            yield urlHash

    def canonical(self):
        """
        Public method to convert the URL to the canonical form.

        @return canonical form of the URL
        @rtype str
        """
        def fullUnescape(u):
            """
            Method to recursively unescape a URL.

            @param u URL string to unescape
            @type str
            @return unescaped URL string
            @rtype str
            """
            uu = urllib.unquote(u)
            if uu == u:
                return uu
            else:
                return fullUnescape(uu)

        def quote(s):
            """
            Method to quote a string.

            @param s string to be quoted
            @type str
            @return quoted string
            @rtype str
            """
            safeChars = '!"$&\'()*+,-./:;<=>?@[\\]^_`{|}~'
            return urllib.quote(s, safe=safeChars)

        # remove surrounding whitespace, embedded control characters
        # and the fragment
        url = self.__url.strip()
        url = url.replace('\n', '').replace('\r', '').replace('\t', '')
        url = url.split('#', 1)[0]
        if url.startswith('//'):
            # scheme-relative URL: the "DefaultScheme" setting is expected
            # to end with '://', so strip only the '//' and keep the colon
            url = Preferences.getWebBrowser("DefaultScheme")[:-2] + url
        if len(url.split('://')) <= 1:
            url = Preferences.getWebBrowser("DefaultScheme") + url
        url = quote(fullUnescape(url))
        urlParts = urlparse.urlsplit(url)
        if not urlParts[0]:
            url = Preferences.getWebBrowser("DefaultScheme") + url
            urlParts = urlparse.urlsplit(url)
        protocol = urlParts.scheme
        host = fullUnescape(urlParts.hostname)
        path = fullUnescape(urlParts.path)
        query = urlParts.query
        if not query and '?' not in url:
            query = None
        if not path:
            path = '/'
        path = posixpath.normpath(path).replace('//', '/')
        if path[-1] != '/':
            path += '/'
        port = urlParts.port
        # normalize the host name: no leading/trailing or repeated dots,
        # lower case
        host = host.strip('.')
        host = re.sub(r'\.+', '.', host).lower()
        # convert a decimal or hexadecimal integer host to dotted quad
        if host.isdigit():
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host)))
            except Exception:
                pass
        if host.startswith('0x') and '.' not in host:
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host, 16)))
            except Exception:
                pass
        quotedPath = quote(path)
        quotedHost = quote(host)
        if port is not None:
            quotedHost = '{0}:{1}'.format(quotedHost, port)
        canonicalUrl = '{0}://{1}{2}'.format(protocol, quotedHost, quotedPath)
        if query is not None:
            canonicalUrl = '{0}?{1}'.format(canonicalUrl, query)
        return canonicalUrl

    @staticmethod
    def permutations(url):
        """
        Static method to determine all permutations of host name and path
        which can be applied to blacklisted URLs.

        @param url URL string to be permuted
        @type str
        @return generator of permuted URL strings
        @rtype generator of str
        """
        def hostPermutations(host):
            """
            Method to generate the permutations of the host name.

            @param host host name
            @type str
            @return generator of permuted host names
            @rtype generator of str
            """
            if re.match(r'\d+\.\d+\.\d+\.\d+', host):
                yield host
                return
            parts = host.split('.')
            partsLen = min(len(parts), 5)
            if partsLen > 4:
                yield host
            for i in range(partsLen - 1):
                yield '.'.join(parts[i - partsLen:])

        def pathPermutations(path):
            """
            Method to generate the permutations of the path.

            @param path path to be processed
            @type str
            @return generator of permuted paths
            @rtype generator of str
            """
            yield path
            query = None
            if '?' in path:
                path, query = path.split('?', 1)
            if query is not None:
                yield path
            pathParts = path.split('/')[0:-1]
            curPath = ''
            for i in range(min(4, len(pathParts))):
                curPath = curPath + pathParts[i] + '/'
                yield curPath

        protocol, addressStr = urllib.splittype(url)
        host, path = urllib.splithost(addressStr)
        user, host = urllib.splituser(host)
        host, port = urllib.splitport(host)
        host = host.strip('/')
        seenPermutations = set()
        for h in hostPermutations(host):
            for p in pathPermutations(path):
                u = '{0}{1}'.format(h, p)
                if u not in seenPermutations:
                    yield u
                    seenPermutations.add(u)

    @staticmethod
    def digest(url):
        """
        Static method to calculate the SHA256 digest of a URL string.

        @param url URL string
        @type str
        @return SHA256 digest of the URL string
        @rtype bytes
        """
        return hashlib.sha256(url.encode('utf-8')).digest()
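
The host and path expansion performed by permutations(), followed by the hashing done in digest(), can be tried without the Preferences module. The following is a minimal standalone sketch, not part of the module itself: the function names host_suffixes, path_prefixes and lookup_expressions are hypothetical, and the input is assumed to be already canonical and split into host and path.

```python
import hashlib
import re


def host_suffixes(host):
    """Yield the host name and its shortened suffixes (hypothetical helper)."""
    if re.match(r'\d+\.\d+\.\d+\.\d+', host):
        # IP addresses are used as they are and never shortened.
        yield host
        return
    parts = host.split('.')
    count = min(len(parts), 5)
    if count > 4:
        # Long host names: the exact host plus suffixes of the last five labels.
        yield host
    for i in range(count - 1):
        yield '.'.join(parts[i - count:])


def path_prefixes(path):
    """Yield the path with and without query, then leading directories."""
    yield path
    query = None
    if '?' in path:
        path, query = path.split('?', 1)
    if query is not None:
        yield path
    segments = path.split('/')[0:-1]
    current = ''
    for i in range(min(4, len(segments))):
        current = current + segments[i] + '/'
        yield current


def lookup_expressions(host, path):
    """Combine host and path variants, skipping duplicates."""
    seen = set()
    for h in host_suffixes(host):
        for p in path_prefixes(path):
            expression = h + p
            if expression not in seen:
                seen.add(expression)
                yield expression


expressions = list(lookup_expressions('a.b.c', '/1/2.html?param=1'))
# One SHA-256 digest (32 bytes) per expression, as in digest() above.
digests = [hashlib.sha256(e.encode('utf-8')).digest() for e in expressions]
for expression, value in zip(expressions, digests):
    print(expression, value.hex()[:8])
```

For the host a.b.c and the path /1/2.html?param=1 this produces eight expressions: four path variants for each of the two host variants a.b.c and b.c.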