WebBrowser/SafeBrowsing/SafeBrowsingUrl.py

Mon, 24 Jul 2017 18:40:07 +0200

author
Detlev Offenbach <detlev@die-offenbachs.de>
date
Mon, 24 Jul 2017 18:40:07 +0200
branch
safe_browsing
changeset 5817
a5f6c9128500
parent 5811
5358a3c7995f
child 5829
d3448873ced3
permissions
-rw-r--r--

Started implementing the SafeBrowsingCache class.

annotated changesets
5808 7bf90dcae4e1 Started implementing the SafeBrowsingUrl class.
5809 5b53c17b7d93 Done implementing the SafeBrowsingUrl class.
5811 5358a3c7995f Done implementing the SafeBrowsingAPIClient class.
5817 a5f6c9128500 Started implementing the SafeBrowsingCache class.

# -*- coding: utf-8 -*-

# Copyright (c) 2017 Detlev Offenbach <detlev@die-offenbachs.de>
#

"""
Module implementing a URL representation suitable for Google Safe Browsing.
"""

from __future__ import unicode_literals

try:
    import urlparse     # Py2
    import urllib       # Py2
except ImportError:
    import urllib.parse as urllib
    from urllib import parse as urlparse

import re
import posixpath
import socket
import struct
import hashlib

import Preferences


class SafeBrowsingUrl(object):
    """
    Class implementing a URL representation suitable for Google Safe Browsing.
    """
    #
    # Modeled after the URL class of the gglsbl package.
    # https://github.com/afilipovich/gglsbl
    #
    def __init__(self, url):
        """
        Constructor

        @param url URL to be embedded
        @type str
        """
        self.__url = url

    def hashes(self):
        """
        Public method to get the hashes of all possible permutations of the URL
        in canonical form.

        @return generator for the URL hashes
        @rtype generator of bytes
        """
        for variant in self.permutations(self.canonical()):
            urlHash = self.digest(variant)
            yield urlHash

    def canonical(self):
        """
        Public method to convert the URL to the canonical form.

        @return canonical form of the URL
        @rtype str
        """
        def fullUnescape(u):
            """
            Method to recursively unescape a URL.

            @param u URL string to unescape
            @type str
            @return unescaped URL string
            @rtype str
            """
            uu = urllib.unquote(u)
            if uu == u:
                return uu
            else:
                return fullUnescape(uu)

        def quote(s):
            """
            Method to quote a string.

            @param s string to be quoted
            @type str
            @return quoted string
            @rtype str
            """
            safeChars = '!"$&\'()*+,-./:;<=>?@[\\]^_`{|}~'
            return urllib.quote(s, safe=safeChars)

        # strip surrounding whitespace, embedded control characters and
        # the fragment
        url = self.__url.strip()
        url = url.replace('\n', '').replace('\r', '').replace('\t', '')
        url = url.split('#', 1)[0]
        if url.startswith('//'):
            # scheme-relative URL: prepend the scheme including its colon
            # (the DefaultScheme setting is assumed to end with '://')
            url = Preferences.getWebBrowser("DefaultScheme")[:-2] + url
        if len(url.split('://')) <= 1:
            url = Preferences.getWebBrowser("DefaultScheme") + url
        # unescape repeatedly, then percent-escape exactly once
        url = quote(fullUnescape(url))
        urlParts = urlparse.urlsplit(url)
        if not urlParts[0]:
            url = Preferences.getWebBrowser("DefaultScheme") + url
            urlParts = urlparse.urlsplit(url)
        protocol = urlParts.scheme
        host = fullUnescape(urlParts.hostname)
        path = fullUnescape(urlParts.path)
        query = urlParts.query
        if not query and '?' not in url:
            query = None
        if not path:
            path = '/'
        # resolve '.' and '..' and duplicate slashes but keep a trailing slash
        hasTrailingSlash = (path[-1] == '/')
        path = posixpath.normpath(path).replace('//', '/')
        if hasTrailingSlash and path[-1] != '/':
            path += '/'
        port = urlParts.port
        # normalize the host: no outer or repeated dots, lower case
        host = host.strip('.')
        host = re.sub(r'\.+', '.', host).lower()
        # convert a decimal or hexadecimal integer host to dotted-quad form
        if host.isdigit():
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host)))
            except Exception:
                pass
        if host.startswith('0x') and '.' not in host:
            try:
                host = socket.inet_ntoa(struct.pack("!I", int(host, 16)))
            except Exception:
                pass
        quotedPath = quote(path)
        quotedHost = quote(host)
        if port is not None:
            quotedHost = '{0}:{1}'.format(quotedHost, port)
        canonicalUrl = '{0}://{1}{2}'.format(protocol, quotedHost, quotedPath)
        if query is not None:
            canonicalUrl = '{0}?{1}'.format(canonicalUrl, query)
        return canonicalUrl

    @staticmethod
    def permutations(url):
        """
        Static method to determine all permutations of host name and path
        which can be applied to blacklisted URLs.

        @param url URL string to be permuted
        @type str
        @return generator of permuted URL strings
        @rtype generator of str
        """
        def hostPermutations(host):
            """
            Method to generate the permutations of the host name.

            @param host host name
            @type str
            @return generator of permuted host names
            @rtype generator of str
            """
            # IP addresses are used as they are
            if re.match(r'\d+\.\d+\.\d+\.\d+', host):
                yield host
                return
            parts = host.split('.')
            partsLen = min(len(parts), 5)
            if partsLen > 4:
                yield host
            # suffixes built from at most the last five host components
            for i in range(partsLen - 1):
                yield '.'.join(parts[i - partsLen:])

        def pathPermutations(path):
            """
            Method to generate the permutations of the path.

            @param path path to be processed
            @type str
            @return generator of permuted paths
            @rtype generator of str
            """
            yield path
            query = None
            if '?' in path:
                path, query = path.split('?', 1)
            if query is not None:
                yield path
            # leading directories, starting at the root, at most four
            pathParts = path.split('/')[0:-1]
            curPath = ''
            for i in range(min(4, len(pathParts))):
                curPath = curPath + pathParts[i] + '/'
                yield curPath

        protocol, addressStr = urllib.splittype(url)
        host, path = urllib.splithost(addressStr)
        user, host = urllib.splituser(host)
        host, port = urllib.splitport(host)
        host = host.strip('/')
        seenPermutations = set()
        for h in hostPermutations(host):
            for p in pathPermutations(path):
                u = '{0}{1}'.format(h, p)
                if u not in seenPermutations:
                    yield u
                    seenPermutations.add(u)

    @staticmethod
    def digest(url):
        """
        Static method to calculate the SHA256 digest of a URL string.

        @param url URL string
        @type str
        @return SHA256 digest of the URL string
        @rtype bytes
        """
        return hashlib.sha256(url.encode('utf-8')).digest()
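
The host suffix and path prefix expansion performed by permutations() and the hashing done by digest() can be tried outside of eric with the minimal Python 3 sketch below. It is not part of the changeset: the names host_suffixes, path_prefixes and lookup_expressions are hypothetical, the input is assumed to be in canonical form already, and urlsplit is used in place of the urllib.split* helpers of the listing.

```python
import hashlib
import re
from urllib.parse import urlsplit


def host_suffixes(host):
    """Yield the host name variants to check (hypothetical helper)."""
    # Dotted-quad IP addresses are used unchanged.
    if re.match(r'\d+\.\d+\.\d+\.\d+', host):
        yield host
        return
    parts = host.split('.')
    count = min(len(parts), 5)
    if count > 4:
        yield host
    # Suffixes built from at most the last five components.
    for i in range(count - 1):
        yield '.'.join(parts[i - count:])


def path_prefixes(path_and_query):
    """Yield the path with and without query, then leading directories."""
    yield path_and_query
    path = path_and_query.split('?', 1)[0]
    if path != path_and_query:
        yield path
    current = ''
    for segment in path.split('/')[:-1][:4]:
        current += segment + '/'
        yield current


def lookup_expressions(canonical_url):
    """Yield each distinct host/path combination exactly once."""
    parts = urlsplit(canonical_url)
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    seen = set()
    for host in host_suffixes(parts.hostname):
        for prefix in path_prefixes(path):
            expression = host + prefix
            if expression not in seen:
                seen.add(expression)
                yield expression


expressions = list(lookup_expressions('http://a.b.c/1/2.html?param=1'))
# One 32 byte SHA-256 digest per expression, as digest() returns.
digests = [hashlib.sha256(e.encode('utf-8')).digest() for e in expressions]
```

For the sample URL this produces eight expressions, from a.b.c/1/2.html?param=1 down to b.c/1/, and therefore eight digests.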
