Thu, 01 Jan 2015 13:27:03 +0100
Updated copyright for 2015.
3
e478a359e1fb
Added the HTML5 to JavaScript converter.
Detlev Offenbach <detlev@die-offenbachs.de>
# -*- coding: utf-8 -*-

# Copyright (c) 2014 - 2015 Detlev Offenbach <detlev@die-offenbachs.de>
#

"""
Module implementing the HTML5 to JavaScript converter.
"""

from __future__ import unicode_literals

import os
import re
import datetime
import getpass

from PyQt5.QtCore import QObject
from PyQt5.QtWidgets import QDialog

from .Html5ToJsConverterParameterDialog import \
    Html5ToJsConverterParameterDialog


class Html5ToJsConverter(QObject):
    """
    Class implementing the HTML5 to JavaScript converter.
    """
    JsTemplate8 = "{0}{1}{2}{3}{4}{5}{6}{7}"
    TagsToIgnore = ('head', 'meta', 'noscript', 'script', 'style', 'link',
                    'no-js', 'title', 'object', 'col', 'colgroup', 'option',
                    'param', 'audio', 'basefont', 'isindex', 'svg', 'area',
                    'embed', 'br')

    def __init__(self, html, parent=None):
        """
        Constructor

        @param html HTML text to be converted (string)
        @param parent reference to the parent object (QObject)
        """
        super(Html5ToJsConverter, self).__init__(parent)

        self.__html = html

    def getJavaScript(self):
        """
        Public method to get the converted JavaScript text.

        @return JavaScript text (string)
        """
        dlg = Html5ToJsConverterParameterDialog()
        if dlg.exec_() == QDialog.Accepted:
            indentation, scriptTags = dlg.getData()

            self.__createSoup()

            alreadyDone = list(self.TagsToIgnore)

            js = "<script>{0}".format(os.linesep) if scriptTags else ""
            js += "// {0} by {1}{2}".format(
                datetime.datetime.now().isoformat().split(".")[0],
                getpass.getuser(),
                os.linesep
            )
            js += "$(document).ready(function(){" + os.linesep

            # step 1: IDs
            js += "/*{0}*/{1}".format(
                "-" * 75,
                os.linesep
            )
            for id_ in self.__getIds():
                if id_ not in alreadyDone:
                    js += "{0}// {1}{2}".format(
                        indentation,
                        "#".join(id_).lower(),
                        os.linesep
                    )
                    js += self.JsTemplate8.format(
                        indentation,
                        "var ",
                        re.sub("[^a-z0-9]", "",
                               id_[1].lower() if len(id_[1]) < 11 else
                               re.sub("[aeiou]", "", id_[1].lower())),
                        " = ",
                        '$("#{0}").length'.format(id_[1]),
                        ";",
                        os.linesep,
                        os.linesep
                    )
                    alreadyDone.append(id_)

            # step 2: classes
            js += "/*{0}*/{1}".format(
                "-" * 75,
                os.linesep
            )
            for class_ in self.__getClasses():
                if class_ not in alreadyDone:
                    js += "{0}// {1}{2}".format(
                        indentation,
                        ".".join(class_).lower(),
                        os.linesep
                    )
                    js += self.JsTemplate8.format(
                        indentation,
                        "var ",
                        re.sub("[^a-z0-9]", "",
                               class_[1].lower() if len(class_[1]) < 11 else
                               re.sub("[aeiou]", "", class_[1].lower())),
                        " = ",
                        '$(".{0}").length'.format(class_[1]),
                        ";",
                        os.linesep,
                        os.linesep
                    )
                    alreadyDone.append(class_)

            js += "})"
            js += "{0}</script>".format(os.linesep) if scriptTags else ""
        else:
            js = ""
        return js.strip()

    def __createSoup(self):
        """
        Private method to get a BeautifulSoup object with our HTML text.
        """
        from bs4 import BeautifulSoup
        self.__soup = BeautifulSoup(BeautifulSoup(self.__html).prettify())

    def __getClasses(self):
        """
        Private method to extract all classes of the HTML text.

        @return list of tuples containing the tag name and its classes
            as a blank separated string (list of tuples of two strings)
        """
        classes = [(t.name, " ".join(t["class"])) for t in
                   self.__soup.find_all(True, {"class": True})]
        return sorted(set(classes))

    def __getIds(self):
        """
        Private method to extract all IDs of the HTML text.

        @return list of tuples containing the tag name and its ID
            (list of tuples of two strings)
        """
        ids = [(t.name, t["id"]) for t in
               self.__soup.find_all(True, {"id": True})]
        return sorted(set(ids))
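
The variable names emitted by getJavaScript() are derived from each ID or class string with two nested re.sub() calls. The following is a minimal standalone sketch of that derivation, not part of the module itself: the helper names js_var_name and js_count_statement and the default indentation are hypothetical and were introduced only for illustration.

```python
import re


def js_var_name(identifier):
    """Derive a JavaScript variable name as getJavaScript() appears to do.

    Hypothetical helper; the original code inlines this expression.
    """
    lowered = identifier.lower()
    # Identifiers of 11 or more characters are shortened by dropping vowels.
    if len(identifier) >= 11:
        lowered = re.sub("[aeiou]", "", lowered)
    # Everything that is not a lowercase letter or a digit is removed.
    return re.sub("[^a-z0-9]", "", lowered)


def js_count_statement(prefix, identifier, indentation="    "):
    """Build one 'var name = $(selector).length;' line.

    prefix is "#" for IDs and "." for classes (assumed usage).
    """
    return '{0}var {1} = $("{2}{3}").length;'.format(
        indentation, js_var_name(identifier), prefix, identifier)


if __name__ == "__main__":
    print(js_count_statement("#", "main-menu"))
    print(js_count_statement(".", "NavigationBar"))
```

Because the name is reduced to lowercase letters and digits only, two different identifiers can map to the same variable name, and an identifier that starts with a digit yields a name that is not a valid JavaScript identifier. Callers of the generated script may need to check for both cases.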