--- a/ThirdParty/CharDet/docs/supported-encodings.html Thu Nov 10 18:54:02 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,86 +0,0 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
-<html lang="en">
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-<title>Supported encodings [Universal Encoding Detector]</title>
-<link rel="stylesheet" href="css/chardet.css" type="text/css">
-<link rev="made" href="mailto:mark@diveintomark.org">
-<meta name="generator" content="DocBook XSL Stylesheets V1.65.1">
-<meta name="keywords" content="character, set, encoding, detection, Python, XML, feed">
-<link rel="start" href="index.html" title="Documentation">
-<link rel="up" href="index.html" title="Documentation">
-<link rel="prev" href="faq.html" title="Frequently asked questions">
-<link rel="next" href="usage.html" title="Usage">
-</head>
-<body id="chardet-feedparser-org" class="docs">
-<div class="z" id="intro"><div class="sectionInner"><div class="sectionInner2">
-<div class="s" id="pageHeader">
-<h1><a href="/">Universal Encoding Detector</a></h1>
-<p>Character encoding auto-detection in Python. As smart as your browser. Open source.</p>
-</div>
-<div class="s" id="quickSummary"><ul>
-<li class="li1">
-<a href="http://chardet.feedparser.org/download/">Download</a> ·</li>
-<li class="li2">
-<a href="index.html">Documentation</a> ·</li>
-<li class="li3"><a href="faq.html" title="Frequently Asked Questions">FAQ</a></li>
-</ul></div>
-</div></div></div>
-<div id="main"><div id="mainInner">
-<p id="breadcrumb">You are here: <a href="index.html">Documentation</a> → <span class="thispage">Supported encodings</span></p>
-<div class="section" lang="en">
-<div class="titlepage">
-<div>
-<div><h2 class="title">
-<a name="encodings" class="skip" href="#encodings" title="link to this section"><img src="images/permalink.gif" alt="[link]" title="link to this section" width="8" height="9"></a> Supported encodings</h2></div>
-<div><div class="abstract">
-<h3 class="title"></h3>
-<p><span class="application">Universal Encoding Detector</span> currently supports over two dozen character encodings.</p>
-</div></div>
-</div>
-<div></div>
-</div>
-<div class="itemizedlist"><ul>
-<li>
-<tt class="literal">Big5</tt>, <tt class="literal">GB2312</tt>/<tt class="literal">GB18030</tt>, <tt class="literal">EUC-TW</tt>, <tt class="literal">HZ-GB-2312</tt>, and <tt class="literal">ISO-2022-CN</tt> (Traditional and Simplified Chinese)</li>
-<li>
-<tt class="literal">EUC-JP</tt>, <tt class="literal">SHIFT_JIS</tt>, and <tt class="literal">ISO-2022-JP</tt> (Japanese)</li>
-<li>
-<tt class="literal">EUC-KR</tt> and <tt class="literal">ISO-2022-KR</tt> (Korean)</li>
-<li>
-<tt class="literal">KOI8-R</tt>, <tt class="literal">MacCyrillic</tt>, <tt class="literal">IBM855</tt>, <tt class="literal">IBM866</tt>, <tt class="literal">ISO-8859-5</tt>, and <tt class="literal">windows-1251</tt> (Russian)</li>
-<li>
-<tt class="literal">ISO-8859-2</tt> and <tt class="literal">windows-1250</tt> (Hungarian)</li>
-<li>
-<tt class="literal">ISO-8859-5</tt> and <tt class="literal">windows-1251</tt> (Bulgarian)</li>
-<li><tt class="literal">windows-1252</tt></li>
-<li>
-<tt class="literal">ISO-8859-7</tt> and <tt class="literal">windows-1253</tt> (Greek)</li>
-<li>
-<tt class="literal">ISO-8859-8</tt> and <tt class="literal">windows-1255</tt> (Visual and Logical Hebrew)</li>
-<li>
-<tt class="literal">TIS-620</tt> (Thai)</li>
-<li>
-<tt class="literal">UTF-32</tt> <acronym title="Big Endian">BE</acronym>, <acronym title="Little Endian">LE</acronym>, 3412-ordered, or 2143-ordered (with a <acronym title="Byte Order Mark">BOM</acronym>)</li>
-<li>
-<tt class="literal">UTF-16</tt> <acronym title="Big Endian">BE</acronym> or <acronym title="Little Endian">LE</acronym> (with a <acronym title="Byte Order Mark">BOM</acronym>)</li>
-<li>
-<tt class="literal">UTF-8</tt> (with or without a <acronym title="Byte Order Mark">BOM</acronym>)</li>
-<li><acronym>ASCII</acronym></li>
-</ul></div>
-<a name="id667094"></a><table class="caution" border="0" summary="">
-<tr><td rowspan="2" align="center" valign="top" width="1%"><img src="images/caution.png" alt="Caution" title="" width="24" height="24"></td></tr>
-<tr><td colspan="2" align="left" valign="top" width="99%">Due to inherent similarities between certain encodings, some encodings may be detected incorrectly. In my tests, the most problematic case was Hungarian text encoded as <tt class="literal">ISO-8859-2</tt> or <tt class="literal">windows-1250</tt> (encoded as one but reported as the other). Also, Greek text encoded as <tt class="literal">ISO-8859-7</tt> was often mis-reported as <tt class="literal">ISO-8859-2</tt>. Your mileage may vary.</td></tr>
-</table>
-</div>
-<div class="footernavigation">
-<div style="float: left">← <a class="NavigationArrow" href="faq.html">Frequently asked questions</a>
-</div>
-<div style="text-align: right">
-<a class="NavigationArrow" href="usage.html">Usage</a> →</div>
-</div>
-<hr>
-<div id="footer"><p class="copyright">Copyright © 2006, 2007, 2008 Mark Pilgrim · <a href="mailto:mark@diveintomark.org">mark@diveintomark.org</a> · <a href="license.html">Terms of use</a></p></div>
-</div></div>
-</body>
-</html>