--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ThirdParty/CharDet/docs/supported-encodings.html Mon Dec 28 16:03:33 2009 +0000 @@ -0,0 +1,86 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> +<html lang="en"> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<title>Supported encodings [Universal Encoding Detector]</title> +<link rel="stylesheet" href="css/chardet.css" type="text/css"> +<link rev="made" href="mailto:mark@diveintomark.org"> +<meta name="generator" content="DocBook XSL Stylesheets V1.65.1"> +<meta name="keywords" content="character, set, encoding, detection, Python, XML, feed"> +<link rel="start" href="index.html" title="Documentation"> +<link rel="up" href="index.html" title="Documentation"> +<link rel="prev" href="faq.html" title="Frequently asked questions"> +<link rel="next" href="usage.html" title="Usage"> +</head> +<body id="chardet-feedparser-org" class="docs"> +<div class="z" id="intro"><div class="sectionInner"><div class="sectionInner2"> +<div class="s" id="pageHeader"> +<h1><a href="/">Universal Encoding Detector</a></h1> +<p>Character encoding auto-detection in Python. As smart as your browser. Open source.</p> +</div> +<div class="s" id="quickSummary"><ul> +<li class="li1"> +<a href="http://chardet.feedparser.org/download/">Download</a> ·</li> +<li class="li2"> +<a href="index.html">Documentation</a> ·</li> +<li class="li3"><a href="faq.html" title="Frequently Asked Questions">FAQ</a></li> +</ul></div> +</div></div></div> +<div id="main"><div id="mainInner"> +<p id="breadcrumb">You are here: <a href="index.html">Documentation</a> → <span class="thispage">Supported encodings</span></p> +<div class="section" lang="en"> +<div class="titlepage"> +<div> +<div><h2 class="title"> +<a name="encodings" class="skip" href="#encodings" title="link to this section"><img src="images/permalink.gif" alt="[link]" title="link to this section" width="8" height="9"></a> Supported encodings</h2></div> +<div><div class="abstract"> +<h3 class="title"></h3> +<p><span class="application">Universal Encoding Detector</span> currently supports over two dozen character encodings.</p> +</div></div> +</div> +<div></div> +</div> +<div class="itemizedlist"><ul> +<li> +<tt class="literal">Big5</tt>, <tt class="literal">GB2312</tt>/<tt class="literal">GB18030</tt>, <tt class="literal">EUC-TW</tt>, <tt class="literal">HZ-GB-2312</tt>, and <tt class="literal">ISO-2022-CN</tt> (Traditional and Simplified Chinese)</li> +<li> +<tt class="literal">EUC-JP</tt>, <tt class="literal">SHIFT_JIS</tt>, and <tt class="literal">ISO-2022-JP</tt> (Japanese)</li> +<li> +<tt class="literal">EUC-KR</tt> and <tt class="literal">ISO-2022-KR</tt> (Korean)</li> +<li> +<tt class="literal">KOI8-R</tt>, <tt class="literal">MacCyrillic</tt>, <tt class="literal">IBM855</tt>, <tt class="literal">IBM866</tt>, <tt class="literal">ISO-8859-5</tt>, and <tt class="literal">windows-1251</tt> (Russian)</li> +<li> +<tt class="literal">ISO-8859-2</tt> and <tt class="literal">windows-1250</tt> (Hungarian)</li> +<li> +<tt class="literal">ISO-8859-5</tt> and <tt class="literal">windows-1251</tt> (Bulgarian)</li> +<li><tt class="literal">windows-1252</tt></li> +<li> +<tt class="literal">ISO-8859-7</tt> and <tt class="literal">windows-1253</tt> (Greek)</li> +<li> +<tt class="literal">ISO-8859-8</tt> and <tt class="literal">windows-1255</tt> (Visual and Logical Hebrew)</li> +<li> +<tt class="literal">TIS-620</tt> (Thai)</li> +<li> +<tt class="literal">UTF-32</tt> <acronym title="Big Endian">BE</acronym>, <acronym title="Little Endian">LE</acronym>, 3412-ordered, or 2143-ordered (with a <acronym title="Byte Order Mark">BOM</acronym>)</li> +<li> +<tt class="literal">UTF-16</tt> <acronym title="Big Endian">BE</acronym> or <acronym title="Little Endian">LE</acronym> (with a <acronym title="Byte Order Mark">BOM</acronym>)</li> +<li> +<tt class="literal">UTF-8</tt> (with or without a <acronym title="Byte Order Mark">BOM</acronym>)</li> +<li><acronym>ASCII</acronym></li> +</ul></div> +<a name="id667094"></a><table class="caution" border="0" summary=""> +<tr><td rowspan="2" align="center" valign="top" width="1%"><img src="images/caution.png" alt="Caution" title="" width="24" height="24"></td></tr> +<tr><td colspan="2" align="left" valign="top" width="99%">Due to inherent similarities between certain encodings, some encodings may be detected incorrectly. In my tests, the most problematic case was Hungarian text encoded as <tt class="literal">ISO-8859-2</tt> or <tt class="literal">windows-1250</tt> (encoded as one but reported as the other). Also, Greek text encoded as <tt class="literal">ISO-8859-7</tt> was often mis-reported as <tt class="literal">ISO-8859-2</tt>. Your mileage may vary.</td></tr> +</table> +</div> +<div class="footernavigation"> +<div style="float: left">← <a class="NavigationArrow" href="faq.html">Frequently asked questions</a> +</div> +<div style="text-align: right"> +<a class="NavigationArrow" href="usage.html">Usage</a> →</div> +</div> +<hr> +<div id="footer"><p class="copyright">Copyright © 2006, 2007, 2008 Mark Pilgrim · <a href="mailto:mark@diveintomark.org">mark@diveintomark.org</a> · <a href="license.html">Terms of use</a></p></div> +</div></div> +</body> +</html>