Resource description
Ude is a C# port of the Mozilla Universal Charset Detector.
The article "A composite approach to language/encoding detection" describes the charset detection algorithms that the library implements.
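One of the methods that article describes, the coding-scheme method, models each multi-byte encoding as a state machine over the input bytes: an illegal byte sequence rules that encoding out. The sketch below illustrates the idea for UTF-8 only. It is written in Python, is not taken from Ude's source, and uses a hypothetical function name (`utf8_scan`). It also skips checks a full validator performs, such as rejecting overlong 3/4-byte forms and surrogates.

```python
from typing import Tuple


def utf8_scan(data: bytes) -> Tuple[bool, int]:
    """Scan a complete buffer with a simplified UTF-8 state machine.

    Hypothetical helper, not part of Ude. Returns (ok, multibyte_chars):
    ok is False as soon as an illegal sequence is seen (the "error" state
    that rules UTF-8 out), or if the buffer ends mid-character.
    """
    pending = 0    # continuation bytes still expected for the current character
    multibyte = 0  # completed multi-byte characters seen so far
    for b in data:
        if pending:
            if b & 0xC0 != 0x80:   # must be a 10xxxxxx continuation byte
                return False, multibyte
            pending -= 1
            if pending == 0:
                multibyte += 1
        elif b < 0x80:             # plain ASCII byte
            continue
        elif 0xC2 <= b <= 0xDF:    # lead byte of a 2-byte character
            pending = 1
        elif 0xE0 <= b <= 0xEF:    # lead byte of a 3-byte character
            pending = 2
        elif 0xF0 <= b <= 0xF4:    # lead byte of a 4-byte character
            pending = 3
        else:                      # 0x80-0xC1 and 0xF5-0xFF cannot start a character
            return False, multibyte
    return pending == 0, multibyte
```

For example, `"héllo"` encoded as UTF-8 passes with one multi-byte character, while the same text encoded as windows-1252 fails at the byte `0xE9`, because the following `l` is not a continuation byte. A count of multi-byte characters is the kind of evidence a detector can turn into a confidence value; how Ude itself weighs such evidence is not shown here.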
Ude can recognize the following charsets:
UTF-8
UTF-16 (BE and LE)
UTF-32 (BE and LE)
windows-1252 (mostly equivalent to ISO-8859-1)
windows-1251 and ISO-8859-5 (Cyrillic)
windows-1253 and ISO-8859-7 (Greek)
windows-1255 (logical Hebrew; includes ISO-8859-8-I and most of x-mac-hebrew)
ISO-8859-8 (visual Hebrew)
Big-5
gb18030 (superset of gb2312)
HZ-GB-2312
Shift-JIS
EUC-KR, EUC-JP, EUC-TW
ISO-2022-JP, ISO-2022-KR, ISO-2022-CN
KOI8-R
x-mac-cyrillic
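For the Unicode entries at the top of the list, a byte-order mark (BOM) at the start of the data is the most direct clue. The following Python sketch checks for the standard UTF-8, UTF-16 and UTF-32 BOMs. It is an illustration, not Ude's implementation, and the function name `sniff_bom` is hypothetical; when no BOM is present, a detector has to rely on other evidence, such as the statistical methods described in the article.

```python
from typing import Optional

# Hypothetical table, not Ude's. Longer marks come first because the
# UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = (
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the charset named by a leading byte-order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Because of that ordering, a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE; this ambiguity is inherent to BOM sniffing rather than a bug in the table.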