Resource Description
A language detector, as the name suggests, is a program that detects the language of a given piece of text. The system holds a specific pattern for each language and identifies the language of the input by finding the closest matching pattern. In data analysis operations, we may need to restrict the data entering the system to a limited set of languages, which is where a language detector comes in handy.
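The pattern-matching idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any existing detector: it assumes that a "pattern" is a frequency profile of character n-grams and that "closest" means highest cosine similarity. The sample sentences and the names `ngram_profile`, `cosine_similarity`, `build_profiles`, and `detect` are hypothetical and chosen for this example only.

```python
from collections import Counter

# Tiny, hypothetical training samples. A real detector would need far more
# text per language to produce reliable profiles.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y el gato se sienta",
    "german": "der schnelle braune fuchs springt ueber den faulen hund und die katze sitzt",
}


def ngram_profile(text, n=3):
    """Return the relative frequency of each character n-gram in text."""
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = sum(value * value for value in p.values()) ** 0.5
    norm_q = sum(value * value for value in q.values()) ** 0.5
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def build_profiles(samples, n=3):
    """Build one n-gram profile (the 'pattern') per language."""
    return {lang: ngram_profile(text, n) for lang, text in samples.items()}


def detect(text, profiles, n=3):
    """Return the language whose profile is closest to the input's profile."""
    profile = ngram_profile(text, n)
    return max(profiles, key=lambda lang: cosine_similarity(profile, profiles[lang]))


profiles = build_profiles(SAMPLES)
print(detect("the dog sat on the mat", profiles))
print(detect("el perro y el gato", profiles))
```

The `n` parameter controls the n-gram size (2 gives bi-grams, 3 gives tri-grams); the same value must be used when building the profiles and when detecting, otherwise the two profiles share no keys.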
The existing language detector available for Python is "oice.langdet", which lacks several features that a standard language detector is expected to have. A few of its limitations are:
(i) It supports few languages (currently only 3 languages are supported).
(ii) It performs a bi-gram analysis on the input data, which can lead to wrong predictions in some cases (lower accuracy).
(iii) It is available only for Python and usable only by Python programs. Shouldn't it be usable