Resource Overview
PRETO: A High-performance Text Mining Tool for Preprocessing Turkish Texts
Text documents are usually unstructured and written in natural language. Before conventional data mining techniques can be applied to them, a preprocessing step is indispensable. Here, we introduce PRETO, a cross-platform, scalable preprocessing tool developed specifically for Turkish texts. It offers a wide range of preprocessing options, including stemming, stopword filtering, statistical term filtering, and n-gram generation.
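To make two of these options concrete, the following is a minimal sketch of stopword filtering and word n-gram generation in plain Java. It is not PRETO's code or API: the class name PreprocessSketch, the methods tokenize and ngrams, and the five-word stopword list are all hypothetical and chosen only for illustration. Stemming and statistical term filtering are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; names and stopword list are assumptions, not PRETO's API.
class PreprocessSketch {
    // Tiny hypothetical stopword list; a real Turkish list would be far larger.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("ve", "bir", "bu", "da", "de"));

    // Turkish casing rules differ from English (dotted and dotless i),
    // so lowercasing is done with an explicit Turkish locale.
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    /** Lowercases the text, splits it on non-letter characters, and drops stopwords. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(TURKISH).split("[^\\p{L}]+")) {
            if (!raw.isEmpty() && !STOPWORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    /** Builds word n-grams by joining every run of n consecutive tokens. */
    public static List<String> ngrams(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Bu bir metin ve belge analiz sistemi");
        System.out.println(tokens);            // [metin, belge, analiz, sistemi]
        System.out.println(ngrams(tokens, 2)); // [metin belge, belge analiz, analiz sistemi]
    }
}
```

In this sketch the n-grams are built after stopword removal, so a bigram can span a position where a stopword used to be. Whether filtering runs before or after n-gram generation is a design choice, and the source does not say which order PRETO uses.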
The Java source code is available via Subversion from the Source page:
http://code.google.com/p/preto/source/checkout
PRETO was developed with the NetBeans IDE, so we recommend using NetBeans when working with the source code.
You can download the executable version from the downloads page: