Resource description
A Java desktop application (built on the J2SE 5 platform and the Swing API) that automatically classifies documents against a given training set. It was developed, and is packaged, as a NetBeans project.
It uses stemmers generated with Snowball (http://snowball.tartarus.org, released under the BSD license) for text pre-processing, TF-IDF or the Bhattacharyya distance to rank the training-set documents by similarity to the query document, and the k-NN algorithm to classify it.
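The ranking and voting steps described above can be illustrated with a minimal sketch. It is not the project's code: the class `KnnSketch`, the `Doc` holder and all method names are hypothetical, stemming is replaced by plain lower-casing and tokenising, and it is written for a current JDK rather than J2SE 5. It shows only the Bhattacharyya variant: term counts are normalised into distributions, the distance is -ln of the sum of sqrt(p_i * q_i), and the query takes the majority label among the k closest training documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnSketch {

    /** Hypothetical training example: a label plus raw term counts. */
    static final class Doc {
        final String label;
        final Map<String, Integer> counts;

        Doc(String label, Map<String, Integer> counts) {
            this.label = label;
            this.counts = counts;
        }
    }

    /** Splits text on non-letters and counts lower-cased tokens (no stemming here). */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Bhattacharyya distance between the normalised term distributions of a and b. */
    static double bhattacharyya(Map<String, Integer> a, Map<String, Integer> b) {
        double totalA = 0, totalB = 0;
        for (int c : a.values()) totalA += c;
        for (int c : b.values()) totalB += c;
        if (totalA == 0 || totalB == 0) {
            return Double.POSITIVE_INFINITY;
        }
        double coefficient = 0; // Bhattacharyya coefficient, in [0, 1]
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                coefficient += Math.sqrt((e.getValue() / totalA) * (other / totalB));
            }
        }
        // No shared terms means the distributions do not overlap at all.
        return coefficient == 0 ? Double.POSITIVE_INFINITY : -Math.log(coefficient);
    }

    /** Majority label among the k nearest training documents (ties go to the closer label). */
    static String classify(Map<String, Integer> query, List<Doc> training, int k) {
        List<Doc> ranked = new ArrayList<>(training);
        ranked.sort(Comparator.comparingDouble((Doc d) -> bhattacharyya(query, d.counts)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (Doc d : ranked.subList(0, Math.min(k, ranked.size()))) {
            int v = votes.merge(d.label, 1, Integer::sum);
            if (best == null || v > votes.get(best)) {
                best = d.label;
            }
        }
        return best;
    }
}
```

A TF-IDF ranking would replace `bhattacharyya` with a similarity over weighted term vectors and sort in descending order; the voting step would stay the same.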
For now it only supports classifying news from the ANSA website (http://www.ansa.it, the main Italian news agency), but its modular architecture allows it to be extended with plugins that scrape content from other websites or handle other document types (PDF, DOC, ODT, etc.).
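One way such a plugin mechanism could look is sketched below. This is an assumption for illustration only: the interface `TextSource`, the registry `ScraperPluginSketch` and both example plugins are hypothetical names, and the project's actual `Scraper` and `ANSAScraper` classes may expose a different contract. The idea shown is simply that each plugin declares which inputs it accepts and turns raw content into plain text for the classifier.

```java
import java.util.ArrayList;
import java.util.List;

class ScraperPluginSketch {

    /** Hypothetical plugin contract; the project's real Scraper class may differ. */
    interface TextSource {
        boolean accepts(String location);

        String extractText(String rawContent);
    }

    /** Example plugin: crude tag stripping for HTML pages (illustration only). */
    static final class HtmlSource implements TextSource {
        public boolean accepts(String location) {
            return location.endsWith(".html") || location.endsWith(".htm");
        }

        public String extractText(String rawContent) {
            return rawContent.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        }
    }

    /** Example plugin: plain-text files are passed through after trimming. */
    static final class PlainTextSource implements TextSource {
        public boolean accepts(String location) {
            return location.endsWith(".txt");
        }

        public String extractText(String rawContent) {
            return rawContent.trim();
        }
    }

    private final List<TextSource> plugins = new ArrayList<>();

    /** Adds a plugin; earlier registrations take precedence. */
    void register(TextSource plugin) {
        plugins.add(plugin);
    }

    /** Delegates to the first registered plugin that accepts the location. */
    String extract(String location, String rawContent) {
        for (TextSource plugin : plugins) {
            if (plugin.accepts(location)) {
                return plugin.extractText(rawContent);
            }
        }
        throw new IllegalArgumentException("No plugin accepts " + location);
    }
}
```

Supporting PDF, DOC or ODT input in this sketch would mean adding another `TextSource` implementation and registering it, without touching the classification code.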
File list
javadocs
documentclassifier
Scrapers
class-use
JLex
ansascraper
resources
inherit.gif
index-files
index-1.html
allclasses-frame.html
allclasses-noframe.html
constant-values.html
deprecated-list.html
help-doc.html
index.html
overview-frame.html
overview-summary.html
overview-tree.html
package-list
serialized-form.html
stylesheet.css
documentclassifier
DocumentClassifierApp.html
DocumentClassifierView.html
MapDefaultPreferences.html
package-frame.html
package-summary.html
package-tree.html
package-use.html
PreferencesDialog.html
DocumentClassifierAboutBox.html
Scrapers
ANSAScraper.html
package-frame.html
package-summary.html
package-tree.html
package-use.html
Scraper.html
class-use
ANSAScraper.html
Scraper.html
BhattacharryaDistanceComparator.html
TFIDFComparator.html
Main.html
DocumentClassifierAboutBox.html
DocumentClassifierApp.html
DocumentClassifierView.html
MapDefaultPreferences.html
PreferencesDialog.html
parser.html
sym.html
Yylex.html