Resource Description
Computes the syntactic similarity of two texts. A Java program that compares two files and returns,
as a percentage, how similar they are.
For example: java -jar ss.jar c:/tmp/a.txt c:/tmp/b.txt
The output would be: Similarity is 89.60159%
Some texts are nearly identical to each other, for example almost-duplicated news articles. The only difference may be a different advertisement in the middle of the text, or a slightly modified headline.
This simple program tries to compute how similar two texts are, as a percentage.
Note:
This is syntactic similarity, not semantic similarity: only the structure of words and phrases is taken into account, not their meaning. This project is used as part of the http://www.opfine.com/ online financial news text analyser to simplify processing and reduce resource load.
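
The description above does not say which algorithm ss.jar uses. As an illustration only, one common way to measure this kind of structural similarity is the Jaccard index over word shingles (overlapping runs of consecutive words). The sketch below assumes that technique; the class name ShingleSimilarity, the method names, and the shingle size of 3 are hypothetical choices and are not taken from the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: NOT the algorithm used by ss.jar, which is not documented here.
final class ShingleSimilarity {

    // Assumed parameter: number of consecutive words per shingle.
    static final int SHINGLE_SIZE = 3;

    // Splits text into lower-case words and returns the set of word shingles.
    static Set<String> shingles(String text, int size) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        Set<String> result = new HashSet<>();
        if (words.isEmpty()) {
            return result;
        }
        if (words.size() < size) {
            // Text shorter than one shingle: treat the whole text as a single shingle.
            result.add(String.join(" ", words));
            return result;
        }
        for (int i = 0; i + size <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + size)));
        }
        return result;
    }

    // Jaccard index of the two shingle sets, expressed as a percentage (0 to 100).
    static double similarity(String a, String b) {
        Set<String> sa = shingles(a, SHINGLE_SIZE);
        Set<String> sb = shingles(b, SHINGLE_SIZE);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 100.0; // two empty texts are treated as identical
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        int unionSize = sa.size() + sb.size() - intersection.size();
        return 100.0 * intersection.size() / unionSize;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ShingleSimilarity <fileA> <fileB>");
            System.exit(1);
        }
        String a = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("Similarity is " + similarity(a, b) + "%");
    }
}
```

Because only word sequences are compared, two articles that differ by an inserted advertisement or a changed headline still share most of their shingles and score high, while texts with the same meaning but different wording score low. The number printed by this sketch may differ from what ss.jar reports, since the project's actual metric is not stated.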