This project addresses the following problem: given a dataset of sparse vectors, find all pairs of vectors whose similarity, according to a similarity function such as cosine similarity, meets or exceeds a given threshold. (This problem is also known as the "similarity join.")
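To make the problem statement concrete, here is a brute-force baseline that compares every pair, which costs O(n^2) similarity computations. It is not the algorithm this package implements. It only defines the expected output. Representing a sparse vector as a Python dict `{feature_id: weight}` is an assumption, and the names `cosine` and `naive_similarity_join` are hypothetical.

```python
import math
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {feature_id: weight}."""
    # Iterate over the smaller vector so the dot product costs O(min(|x|, |y|)).
    if len(x) > len(y):
        x, y = y, x
    dot = sum(w * y[f] for f, w in x.items() if f in y)
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    if nx == 0 or ny == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (nx * ny)


def naive_similarity_join(vectors, t):
    """Return every (i, j, sim) with i < j and cosine(vectors[i], vectors[j]) >= t."""
    result = []
    for i, j in combinations(range(len(vectors)), 2):
        s = cosine(vectors[i], vectors[j])
        if s >= t:
            result.append((i, j, s))
    return result
```

For example, with `[{0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0, 2: 1.0}, {3: 2.0}]` and `t = 0.8`, only the pair (0, 1) qualifies, with similarity 2/sqrt(6) (about 0.816).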
The package consists of a bare-bones implementation of the
"All-Pairs-Binary" algorithm described in the following paper:
R. J. Bayardo, Yiming Ma, Ramakrishnan Srikant. Scaling Up All-Pairs
Similarity Search. In Proc. of the 16th Int'l Conf. on World Wide Web,
pp. 131-140, 2007. (Download from: http://www.bayardo.org/ps/www2007.pdf)
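The following is a simplified sketch, written for illustration, of the main ideas we understand "All-Pairs-Binary" to use on binary vectors, where cosine(x, y) = |x ∩ y| / sqrt(|x|·|y|). It is not the package's code, and the function name `all_pairs_binary_sketch` is hypothetical. The sketch assumes three techniques. First, vectors are processed in order of increasing size. Second, candidates smaller than t²·|x| are pruned, since no match is possible below that size. Third, an inverted index covers only a suffix of each vector: frequent features go into an unindexed prefix, and this prefix stays strictly shorter than t·|x|, so every true match still shares an indexed feature. Candidates are then scored exactly. The sketch simply skips undersized index entries. Consult the paper for how they are actually handled and for further optimizations.

```python
import math
from collections import Counter, defaultdict


def all_pairs_binary_sketch(sets, t):
    """Return sorted (i, j, sim) with i < j for binary vectors with cosine >= t.

    sets: list of feature-id collections, one binary vector each; 0 < t <= 1.
    For binary vectors, cosine(x, y) = |x & y| / sqrt(|x| * |y|).
    """
    eps = 1e-9  # slack so float rounding never prunes a true match
    # Put frequent features first so they fall in the unindexed prefix.
    freq = Counter(f for s in sets for f in set(s))
    vecs = [sorted(set(s), key=lambda f: (-freq[f], f)) for s in sets]
    # Process vectors by increasing size, so indexed vectors are never larger.
    order = sorted(range(len(vecs)), key=lambda i: len(vecs[i]))

    index = defaultdict(list)  # feature -> ids whose indexed suffix has it
    unindexed = {}             # id -> set of prefix features left unindexed
    results = []
    for x in order:
        fx, nx = vecs[x], len(vecs[x])
        if nx == 0:
            continue  # an empty vector matches nothing
        # Any match y (with |y| <= |x|) must satisfy |y| >= t^2 * |x|.
        min_size = t * t * nx - eps
        overlap = defaultdict(int)  # candidate id -> overlap on indexed part
        for f in fx:
            for y in index[f]:
                if len(vecs[y]) >= min_size:  # skip candidates that are too small
                    overlap[y] += 1
        fx_set = set(fx)
        for y, a in overlap.items():
            # Exact score: add the overlap with y's unindexed prefix.
            total = a + len(unindexed[y] & fx_set)
            sim = total / math.sqrt(nx * len(vecs[y]))
            if sim >= t:
                results.append((min(x, y), max(x, y), sim))
        # Leave the largest prefix p with p < t * |x| unindexed: a later,
        # no-smaller vector sharing only those features scores below t.
        p = max(0, math.ceil(t * nx - eps) - 1)
        unindexed[x] = set(fx[:p])
        for f in fx[p:]:
            index[f].append(x)
    return sorted(results)
```

For example, `all_pairs_binary_sketch([[1, 2, 3], [1, 2, 3, 4], [5]], 0.8)` should report only the pair (0, 1), with similarity 3/sqrt(12) (about 0.866). A brute-force comparison of all pairs returns the same set of pairs, but the index lets most non-matching pairs go unexamined.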
Click on the "Source" tab for instructions on downloading the source code. Makefiles are provided for both GNU g++ and Microsoft VC++ compilers.
Click on the "Downloads" tab to download a sample dataset to test your binary.
On Linux-type systems with GNU g++, here's a quick se