Resource Overview
Introduction
SketchSort (1) is software for all-pairs similarity search. It takes data points as input and outputs approximate neighbor pairs within a given cosine distance (1.0 - cosine similarity). First, the input data points are mapped to binary strings (sketches) by locality-sensitive hashing, and then neighbor pairs of strings within a given Hamming distance are enumerated by the multiple sorting method (2). Finally, the cosine distance of each such pair is calculated. If the cosine distance of a pair is no more than a user-specified threshold, the pair is output. One might worry about neighbor pairs missed by SketchSort. A theoretical lower bound of the expectation of the missing edge ratio is derived, which makes it possible to set parameters so that the empirical missing edge ratio is kept as small as possible.
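The three stages described above can be illustrated with a minimal Python sketch. It is not SketchSort's implementation: the hashing here is assumed to be random-hyperplane (sign) hashing, the multiple sorting method is replaced by brute-force pair enumeration for brevity, and the function names and the parameters `n_bits`, `max_hamming`, and `max_cos_dist` are hypothetical names chosen for this example.

```python
import math
import random
from itertools import combinations


def make_sketches(points, n_bits, seed=0):
    """Map each vector to an n_bits-bit sketch (assumed: random-hyperplane LSH)."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    sketches = []
    for p in points:
        bits = 0
        for plane in planes:
            dot = sum(a * b for a, b in zip(plane, p))
            # One bit per hyperplane: which side of the plane the point lies on.
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        sketches.append(bits)
    return sketches


def cosine_distance(u, v):
    """Return 1.0 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def neighbor_pairs(points, n_bits=32, max_hamming=6, max_cos_dist=0.1, seed=0):
    """Return (i, j, distance) for pairs that pass both filters."""
    sketches = make_sketches(points, n_bits, seed)
    result = []
    # Brute-force enumeration stands in for the multiple sorting method here.
    for i, j in combinations(range(len(points)), 2):
        hamming = bin(sketches[i] ^ sketches[j]).count("1")
        if hamming > max_hamming:
            continue  # sketches too far apart: not a candidate pair
        dist = cosine_distance(points[i], points[j])
        if dist <= max_cos_dist:
            result.append((i, j, dist))
    return result


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
    for i, j, dist in neighbor_pairs(data):
        print(i, j, dist)
```

The Hamming filter is what can cause missed pairs: two points whose cosine distance is within the threshold may still receive sketches that differ in more than `max_hamming` bits, which is why the sketch length and Hamming threshold have to be chosen with the missing edge ratio in mind.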
Quick Start
To compile SketchSort, type the following:
tar -xjvf sketchsort