Resource Overview
This project implements many popular data-mining/machine-learning algorithms. Every candidate algorithm must be suitable for implementation on a distributed and/or parallel computing platform, such as Hadoop.
The ultimate goal of this project is to solve the storage and computation problems of very large datasets, especially high-dimensional ones. I know this is a very difficult topic; if you would like to join this challenge, please email me: moonblue333@hotmail.com.
Thanks to Wei.Dong at cs.princeton.edu for LSH (locality-sensitive hashing).
Additionally, there is a "proof of concept" distributed-database program, provided as the attachment ting-0.5.0.zip. For more information, please refer to http://www.sadbit.com or sadbit333.appspot.com. (Please do not ask for the source-code password for ting-0.5.0.zip (the binary is OK); asking for the password to any other package is fine.)
The research focus in 2009:
* 1) How to prepare the input data, e.g. with special normalization, so that it fits LSH better and yields a better 3-rate.
* 2)
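Item 1 is about preparing input so that it suits LSH. The text does not say which LSH family or which normalization is meant, so the sketch below is only an illustration under two assumptions: random-hyperplane LSH for cosine similarity (one common family), and per-dimension z-score normalization as the "special normalization". All function names (`zscore_columns`, `make_hyperplanes`, `signature`, `hamming`) are hypothetical and are not taken from the project's code.

```python
import math
import random


def zscore_columns(rows):
    """Assumed normalization: center each dimension to mean 0, scale to unit (population) std."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        # A constant column has std 0; use 1.0 so we never divide by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def make_hyperplanes(dims, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes (random-hyperplane LSH, cosine similarity)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side of it, else 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes
    )


def hamming(a, b):
    """Number of differing bits; similar vectors tend to have small Hamming distance."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy data (made up for illustration): rows 0/1 and rows 2/3 are near-duplicates,
    # and the second dimension has a much larger scale than the others.
    data = [[1.0, 200.0, 3.0], [1.1, 210.0, 2.9], [9.0, 10.0, 0.5], [8.5, 12.0, 0.7]]
    normed = zscore_columns(data)
    planes = make_hyperplanes(dims=3, num_bits=16, seed=42)
    sigs = [signature(v, planes) for v in normed]
    print("near pair:", hamming(sigs[0], sigs[1]), "far pair:", hamming(sigs[0], sigs[2]))
```

Because each bit depends only on the sign of a dot product, rescaling a whole vector leaves its signature unchanged, so plain L2 normalization has no effect for this family. Per-dimension centering and scaling, by contrast, changes the angles between vectors and therefore which buckets they fall into, which is why the choice of normalization has to be matched to the LSH family in use.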