Resource Summary
This version has now been tested and can be used.
K-means is a simple clustering procedure; this project provides it as a dedicated package.
It has been extracted from Apache Commons Math3 (from a snapshot that has not yet been released) and extended with important features, such as:
(1) double values in data vectors;
(2) simple support for missing values in the data;
(3) support for defining the activation state of single fields, which allows adding an index column or a target variable (dependent, but inactive) to the data vector;
(4) on-the-fly normalization of the data.
These features are the minimum needed to make a clustering usable in everyday practice. The last one, normalization, is especially important: without it, fields with large numeric ranges dominate the distance computation, and the results are often simply garbage. The package contains an example that demonstrates this effect.
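The effect of normalization can be illustrated with a minimal sketch. It does not use this package's API: the class `NormalizationSketch` and its methods `normalize` and `distance` are hypothetical names introduced only for illustration. The sketch assumes min-max scaling to [0, 1], `Double.NaN` as the marker for a missing value, and a boolean mask that marks which fields are active. The package itself may handle each of these differently.

```java
import java.util.Arrays;

// Hypothetical illustration, not the package's API.
class NormalizationSketch {

    /** Min-max scales every column to [0, 1]. NaN (assumed "missing") stays NaN. */
    static double[][] normalize(double[][] data) {
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        // First pass: per-column minimum and maximum over the non-missing values.
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                if (!Double.isNaN(row[j])) {
                    min[j] = Math.min(min[j], row[j]);
                    max[j] = Math.max(max[j], row[j]);
                }
            }
        }
        // Second pass: rescale; constant columns map to 0.
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double v = data[i][j];
                double range = max[j] - min[j];
                if (Double.isNaN(v)) {
                    out[i][j] = Double.NaN;
                } else {
                    out[i][j] = range > 0 ? (v - min[j]) / range : 0.0;
                }
            }
        }
        return out;
    }

    /** Euclidean distance over active fields only; fields missing in either vector are skipped. */
    static double distance(double[] a, double[] b, boolean[] active) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            if (!active[j] || Double.isNaN(a[j]) || Double.isNaN(b[j])) {
                continue;
            }
            double d = a[j] - b[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Columns: index (inactive), age in years, income per year.
        double[][] raw = {
            {1, 25, 30000},   // A
            {2, 65, 31000},   // B: very different age, similar income
            {3, 27, 38000},   // C: similar age, moderately different income
            {4, 70, 100000},
            {5, 20, 20000},
        };
        boolean[] active = {false, true, true};
        double[][] norm = normalize(raw);

        // On raw data the income column dominates, so B looks closer to A than C does.
        System.out.println(distance(raw[0], raw[1], active) < distance(raw[0], raw[2], active));
        // After normalization both fields contribute comparably, so C is closer to A.
        System.out.println(distance(norm[0], norm[2], active) < distance(norm[0], norm[1], active));
    }
}
```

In this made-up data set, the nearest neighbour of record A changes once the fields are rescaled, which is the kind of distortion that normalization is meant to remove. The index column is excluded from the distance through the mask, in the spirit of feature (3).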
About missing values: some packages provide the possibility to "impu