Resource Overview
Duke has moved to GitHub.
Duke is a fast and flexible deduplication (or entity resolution, or record linkage) engine written in Java on top of Lucene. The latest version is 1.1 (see ReleaseNotes).
Features
High performance.
Highly configurable.
Support for CSV, JDBC, SPARQL, and NTriples DataSources.
Many built-in comparators.
Plug in your own data sources, comparators, and cleaners.
GeneticAlgorithm for automatically tuning configurations.
Command-line client for getting started.
API for embedding into any kind of application.
Support for batch processing and continuous processing.
Can maintain a database of the links it finds, via JNDI/JDBC.
Can run in multiple threads.
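To make the idea of a comparator more concrete, here is a minimal, self-contained sketch of the kind of component a deduplication engine uses to score how similar two field values are. The interface name `SimilarityComparator` and the class `EditDistanceComparator` are hypothetical and chosen only for illustration; they are not taken from Duke, whose actual comparator interface may look different. The sketch assumes a similarity score between 0.0 (nothing in common) and 1.0 (identical), derived from the Levenshtein edit distance.

```java
// Hypothetical interface for illustration only; this is NOT Duke's API.
interface SimilarityComparator {
    // Returns a similarity score in [0.0, 1.0]; 1.0 means the values are identical.
    double compare(String a, String b);
}

// Scores similarity as 1 - (edit distance / length of the longer string).
final class EditDistanceComparator implements SimilarityComparator {

    @Override
    public double compare(String a, String b) {
        if (a.equals(b)) {
            return 1.0; // also covers the case where both strings are empty
        }
        int longest = Math.max(a.length(), b.length());
        return 1.0 - (double) distance(a, b) / longest;
    }

    // Classic Levenshtein distance, computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this sketch, `new EditDistanceComparator().compare("kitten", "sitting")` yields 1 - 3/7, since the edit distance is 3 and the longer string has 7 characters. An engine would typically combine such per-field scores into an overall match decision; how Duke does that is not shown here.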