Address Deduplication

This workflow demonstrates the new distance measurement framework: simply aggregating a few distances computed on different attributes predicts likely matches with high accuracy, using a minimal number of nodes and no preprocessing. The data set is the "Restaurant data set" from http://www.cs.utexas.edu/users/ml/riddle/data.html, comprising 864 restaurant records and 112 duplicates. Each record contains a name, an address, a city, a type, and a class attribute. Records with an identical class value refer to the same real-world entity, in this case the same restaurant.
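The idea of aggregating per-attribute distances can be sketched outside the workflow as well. The Python snippet below is only an illustration under stated assumptions, not the workflow's actual node configuration: the sample rows are made up (they are not taken from the Restaurant data set), the string distance is a stand-in based on the standard library's difflib, and the weights, the threshold, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Made-up sample rows in the shape described above (NOT taken from the data set).
RECORDS = [
    {"name": "golden dragon cafe", "address": "12 main street",
     "city": "austin", "type": "chinese", "class": 0},
    {"name": "golden dragon", "address": "12 main st.",
     "city": "austin", "type": "asian", "class": 0},
    {"name": "bella pizza house", "address": "845 lake avenue",
     "city": "dallas", "type": "italian", "class": 1},
]

# Hypothetical attribute weights; they sum to 1 so the result stays in [0, 1].
WEIGHTS = {"name": 0.4, "address": 0.4, "city": 0.1, "type": 0.1}


def string_distance(a: str, b: str) -> float:
    """Stand-in string distance: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def aggregated_distance(r1: dict, r2: dict) -> float:
    """Weighted sum of the per-attribute distances between two records."""
    return sum(w * string_distance(r1[attr], r2[attr])
               for attr, w in WEIGHTS.items())


def find_candidate_pairs(records: list, threshold: float = 0.3) -> list:
    """Return index pairs whose aggregated distance is below the threshold."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if aggregated_distance(r1, r2) < threshold
    ]


if __name__ == "__main__":
    for i, j in find_candidate_pairs(RECORDS):
        # The class attribute serves only to check the prediction afterwards.
        same = RECORDS[i]["class"] == RECORDS[j]["class"]
        print(f"records {i} and {j}: possible match, same class: {same}")
```

The class attribute is deliberately left out of the distance computation; it serves only as ground truth for checking whether a predicted match is correct.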

License CC-BY-4.0