Address Deduplication

This workflow shows the power of the new distance measurement framework: it predicts possible matches with high accuracy using a minimal number of nodes and no preprocessing, simply by aggregating distances computed on different attributes. The chosen data set is the "Restaurant data set", comprising 864 restaurant records and 112 duplicates. Each record contains a name, an address, a city, a type, and a class attribute. Records with an identical value in the class attribute refer to the same real-world entity, in our case the same restaurant.
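
The idea of aggregating per-attribute distances can be sketched outside the workflow tool as well. The following Python snippet is only an illustration under stated assumptions, not the workflow's actual node configuration: the sample records, the attribute weights, the threshold, and the use of `difflib.SequenceMatcher` as the string distance are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records for illustration only; NOT taken from the Restaurant data set.
RECORDS = [
    {"id": 0, "name": "Blue Door Cafe", "address": "12 Main Street", "city": "Springfield"},
    {"id": 1, "name": "Blue Door Café", "address": "12 Main St.", "city": "Springfield"},
    {"id": 2, "name": "Golden Wok", "address": "480 Harbor Road", "city": "Lakeside"},
]

# Assumed weights and threshold; in practice these would be tuned on labeled data.
WEIGHTS = {"name": 0.5, "address": 0.3, "city": 0.2}
THRESHOLD = 0.25


def attribute_distance(a: str, b: str) -> float:
    """Distance in [0, 1] between two strings (0 means identical, case-insensitive)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def aggregated_distance(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted mean of the per-attribute distances between two records."""
    total = sum(weights.values())
    return sum(w * attribute_distance(r1[k], r2[k]) for k, w in weights.items()) / total


def candidate_duplicates(records: list, threshold: float = THRESHOLD) -> list:
    """Return (id1, id2, distance) for every pair at or below the threshold."""
    pairs = []
    for r1, r2 in combinations(records, 2):
        d = aggregated_distance(r1, r2)
        if d <= threshold:
            pairs.append((r1["id"], r2["id"], round(d, 3)))
    return pairs


if __name__ == "__main__":
    # With the sample records above, only the pair (0, 1) should be flagged.
    print(candidate_duplicates(RECORDS))
```

Comparing every pair of records is quadratic in the number of records, which is manageable for a data set of a few hundred rows. The class attribute described above could then be used to check how many flagged pairs are true duplicates.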


License CC-BY-4.0

