address deduplication, string similarity and fingerprinting (a collection)
A few links and resources I collected about address deduplication, string similarity and fingerprinting
A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
https://forum.knime.com/t/namensabgleich/19232/2?u=mlauber71
---------
Mr. Wiswedel is the man when it comes to address dedupe ...
https://forum.knime.com/u/wiswedel/summary
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/05_Indexing_Searching/03_Example_for_Fuzzy_Address_Matching
https://forum.knime.com/t/approach-fuzzy-match-or-supervised-learning/10900
Fingerprinting for addresses
https://forum.knime.com/t/rule-based-filter-question/13419/7?u=mlauber71
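To give an idea of what a 'fingerprint' can look like, here is a minimal Python sketch of a key-collision fingerprint. It is an assumption on my side (the normalisation steps and the function name fingerprint are illustrative and not taken from the linked thread): strip accents, lowercase, drop punctuation, then sort the unique tokens so that word order does not matter.

```python
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Build a normalised key; strings with the same key are duplicate candidates."""
    # Decompose accented characters and drop the combining marks (e.g. é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase and turn every run of non-alphanumeric characters into one space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_text.lower())
    # Unique tokens in sorted order, so word order and repeats are ignored.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


if __name__ == "__main__":
    print(fingerprint("Main Street 12, Springfield"))
    print(fingerprint("springfield  MAIN street 12."))
```

Records that share a fingerprint can then be grouped (for example with a GroupBy on the fingerprint column). Real address data usually needs extra rules on top, such as expanding abbreviations like "St." or "Str.".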
Simple Fuzzy Match Example with Levenshtein distance (scottf)
https://forum.knime.com/t/getting-started-with-ml/26531/3?u=mlauber71
https://hub.knime.com/scottf/spaces/Public/latest/ForumWorkflows/2020/09/Simple%20Fuzzy%20Match%20Example
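For readers who want to see what the Levenshtein distance computes, here is a small standard-library Python sketch. It is not the linked workflow, just the textbook dynamic-programming version; the helper similarity and its scaling to a 0..1 range are my own assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("Main Street", "Main Stret"))
```

A fuzzy match then boils down to keeping the pairs whose similarity is above a threshold you choose for your data.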
---------
Compare strings by their similarity
https://forum.knime.com/t/comparing-strings/12939/8?u=mlauber71
You have to install the Palladian extension to do that
https://nodepit.com/product/palladian
(it requires a separate installation)
You need to add this repository (update site)
https://download.nodepit.com/palladian/4.2
---------
You can group addresses (and names) by their similarity without a 'ground truth'
https://forum.knime.com/t/how-can-i-define-and-list-the-duplication-in-an-adress-data-set-sucessfully-with-using-string-distances-node-and-similarity-search-node/42568/12?u=mlauber71
https://kni.me/w/a5sHElCCuSKV7j2Q
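The idea of grouping without a 'ground truth' can be sketched in plain Python as follows. This is not the linked KNIME workflow, only an illustration under my own assumptions: the similarity comes from difflib.SequenceMatcher, the threshold of 0.85 is arbitrary, and the function name group_similar is hypothetical. Every pair above the threshold is linked, and linked entries end up in the same group.

```python
from difflib import SequenceMatcher
from itertools import combinations


def group_similar(values, threshold=0.85):
    """Group strings whose pairwise similarity reaches the threshold."""
    # Union-find: every entry starts as the root of its own group.
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all pairs; this is quadratic, so fine for small lists only.
    for i, j in combinations(range(len(values)), 2):
        ratio = SequenceMatcher(None, values[i].lower(), values[j].lower()).ratio()
        if ratio >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for idx, value in enumerate(values):
        groups.setdefault(find(idx), []).append(value)
    return list(groups.values())


if __name__ == "__main__":
    addresses = ["12 Main Street", "12 Main Stret", "99 Oak Avenue", "99 Oak Avenu"]
    for group in group_similar(addresses):
        print(group)
```

Because links are transitive, A and C can land in one group via B even if A and C are not that similar themselves, so it is worth checking large groups by hand.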
Fuzzy Address Matching
https://kni.me/w/sZfJYtD2BpTGNWnW
Address Deduplication
https://kni.me/w/QiS--QnukXBeL3mZ
-----------------------------------------------------------------
Additional Python resources - not yet transferred into a KNIME workflow
Super Fast String Matching in Python
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
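As far as I understand it, the approach in that post rests on character n-grams compared by cosine similarity (there with TF-IDF weights and sparse matrices for speed). The sketch below is a stripped-down, unweighted version using only the standard library, so it shows the principle but none of the speed; the function names and the n-gram size of 3 are my own assumptions.

```python
from collections import Counter
from math import sqrt


def ngrams(text: str, n: int = 3) -> Counter:
    """Count the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine of the angle between the two n-gram count vectors (0..1)."""
    vec_a, vec_b = ngrams(a, n), ngrams(b, n)
    # A Counter returns 0 for n-grams it has not seen.
    dot = sum(count * vec_b[gram] for gram, count in vec_a.items())
    norm_a = sqrt(sum(c * c for c in vec_a.values()))
    norm_b = sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("Main Street", "Main Stret"))
    print(cosine_similarity("Main Street", "Oak Avenue"))
```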
Python - Address matching I
https://github.com/dedupeio/address-matching
Python - Address matching II
https://github.com/RobinL/AddressMatcher
libpostal: international street address NLP
https://github.com/openvenues/libpostal
https://datascience.stackexchange.com/questions/10810/how-to-do-postal-addresses-fuzzy-matching
Fuzzy String Matching in Python
https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/
External resources
- String Deduplication without Ground Truth - KNIME Forum (75366)
- workflow: Address Deduplication
- workflow: Fuzzy Address Matching
- workflow: Match similar addresses from one list together into similar groups
- You can group addresses (and names) by their similarity without a 'ground truth'
- Simple Fuzzy Match Example with Levenshtein distance (scottf)
- Fuzzy String Matching in Python
- libpostal: international street address NLP
- Python - Address matching II
- Python - Address matching I
- Super Fast String Matching in Python
- Palladian is a Java-based toolkit which provides functionality to perform typical Internet Information Retrieval tasks
- Compare strings by their similarity
- Fingerprinting for addresses
- Mr. Wiswedel is the man when it comes to address dedupe ...
- A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 4.7.8