address deduplication, string similarity and fingerprinting (a collection)
A few links and resources I collected about address deduplication, string similarity and fingerprinting
A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
https://forum.knime.com/t/namensabgleich/19232/2?u=mlauber71
---------
Mr. Wiswedel is the man when it comes to address deduplication ...
https://forum.knime.com/u/wiswedel/summary
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/05_Indexing_Searching/03_Example_for_Fuzzy_Address_Matching
https://forum.knime.com/t/approach-fuzzy-match-or-supervised-learning/10900
Fingerprinting for addresses
https://forum.knime.com/t/rule-based-filter-question/13419/7?u=mlauber71
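The thread above builds address fingerprints with rules. As a minimal sketch of the general idea (not the exact rules from the thread), an address can be reduced to a normalised key so that variants of the same address collide. This resembles the "fingerprint" keying method known from OpenRefine clustering. The names `fingerprint` and `group_by_fingerprint` and the specific normalisation steps are illustrative assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(address: str) -> str:
    """Reduce an address to a normalised key (illustrative rules only)."""
    # casefold() lowercases and also maps "ß" to "ss"
    text = address.casefold()
    # Decompose accented characters and drop the accents ("ö" -> "o")
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace punctuation with spaces
    text = re.sub(r"[^\w\s]", " ", text)
    # Unique, sorted tokens: word order and repeated words no longer matter
    return " ".join(sorted(set(text.split())))


def group_by_fingerprint(addresses):
    """Group addresses whose fingerprints collide."""
    groups = defaultdict(list)
    for address in addresses:
        groups[fingerprint(address)].append(address)
    return dict(groups)


if __name__ == "__main__":
    data = ["Hauptstraße 5, Köln", "Köln, Hauptstrasse 5", "Bahnhofweg 12, Bonn"]
    for key, members in group_by_fingerprint(data).items():
        print(key, "->", members)
```

Addresses that differ only in case, accents, punctuation or word order end up with the same key. Typos still produce different keys, which is where the fuzzy comparisons below come in.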
Simple Fuzzy Match Example with Levenshtein distance (scottf)
https://forum.knime.com/t/getting-started-with-ml/26531/3?u=mlauber71
https://hub.knime.com/scottf/spaces/Public/latest/ForumWorkflows/2020/09/Simple%20Fuzzy%20Match%20Example
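scottf's example does the matching with KNIME nodes. To show what the Levenshtein distance itself computes, here is a small dynamic-programming sketch in plain Python. `best_match` and the cut-off `max_distance=2` are hypothetical choices for illustration, not taken from the workflow.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def best_match(query: str, candidates: List[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest candidate, or None if none is within max_distance."""
    if not candidates:
        return None
    distance, match = min((levenshtein(query.lower(), c.lower()), c) for c in candidates)
    return match if distance <= max_distance else None


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(best_match("Meier", ["Maier", "Mayer", "Müller"]))  # Maier
```

An absolute distance favours short strings. For addresses of varying length, a normalised score such as `1 - d / max(len(a), len(b))` is often easier to threshold.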
---------
Compare strings by their similarity
https://forum.knime.com/t/comparing-strings/12939/8?u=mlauber71
You have to install the Palladian extension to do that
https://nodepit.com/product/palladian
(it requires a separate installation)
You need to add this repository as an update site:
https://download.nodepit.com/palladian/4.2
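Palladian brings its own string similarity nodes. To try the idea outside KNIME, Python's standard library offers `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1 (2 × matching characters / total characters). Its algorithm is a variant of Ratcliff/Obershelp "gestalt pattern matching" and is not necessarily the measure Palladian uses. `similarity`, `similar_pairs` and the default threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 2 * matching characters / total characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(strings, threshold: float = 0.8):
    """Compare every pair once and keep those at or above the threshold."""
    pairs = []
    for a, b in combinations(strings, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


if __name__ == "__main__":
    addresses = ["Hauptstrasse 5", "Bahnhofweg 12", "Hauptstr. 5"]
    print(similar_pairs(addresses, threshold=0.75))
```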
---------
You can group addresses (and names) by their similarity without a 'ground truth'
https://forum.knime.com/t/how-can-i-define-and-list-the-duplication-in-an-adress-data-set-sucessfully-with-using-string-distances-node-and-similarity-search-node/42568/12?u=mlauber71
https://kni.me/w/a5sHElCCuSKV7j2Q
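Without a ground truth, one common approach is to treat every pair above a similarity threshold as a link and take the connected components as groups, i.e. single-linkage clustering. This is an assumption for illustration, not necessarily what the linked workflow does. The sketch uses `difflib` for the score and a small union-find structure. `group_similar` and the threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def group_similar(strings, threshold: float = 0.8):
    """Single-linkage grouping: pairs at or above the threshold are linked,
    and every connected component becomes one group."""
    parent = list(range(len(strings)))

    def find(i):
        # Follow parent links to the root, halving the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(strings)), 2):
        a, b = strings[i].lower(), strings[j].lower()
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    # Collect members per root, in order of first appearance
    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    data = ["Hauptstrasse 5", "Bahnhofweg 12", "Hauptstr. 5", "Bahnhofweg 12a"]
    for group in group_similar(data, threshold=0.75):
        print(group)
```

Because links are transitive, A~B and B~C put A and C into the same group even if they are not similar themselves. Comparing all pairs is also O(n²), so larger lists usually need blocking first (for example, only comparing within the same postcode).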
Fuzzy Address Matching
https://kni.me/w/sZfJYtD2BpTGNWnW
Address Deduplication
https://kni.me/w/QiS--QnukXBeL3mZ
-----------------------------------------------------------------
Additional Python resources - not yet transferred into a KNIME workflow
Super Fast String Matching in Python
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
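The blog post above turns strings into TF-IDF weighted character 3-grams and compares them by cosine similarity, using scikit-learn plus a fast sparse matrix product to scale. The sketch below reproduces only the core idea in plain Python (dicts instead of sparse matrices, all pairs compared), so it is meant for understanding, not for millions of rows. `match_pairs`, the cleaning regex and the default threshold are assumptions. The idf formula mirrors scikit-learn's default smoothed idf.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n: int = 3):
    """Character n-grams of a lower-cased string without punctuation."""
    cleaned = re.sub(r"[^\w ]", "", text.lower())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def tfidf_vectors(docs, n: int = 3):
    """Sparse TF-IDF vectors (dicts) over character n-grams, L2-normalised."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: in how many strings each n-gram occurs
    df = Counter(g for c in counts for g in c)
    total = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf: ln((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v) -> float:
    """Dot product of two L2-normalised sparse vectors = cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict
    return sum(w * v.get(g, 0.0) for g, w in u.items())


def match_pairs(docs, threshold: float = 0.8, n: int = 3):
    """All pairs whose cosine similarity reaches the threshold."""
    vecs = tfidf_vectors(docs, n)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((docs[i], docs[j], round(score, 3)))
    return pairs


if __name__ == "__main__":
    data = ["Hauptstrasse 5 Berlin", "Hauptstrasse 5, Berlin", "Bahnhofweg 12 Hamburg"]
    print(match_pairs(data, threshold=0.9))
```

The idf weighting makes n-grams that occur in many addresses (such as "str" from "strasse") count less than rare ones, which is the main advantage over plain character overlap.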
Python - Address matching I
https://github.com/dedupeio/address-matching
Python - Address matching II
https://github.com/RobinL/AddressMatcher
libpostal: international street address NLP
https://github.com/openvenues/libpostal
How to do postal addresses fuzzy matching? (Data Science Stack Exchange)
https://datascience.stackexchange.com/questions/10810/how-to-do-postal-addresses-fuzzy-matching
Fuzzy String Matching in Python
https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/
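Libraries such as fuzzywuzzy offer a "token sort" ratio that ignores word order. This helps with names written as "Last, First" versus "First Last", or with address parts in a different order. The stdlib-only sketch below approximates that idea with `difflib`. It is not fuzzywuzzy's implementation, and the function names are illustrative.

```python
import re
from difflib import SequenceMatcher


def token_sort_key(text: str) -> str:
    """Lower-case word tokens, sorted alphabetically and re-joined."""
    return " ".join(sorted(re.findall(r"\w+", text.lower())))


def token_sort_ratio(a: str, b: str) -> int:
    """Similarity 0-100 that ignores word order (difflib-based approximation)."""
    ratio = SequenceMatcher(None, token_sort_key(a), token_sort_key(b)).ratio()
    return round(100 * ratio)


if __name__ == "__main__":
    print(token_sort_ratio("Müller, Hans", "Hans Müller"))       # 100
    print(token_sort_ratio("Hauptstrasse 5", "5 Hauptstrasse"))  # 100
```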
External resources
- A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
- Mr. Wiswedel is the man when it comes to address deduplication ...
- Fingerprinting for addresses
- Compare strings by their similarity
- Palladian is a Java-based toolkit which provides functionality to perform typical Internet Information Retrieval tasks
- Super Fast String Matching in Python
- Python - Address matching I
- Python - Address matching II
- libpostal: international street address NLP
- Fuzzy String Matching in Python
- Simple Fuzzy Match Example with Levenshtein distance (scottf)
- You can group addresses (and names) by their similarity without a 'ground truth'
- workflow: Match similar addresses from one list together into similar groups
- workflow: Fuzzy Address Matching
- workflow: Address Deduplication
- String Deduplication without Ground Truth - KNIME Forum (75366)
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 4.7.8.
Legal
By using or downloading the workflow, you agree to our terms and conditions.