address deduplication, string similarity and fingerprinting (a collection)
A few links and resources I collected on address deduplication, string similarity and fingerprinting
A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
https://forum.knime.com/t/namensabgleich/19232/2?u=mlauber71
---------
Mr. Wiswedel is the man when it comes to address deduplication ...
https://forum.knime.com/u/wiswedel/summary
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/05_Indexing_Searching/03_Example_for_Fuzzy_Address_Matching
https://forum.knime.com/t/approach-fuzzy-match-or-supervised-learning/10900
Fingerprinting for addresses
https://forum.knime.com/t/rule-based-filter-question/13419/7?u=mlauber71
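To make the idea of a fingerprint concrete, here is a minimal sketch in plain Python. It is not taken from the linked thread; it assumes a simple key-collision approach in which each address is reduced to a normalized key (case folded, accents and punctuation removed, tokens deduplicated and sorted), so that trivially different spellings end up with the same key. The function name `fingerprint` and the sample addresses are made up for illustration.

```python
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Return a normalized key so that trivially different spellings collide."""
    # Decompose accented characters and drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold (also maps ß to ss), then turn every run of characters that
    # is not a-z or 0-9 into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", without_marks.casefold())
    # Deduplicate and sort the tokens so that word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


# Hypothetical sample data, for illustration only.
addresses = [
    "Main Street 12, Springfield",
    "12 main street  SPRINGFIELD",
    "Hauptstraße 5, München",
]
keys = [fingerprint(a) for a in addresses]
for address, key in zip(addresses, keys):
    print(f"{address!r} -> {key!r}")
```

Records that share a key are candidates for being duplicates. A key like this only catches differences in case, punctuation and word order; typos and abbreviations still need a distance-based comparison.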
Simple Fuzzy Match Example with Levenshtein distance (scottf)
https://forum.knime.com/t/getting-started-with-ml/26531/3?u=mlauber71
https://hub.knime.com/scottf/spaces/Public/latest/ForumWorkflows/2020/09/Simple%20Fuzzy%20Match%20Example
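For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain Python version of the standard dynamic-programming algorithm, independent of the KNIME workflow above; the helper `similarity`, which scales the distance to a 0..1 score, is an assumption added for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical (illustrative helper)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("Main Street 12", "Main Str. 12"))
```

In a fuzzy match, pairs whose score exceeds a chosen threshold are treated as matches; the right threshold depends on the data and has to be tuned by inspection.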
---------
Compare strings by their similarity
https://forum.knime.com/t/comparing-strings/12939/8?u=mlauber71
This requires the Palladian extension for KNIME:
https://nodepit.com/product/palladian
Palladian is not part of the default KNIME installation; it is installed separately, from this repository:
https://download.nodepit.com/palladian/4.2
---------
You can group addresses (and names) by their similarity without a 'ground truth'
https://forum.knime.com/t/how-can-i-define-and-list-the-duplication-in-an-adress-data-set-sucessfully-with-using-string-distances-node-and-similarity-search-node/42568/12?u=mlauber71
https://kni.me/w/a5sHElCCuSKV7j2Q
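As a rough illustration of grouping without a ground truth, the sketch below compares all pairs of strings and merges every pair whose similarity reaches a threshold into the same group. It is not the method of the linked workflow: it assumes `difflib.SequenceMatcher` from the Python standard library as the similarity measure and a small union-find structure for the merging, and the function name `group_similar`, the threshold of 0.8 and the sample data are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def group_similar(values, threshold=0.8):
    """Group strings whose pairwise similarity reaches the threshold."""
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    normalized = [v.lower().strip() for v in values]
    # Compare every pair once; quadratic, so only suitable for small lists.
    for i, j in combinations(range(len(values)), 2):
        score = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if score >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    return list(groups.values())


# Hypothetical sample data, for illustration only.
addresses = [
    "Main Street 12",
    "Main Str. 12",
    "main street 12 ",
    "Park Avenue 7",
    "Park Avenue 7a",
]
groups = group_similar(addresses)
for group in groups:
    print(group)
```

Because groups are merged transitively, a chain of pairwise-similar strings can pull fairly different records into one group, so the threshold needs checking against real data. For larger lists, comparing only records that share a blocking key (for example the postal code) keeps the number of comparisons manageable.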
Fuzzy Address Matching
https://kni.me/w/sZfJYtD2BpTGNWnW
Address Deduplication
https://kni.me/w/QiS--QnukXBeL3mZ
-----------------------------------------------------------------
Additional Python resources - not yet transferred into a KNIME workflow
Super Fast String Matching in Python
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
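As far as I understand it, the general idea behind this kind of fast matching is to represent each string as a TF-IDF weighted vector of character n-grams and to compare vectors by cosine similarity. The sketch below shows that idea in plain Python with dictionaries; it is an assumption-laden illustration, not the code from the linked post, and it leaves out the sparse matrix multiplication that makes the approach fast on large data. All function names and the sample strings are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one L2-normalized TF-IDF vector (as a dict) per input string."""
    counts_per_string = [Counter(ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings does each n-gram occur?
    document_frequency = Counter()
    for counts in counts_per_string:
        document_frequency.update(counts.keys())
    total = len(strings)
    vectors = []
    for counts in counts_per_string:
        # Smoothed inverse document frequency: rare n-grams get more weight.
        weighted = {
            gram: tf * (math.log((1 + total) / (1 + document_frequency[gram])) + 1.0)
            for gram, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values()))
        vectors.append({g: w / norm for g, w in weighted.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(weight * v.get(gram, 0.0) for gram, weight in u.items())


# Hypothetical sample data, for illustration only.
names = ["Main Street 12", "Main Str. 12", "Park Avenue 7"]
vectors = tfidf_vectors(names)
print(cosine(vectors[0], vectors[1]))  # shares many trigrams, so clearly above 0
print(cosine(vectors[0], vectors[2]))  # shares no trigram with the first string
```

The IDF weighting means that n-grams occurring in almost every record (such as " st" in street names) contribute less than rare ones, which tends to suit address data with many repeated words.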
Python - Address matching I
https://github.com/dedupeio/address-matching
Python - Address matching II
https://github.com/RobinL/AddressMatcher
libpostal: international street address NLP
https://github.com/openvenues/libpostal
https://datascience.stackexchange.com/questions/10810/how-to-do-postal-addresses-fuzzy-matching
Fuzzy String Matching in Python
https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/
External resources
- String Deduplication without Ground Truth - KNIME Forum (75366)
- workflow: Address Deduplication
- workflow: Fuzzy Address Matching
- workflow: Match similar addresses from one list together into similar groups
- You can group addresses (and names) by their similarity without a 'ground truth'
- Simple Fuzzy Match Example with Levenshtein distance (scottf)
- Fuzzy String Matching in Python
- libpostal: international street address NLP
- Python - Address matching II
- Python - Address matching I
- Super Fast String Matching in Python
- Palladian is a Java-based toolkit which provides functionality to perform typical Internet Information Retrieval tasks
- Compare strings by their similarity
- Fingerprinting for addresses
- Mr. Wiswedel is the man when it comes to address deduplication ...
- A meta collection of KNIME resources for address deduplication or ‘fingerprinting’
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 4.7.8