This workflow demonstrates the power of exorbyte's Approximate String Matcher for handling noisy and inconsistent data.
We start with two tables:
Correct Names: a curated list of retailer names.
Comparison Table: a set of misspelled or variant retailer names.
Using our Approximate String Matcher node, we compare each name in the Comparison Table against the Correct Names using three different algorithms:
Levenshtein Distance: counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into the other.
Positional Matching: accounts for character order and placement, which makes it robust against shifted or swapped characters. Best suited for fixed-format codes (e.g., IDs like "AB-1234").
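Longest Common Subsequence (LCS): finds the longest sequence of characters that appears in both strings in the same relative order, not necessarily contiguously. It is tolerant of gaps, and only partly of rearrangements, since matched characters must keep their order.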
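To make the Levenshtein idea concrete, here is a minimal textbook sketch in Python. It is not the exorbyte node's implementation, whose scoring and normalization are not described here. The `best_match` helper and the retailer names are hypothetical, for illustration only: it simply picks the correct name with the smallest edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # Keep only two rows of the dynamic-programming table.
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def best_match(query: str, candidates: list[str]) -> str:
    """Hypothetical helper: return the candidate closest to `query`."""
    return min(candidates, key=lambda c: levenshtein(query, c))


if __name__ == "__main__":
    correct_names = ["Walmart", "Target", "Costco"]  # illustrative only
    print(levenshtein("kitten", "sitting"))
    print(best_match("Wallmart", correct_names))
```

A distance of 1 between "Wallmart" and "Walmart" reflects a single inserted character, which is exactly the kind of noise this metric handles well.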
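The text does not say which algorithm the node's Positional Matching uses. As an assumed stand-in, the sketch below implements Jaro similarity, a well-known measure that likewise considers character order and placement: characters count as matching only if they are equal and lie within a position window, and out-of-order matches are penalized as transpositions. The function name `jaro_similarity` is our own, not part of the exorbyte node.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; an assumed stand-in for positional matching."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters may match only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    s1_seq = [s1[i] for i in range(len1) if s1_matched[i]]
    s2_seq = [s2[j] for j in range(len2) if s2_matched[j]]
    transpositions = sum(a != b for a, b in zip(s1_seq, s2_seq)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    # Two swapped digits in a fixed-format ID still score highly.
    print(jaro_similarity("AB-1234", "AB-1243"))
```

Because the window is bounded, a swapped pair of nearby characters costs little, while a character far from its expected position does not match at all.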
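The LCS behavior can be sketched with the standard dynamic-programming recurrence. The normalization in `lcs_similarity` (twice the LCS length divided by the total length) is one common choice and an assumption on our part; the node may scale its scores differently.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest subsequence common to `a` and `b`."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)              # extend the match
            else:
                curr.append(max(prev[j], curr[j - 1]))    # skip a character
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    print(lcs_length("Home Depot", "HomeDepot"))  # a dropped space is just a gap
    print(lcs_length("ab", "ba"))                 # reordering costs a match
```

The two calls illustrate the trade-off described above: a missing character barely affects the score, whereas swapped characters cannot both be kept in the common subsequence.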