Sometimes you may find that certain rows in your tables are duplicated one or more times. This can happen for many reasons, including bad data, combining tables through joins and concatenations, or some other analytic process.
Whatever the reason, you usually do not want duplicate records. That is where the Duplicate Row Filter node comes in: it can automatically remove or flag rows whose values duplicate those of another row.
The Duplicate Row Filter's configuration allows you to select which columns to check for duplicates. By default, all columns are selected, but you may choose any subset of columns to suit your needs. Two rows then count as duplicates when they match in every selected column.
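To illustrate what checking a subset of columns means, here is a minimal Python sketch of the idea. It is not the node's implementation; the function `drop_duplicates`, the sample rows, and the column names are all made up for illustration, and it assumes the first row of each duplicate group is the one kept.

```python
def drop_duplicates(rows, columns=None):
    """Keep the first row for each distinct combination of values in `columns`.

    `rows` is a list of dicts; `columns=None` means "compare all columns".
    """
    seen = set()
    kept = []
    for row in rows:
        cols = columns if columns is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample table.
rows = [
    {"customer": "Ann", "city": "Oslo"},
    {"customer": "Bob", "city": "Rome"},
    {"customer": "Ann", "city": "Oslo"},
    {"customer": "Ann", "city": "Bergen"},
]

all_columns = drop_duplicates(rows)                # compares every column
by_customer = drop_duplicates(rows, ["customer"])  # compares one column only
```

With all columns compared, only the exact repeat of the first row is dropped. Comparing on `customer` alone also drops the Bergen row, because it shares a customer with an earlier row.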
On the Advanced tab, you can choose whether to remove duplicate rows or only flag them. You can also choose which row in each group of duplicates is kept: the first, the last, or the one with the minimum or maximum value in a column you select. Finally, you can elect to retain the original row order, although this may slow processing.
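The difference between removing and flagging, and between the row-selection options, can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the node's actual logic: the function names, the `keep` and `by` parameters, and the `is_duplicate` output column are hypothetical.

```python
def choose_rows(rows, columns, keep="first", by=None):
    """Return the indices of the rows chosen, one per group of duplicates."""
    groups = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in columns)
        groups.setdefault(key, []).append(i)

    chosen = set()
    for indices in groups.values():
        if keep == "first":
            chosen.add(indices[0])
        elif keep == "last":
            chosen.add(indices[-1])
        elif keep == "min":
            chosen.add(min(indices, key=lambda i: rows[i][by]))
        elif keep == "max":
            chosen.add(max(indices, key=lambda i: rows[i][by]))
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
    return chosen


def remove_duplicates(rows, columns, keep="first", by=None):
    """Drop every row that was not chosen; the input order is preserved."""
    chosen = choose_rows(rows, columns, keep, by)
    return [row for i, row in enumerate(rows) if i in chosen]


def flag_duplicates(rows, columns, keep="first", by=None):
    """Keep all rows and add a hypothetical boolean `is_duplicate` column."""
    chosen = choose_rows(rows, columns, keep, by)
    return [dict(row, is_duplicate=i not in chosen) for i, row in enumerate(rows)]


# Hypothetical sample table.
orders = [
    {"customer": "Ann", "amount": 10},
    {"customer": "Bob", "amount": 20},
    {"customer": "Ann", "amount": 30},
]

first_kept = remove_duplicates(orders, ["customer"])
max_kept = remove_duplicates(orders, ["customer"], keep="max", by="amount")
flagged = flag_duplicates(orders, ["customer"])
```

Here `first_kept` retains Ann's first order, `max_kept` retains her largest one, and `flagged` keeps all three rows while marking Ann's second order as a duplicate.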
Workflow
Duplicate Row Filter
Used extensions & nodes
Created with KNIME Analytics Platform version 4.7.2