This workflow is based on the adult.csv data set. Try it out to:
1. Remove duplicates:
   - keep the first or last occurrence of each set of duplicate rows
   - keep the duplicate row that has the maximum or minimum value in a specific column
2. Flag duplicates:
   - add a column that flags each row as unique, duplicate, or chosen
   - add a column that shows the RowID of the chosen (representative) row for each duplicate
   - add both of these columns
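The duplicate-handling options listed above can be approximated outside KNIME. The following is a minimal pure-Python sketch, not the Duplicate Row Filter node's implementation. It assumes that rows are dicts carrying a `RowID` key and that duplicates are rows with equal values in a chosen set of key columns. The flag columns use the hypothetical names `dup_flag` and `chosen_row_id`, and the node's real output column names may differ. The sample rows are made up, with column names and values in the style of adult.csv.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def group_duplicates(rows: List[Row], key_cols: Sequence[str]) -> Dict[Tuple[Any, ...], List[int]]:
    """Group row positions by their values in key_cols, preserving input order."""
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in key_cols)
        groups.setdefault(key, []).append(i)
    return groups


def choose_index(rows: List[Row], indices: List[int], strategy: str,
                 ref_col: Optional[str] = None) -> int:
    """Pick the representative (chosen) row of one duplicate group."""
    if strategy == "first":
        return indices[0]
    if strategy == "last":
        return indices[-1]
    if strategy in ("max", "min"):
        if ref_col is None:
            raise ValueError("max/min strategies need a reference column")
        pick = max if strategy == "max" else min
        # Assumption: on ties, max()/min() return the earliest matching position.
        return pick(indices, key=lambda i: rows[i][ref_col])
    raise ValueError(f"unknown strategy: {strategy!r}")


def remove_duplicates(rows: List[Row], key_cols: Sequence[str], strategy: str = "first",
                      ref_col: Optional[str] = None) -> List[Row]:
    """Keep one row per duplicate group; the output stays in input order."""
    keep = {choose_index(rows, idx, strategy, ref_col)
            for idx in group_duplicates(rows, key_cols).values()}
    return [row for i, row in enumerate(rows) if i in keep]


def flag_duplicates(rows: List[Row], key_cols: Sequence[str], strategy: str = "first",
                    ref_col: Optional[str] = None) -> List[Row]:
    """Return copies of all rows with two hypothetical flag columns added."""
    flagged = [dict(row) for row in rows]
    for idx in group_duplicates(rows, key_cols).values():
        chosen = choose_index(rows, idx, strategy, ref_col)
        for i in idx:
            if len(idx) == 1:
                flag, rep = "unique", None
            elif i == chosen:
                flag, rep = "chosen", None
            else:
                # Only true duplicates point at the RowID of their chosen row.
                flag, rep = "duplicate", rows[chosen]["RowID"]
            flagged[i]["dup_flag"] = flag         # hypothetical column name
            flagged[i]["chosen_row_id"] = rep     # hypothetical column name
    return flagged


# Made-up sample rows in the style of adult.csv.
rows = [
    {"RowID": "Row0", "education": "Bachelors", "sex": "Male", "age": 39},
    {"RowID": "Row1", "education": "HS-grad", "sex": "Female", "age": 50},
    {"RowID": "Row2", "education": "Bachelors", "sex": "Male", "age": 53},
    {"RowID": "Row3", "education": "Bachelors", "sex": "Male", "age": 28},
]
keys = ["education", "sex"]

print([r["RowID"] for r in remove_duplicates(rows, keys, "first")])        # ['Row0', 'Row1']
print([r["RowID"] for r in remove_duplicates(rows, keys, "max", "age")])   # ['Row1', 'Row2']
for r in flag_duplicates(rows, keys):
    print(r["RowID"], r["dup_flag"], r["chosen_row_id"])
```

With the "first" strategy, Row0 is flagged as chosen, Row1 as unique, and Row2 and Row3 as duplicates pointing at Row0. Keeping the output in input order, instead of grouping rows together, is a design choice of this sketch.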
Duplicate Row Filter
Created with KNIME Analytics Platform version 4.1.0