I set up a workflow to demonstrate how this could be done:
- use a GroupBy node to count how many rows share each ID (note: KNIME should introduce a generic COUNT(*) function; I had to use a variable)
- if the count is larger than 1, the ID is a duplicate
- left join the counts back to the original data
- if you want to keep one of the duplicates, sort the data by ID and by any other variables that decide which row comes first
- use the Lag Column node to identify which rows are the 2nd, 3rd, ... occurrence of a duplicate
- add a rule that keeps just a single row for each ID
- alternative: just remove all rows whose ID is duplicated
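
For readers who want to see the logic outside KNIME, the steps above can be sketched roughly in plain Python. This is only an illustration, not the workflow itself: the sample rows, the column names `id` and `value`, and the choice to keep the row with the smallest `value` are all assumptions made for the example. The alternative at the end is read here as dropping every row whose ID occurs more than once.

```python
from collections import Counter

# Hypothetical sample data; "id" and "value" are made-up column names.
rows = [
    {"id": "B", "value": 20},
    {"id": "A", "value": 10},
    {"id": "B", "value": 5},
    {"id": "C", "value": 7},
    {"id": "B", "value": 12},
]

# Group by ID and count the rows in each group.
counts = Counter(row["id"] for row in rows)

# Join the count back onto every original row and flag IDs with count > 1.
joined = [
    {**row, "count": counts[row["id"]], "is_duplicate": counts[row["id"]] > 1}
    for row in rows
]

# Sort by ID, then by value, so the row to keep comes first in its group
# (assumption: the smallest value is the one worth keeping).
joined.sort(key=lambda r: (r["id"], r["value"]))

# Lag comparison: a row is a repeat if the previous row has the same ID.
previous_id = None
for row in joined:
    row["is_repeat"] = row["id"] == previous_id
    previous_id = row["id"]

# Rule: keep only the first row of each ID.
keep_first = [r for r in joined if not r["is_repeat"]]

# Alternative: remove every row whose ID is duplicated.
no_duplicates = [r for r in joined if not r["is_duplicate"]]

print([(r["id"], r["value"]) for r in keep_first])
print([(r["id"], r["value"]) for r in no_duplicates])
```

The sort step matters because the lag comparison only looks at the immediately preceding row, so all rows with the same ID have to be adjacent before it runs.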
Created with KNIME Analytics Platform version 4.1.0