Duplicate Row Filter

Workflow


Sometimes rows in a table are duplicated one or more times. This can happen for many reasons, including bad data, combining tables through joins and concatenations, or some other analytic process. Whatever the cause, you usually do not want duplicate records. That is where the Duplicate Row Filter node comes in: it automatically removes or flags rows whose values duplicate those of another row.

In the node's configuration, you select which columns to check for duplicates. By default, all columns are selected, but you may include any subset that suits your needs. On the Advanced tab, you can choose whether to remove duplicate rows or just flag them. You can also choose which row in each group of duplicates is treated as the one to keep: the first, the last, or the row with the minimum or maximum value in a column you select. Finally, you can elect to retain the current row order, although this may slow processing.
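For readers who think in code, the behavior described above can be approximated in a few lines of plain Python. This is only an illustrative sketch and not how the node is implemented: the function name `filter_duplicates`, its parameters, and the `is_duplicate` flag column are all hypothetical names chosen for this example. Rows are modeled as dictionaries, and the original row order is always preserved.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def filter_duplicates(
    rows: List[Row],
    key_columns: Optional[Sequence[str]] = None,  # None means "check all columns"
    keep: str = "first",                          # "first", "last", "min", or "max"
    order_column: Optional[str] = None,           # column compared for "min" / "max"
    flag_only: bool = False,                      # True: flag duplicates instead of removing
) -> List[Row]:
    """Hypothetical helper that mimics the options described in the text."""
    if keep not in ("first", "last", "min", "max"):
        raise ValueError("keep must be 'first', 'last', 'min', or 'max'")
    if keep in ("min", "max") and order_column is None:
        raise ValueError("order_column is required when keep is 'min' or 'max'")
    if not rows:
        return []

    cols = list(key_columns) if key_columns is not None else list(rows[0].keys())

    # Map each combination of key values to the index of the row chosen for it.
    chosen: Dict[Tuple[Any, ...], int] = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in cols)
        if key not in chosen:
            chosen[key] = i
        elif keep == "last":
            chosen[key] = i
        elif keep == "min" and row[order_column] < rows[chosen[key]][order_column]:
            chosen[key] = i
        elif keep == "max" and row[order_column] > rows[chosen[key]][order_column]:
            chosen[key] = i

    kept = set(chosen.values())
    if flag_only:
        # Keep every row and add a flag column (name is an assumption of this sketch).
        return [dict(row, is_duplicate=(i not in kept)) for i, row in enumerate(rows)]
    return [row for i, row in enumerate(rows) if i in kept]


rows = [
    {"id": 1, "city": "Oslo", "score": 5},
    {"id": 2, "city": "Rome", "score": 7},
    {"id": 3, "city": "Oslo", "score": 9},
]
deduplicated = filter_duplicates(rows, key_columns=["city"], keep="max", order_column="score")
flagged = filter_duplicates(rows, key_columns=["city"], flag_only=True)
```

Checking only the `city` column treats rows 1 and 3 as duplicates of each other, while checking all columns (the default) treats every row in this sample as unique, which is why the choice of columns matters.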


Used extensions & nodes

KNIME Base nodes (Trusted extension)
