SMOTE

Node / Manipulator

This node oversamples the input data (i.e. adds artificial rows) to enrich the training data. It applies SMOTE (Synthetic Minority Over-sampling Technique), introduced by Chawla et al.

Some supervised learning algorithms (such as decision trees and neural nets) require a balanced class distribution to generalize well, i.e. to achieve good classification performance. If the input data is unbalanced, for instance with only a few objects of the "active" class but many of the "inactive" class, this node adjusts the class distribution by adding artificial rows (in this example, rows for the "active" class).
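To make the imbalance concrete, the following minimal sketch counts the rows per class and works out how many artificial rows each class would need to match the largest class. The label list and the "balance up to the majority class" target are assumptions made for illustration; the node's actual configuration options are not described here.

```python
from collections import Counter

# Hypothetical class column: 8 "inactive" rows, 2 "active" rows.
labels = ["inactive"] * 8 + ["active"] * 2

counts = Counter(labels)
majority_size = max(counts.values())

# Number of artificial rows needed per class to reach the majority size
# (assumed balancing target, chosen for illustration only).
rows_to_add = {label: majority_size - n for label, n in counts.items()}
print(rows_to_add)
```

In this example only the "active" class receives artificial rows; the "inactive" class is left unchanged.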

The algorithm works roughly as follows: it creates synthetic rows by interpolating between a real object of a given class (in the above example, "active") and one of its nearest neighbors of the same class. It picks a random point on the line segment between these two objects and uses that point's coordinates as the attributes (cell values) of the new object.
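The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the node's implementation: the function name `smote`, the parameters `n_synthetic`, `k` and `seed`, the Euclidean distance, and the restriction to numeric attributes are all choices made for this example.

```python
import math
import random


def smote(rows, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows from the numeric rows of a single class.

    Hypothetical helper for illustration; all parameters are assumptions.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows of the class")
    rng = random.Random(seed)
    k = min(k, len(rows) - 1)  # cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a real object of the class.
        i = rng.randrange(len(rows))
        base = rows[i]
        # Sort the other rows of the same class by Euclidean distance
        # and pick one of the k nearest at random.
        others = [j for j in range(len(rows)) if j != i]
        others.sort(key=lambda j: math.dist(base, rows[j]))
        neighbor = rows[rng.choice(others[:k])]
        # Random point on the line segment between base and neighbor.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical "active" rows with two numeric attributes each.
active = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(active, n_synthetic=6, k=2)
oversampled = active + new_rows  # input rows with synthetic rows appended
```

Because every synthetic row lies on a segment between two real rows of the same class, its attribute values always stay within the range already covered by that class.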

Node details

Input ports

Type: Table
Input data
Table containing labeled data for oversampling.

Output ports

Type: Table
Oversampled data
Oversampled data (input table with appended rows).

Extension

The SMOTE node is part of this extension:

