Feature Selection Loop Start (2:2)
This node is the start of the feature selection loop. The loop lets you select, from all the features in the input data set, the subset that is best suited for model construction. With this node you determine (i) which features/columns are held fixed in the selection process (these constant or "static" features/columns are included in every loop iteration and are exempt from elimination); (ii) which selection strategy is used on the other (variable) features/columns; and (iii) the specific settings of the selected strategy. This node has two input ports and two output ports. In each case, the first port is intended for training data and the second port for test data. The same filter is applied to both tables, so they always contain the same columns.
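The column filtering described above can be illustrated with a minimal Python sketch. This is not the node's implementation: the function name `apply_feature_filter`, the dictionary-based tables, and the column names are all hypothetical and chosen only for illustration.

```python
def apply_feature_filter(table, static_columns, selected_features):
    """Keep the static columns plus the currently selected features.

    `table` is a plain dict mapping column name -> list of values
    (a stand-in for a data table; an assumption of this sketch).
    The original column order is preserved.
    """
    keep = set(static_columns) | set(selected_features)
    return {name: values for name, values in table.items() if name in keep}


# Hypothetical training and test tables with identical columns.
train = {"id": [1, 2], "age": [30, 41], "income": [50, 62], "target": [0, 1]}
test = {"id": [3], "age": [25], "income": [48], "target": [1]}

static = ["id", "target"]   # always included, never eliminated
selected = ["income"]       # feature subset tried in this iteration

# The same filter is applied to both tables.
train_out = apply_feature_filter(train, static, selected)
test_out = apply_feature_filter(test, static, selected)
```

Because one filter is applied to both tables, `train_out` and `test_out` end up with the same columns, which is the property a model trained on one and evaluated on the other relies on.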
The following feature selection strategies are available:
- Forward Feature Selection is an iterative approach. It starts with no features selected. In each iteration, the feature that improves the model the most is added to the feature set.
- Backward Feature Elimination is an iterative approach. It starts with all features selected. In each iteration, the feature whose removal has the least impact on the model's performance is removed.
- Genetic Algorithm is a stochastic approach that bases its optimization on the mechanics of biological evolution and genetics. Similar to natural selection, different solutions (individuals) are carried over and mutated from generation to generation based on their performance (fitness). This approach converges to a local optimum, so enabling early stopping might be recommended. See, e.g., this article for more insights.
- Random is a simple approach that selects feature combinations randomly. There is no convergence, and (one of) the best feature combinations may be drawn by chance in an early iteration, so enabling early stopping might be recommended.
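The two greedy strategies above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the node's actual implementation: the `score` callback (higher is better) stands in for whatever model evaluation happens inside the loop, and `toy_score` with its weights is purely hypothetical.

```python
def forward_selection(features, score):
    """Greedy forward selection: start empty, add the best feature each round."""
    selected, remaining, history = [], list(features), []
    while remaining:
        # Try each remaining feature and keep the one giving the best score.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        history.append((tuple(selected), score(tuple(selected))))
    return history


def backward_elimination(features, score):
    """Greedy backward elimination: start full, drop the least useful feature."""
    selected = list(features)
    history = [(tuple(selected), score(tuple(selected)))]
    while len(selected) > 1:
        # Drop the feature whose removal leaves the best-scoring subset.
        drop = max(
            selected,
            key=lambda f: score(tuple(x for x in selected if x != f)),
        )
        selected.remove(drop)
        history.append((tuple(selected), score(tuple(selected))))
    return history


# Hypothetical scoring function: each feature contributes a fixed amount.
WEIGHTS = {"a": 3.0, "b": 1.0, "c": -0.5}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)


forward_history = forward_selection(["a", "b", "c"], toy_score)
backward_history = backward_elimination(["a", "b", "c"], toy_score)

# Pick the best subset seen by each strategy.
best_forward = max(forward_history, key=lambda h: h[1])
best_backward = max(backward_history, key=lambda h: h[1])
```

In this toy setup both strategies settle on the same subset, but with a real model score they can disagree, since each only explores a greedy path through the possible feature combinations.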
- Type: Data A data table containing all features and static columns needed for the feature selection. (Training data)
- Type: Data A data table containing all features and static columns needed for the feature selection. (Test data)
- Type: Data The input table with some columns filtered out. (Training data)
- Type: Data The input table with some columns filtered out. (Test data)