Autofeat Generator

This component uses the Python 'autofeat' library to generate new features. The generated features are intended for building linear models. With these features, linear models can achieve performance comparable to non-linear models, while keeping the benefit of being transparent and easy to explain and interpret.

The inputs to the component are a train DataFrame and a test DataFrame. Missing values must be filled in before the data is passed to the component. Feature engineering can only be performed on numeric features, and the target column must also be numeric. The component fits a model on the train data and then applies the fitted model to the test data. The model itself is saved to disk in pickle format under the file name 'autofeat_model.pkl'.

Feature generation can take a long time, because it also includes a feature selection process. The number of feature generation steps is an important parameter, since it determines how many features are generated: the more steps, the more features, and the higher the risk of overfitting.

The outputs of the component are the train and test data with the newly created features, together with the autofeat model fitted on the train data. Given this model output, you can also use the 'Autofeat Apply' component to generate features on test data.

The component uses the Python autofeat library along with numpy and pandas. For more about the 'autofeat' library, see the paper https://arxiv.org/pdf/1901.07329.pdf or the GitHub repository https://github.com/cod3licious/autofeat . The autofeat project is Copyright (c) 2016 by its authors and released under the MIT License (https://github.com/cod3licious/autofeat/blob/master/LICENSE).
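The effect of the number of feature generation steps can be illustrated with a toy, stdlib-only sketch. It is not autofeat's algorithm: the transformation names in UNARY and the function expand_features are hypothetical, odd steps apply every unary transformation to every current feature, even steps multiply every pair of current features, and nothing is selected or de-duplicated (autofeat does perform feature selection). It only shows how quickly the pool of candidate features grows as steps are added.

```python
from itertools import combinations

# Hypothetical transformation names; autofeat's default set differs.
UNARY = ("log", "sqrt", "sq", "exp")


def expand_features(names, steps):
    """Toy candidate-feature generator (not autofeat's algorithm).

    Odd steps apply every unary transformation to every current feature;
    even steps multiply every pair of current features. Nothing is
    filtered or de-duplicated, so this only shows how the pool grows.
    """
    features = list(names)
    for step in range(1, steps + 1):
        if step % 2 == 1:
            new = [f"{t}({f})" for f in features for t in UNARY]
        else:
            new = [f"{a}*{b}" for a, b in combinations(features, 2)]
        features.extend(new)
    return features


for steps in (1, 2, 3):
    print(steps, len(expand_features(["x1", "x2", "x3"], steps)))
# 1 15
# 2 120
# 3 600
```

With only three input columns, three toy steps already yield 600 unfiltered candidates. This is why more steps mean longer run times and a higher risk of overfitting, and why a selection stage is needed at all.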

Component details

Input ports

Type: Table
trainData
train data: Data used to train the feature generator. Normalized data is preferable. Missing values must be filled in beforehand. The data must also include the target column.
Type: Table
testData
test data: Test data to which the trained feature generator is applied. Normalized data is preferable. Missing values must be filled in beforehand. The data must also include the target column.
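Both input ports require missing values to be filled in and a target column to be present; the description also states that features and target must be numeric and that normalized data is preferred. The following is a stdlib-only pre-flight sketch, with rows as lists of dicts instead of pandas DataFrames. check_input, standardize and their messages are hypothetical helpers, not part of the component, and z-score standardization is assumed as one common form of normalization. The statistics are computed on the train rows only and reused for the test rows, so both tables are scaled identically.

```python
import math
import statistics


def _is_missing(v):
    # None or a float NaN counts as missing.
    return v is None or (isinstance(v, float) and math.isnan(v))


def _is_number(v):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def check_input(rows, target):
    """List the problems that must be fixed before feature generation."""
    columns = list(rows[0]) if rows else []
    if target not in columns:
        return [f"target column {target!r} not found"]
    problems = []
    for col in columns:
        values = [row.get(col) for row in rows]
        if any(_is_missing(v) for v in values):
            problems.append(f"{col!r}: missing values, fill them first")
        elif not all(_is_number(v) for v in values):
            kind = "target" if col == target else "feature"
            problems.append(f"{col!r}: non-numeric {kind} column")
    return problems


def standardize(train, test, columns):
    """Z-score the given columns using statistics from the train rows only."""
    stats = {}
    for col in columns:
        values = [row[col] for row in train]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # constant column: avoid division by zero
        stats[col] = (mean, std)

    def scale(rows):
        return [{**row, **{c: (row[c] - m) / s for c, (m, s) in stats.items()}}
                for row in rows]

    return scale(train), scale(test)


train = [{"x": 1.0, "y": 0.5}, {"x": 3.0, "y": 1.5}]
test = [{"x": 2.0, "y": 1.0}]
print(check_input(train, "y"))  # []
train_s, test_s = standardize(train, test, ["x"])
print(test_s)  # [{'x': 0.0, 'y': 1.0}]
```

Only feature columns are passed to standardize here; whether to scale the target as well is left to the user.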

Output ports

Type: Table
TrainEngineeredFeatures
Train data containing both the features already present in the DataFrame and the newly generated features.
Type: Python
Trained Model
Outputs the autofeat model fitted on the train data.
Type: Table
TestEngineeredFeatures
Test data containing both the features already present in the DataFrame and the newly generated features.
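How the three output ports relate can be pictured with a stdlib-only sketch. Everything in it is hypothetical: fit and transform are toy functions, not autofeat's API, and the fitted state is a plain dict rather than an autofeat model. It illustrates three points from the description: which features get generated is decided on the train data only, the test data receives exactly the same new columns alongside its original ones, and the fitted state survives a pickle round trip under the file name 'autofeat_model.pkl' (written to a temporary directory here).

```python
import os
import pickle
import tempfile


def fit(rows, target):
    """Toy 'training': decide from the train rows which columns get a new feature.

    Here a squared feature is generated for every non-target column that is
    not constant on the train data.
    """
    features = [c for c in rows[0]
                if c != target and len({row[c] for row in rows}) > 1]
    return {"target": target, "features": features}


def transform(model, rows):
    """Recompute the features chosen at fit time; original columns are kept."""
    out = []
    for row in rows:
        new_row = dict(row)
        for c in model["features"]:
            new_row[f"{c}**2"] = row[c] ** 2
        out.append(new_row)
    return out


train = [{"x": 1.0, "k": 5.0, "y": 2.0}, {"x": 3.0, "k": 5.0, "y": 7.0}]
test = [{"x": 2.0, "k": 5.0, "y": 4.0}]

model = fit(train, target="y")
train_out = transform(model, train)  # like TrainEngineeredFeatures

# Save and reload the fitted state, as the component does with its model file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "autofeat_model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

test_out = transform(loaded, test)  # like TestEngineeredFeatures
print(test_out)  # [{'x': 2.0, 'k': 5.0, 'y': 4.0, 'x**2': 4.0}]
```

The column k is constant on the train data, so no feature is generated for it in either table, regardless of its values in the test data. In the component itself, the saved object is the fitted autofeat model, which is what the 'Autofeat Apply' component reuses.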

Legal

By using or downloading the component, you agree to our terms and conditions.

Extensions used:

KNIME Base nodes

KNIME Python Integration

KNIME Quick Forms
