This component uses the Python library 'autofeat' to generate new features. The generated features are intended for building linear models, whose performance can be comparable to that of non-linear models. Linear models have the additional benefit of being transparent and easy to explain and interpret.
Inputs to the component are train and test DataFrames. Missing values must be filled in before the data is passed in. The component builds the model on the train data and then applies the built model to the test data. The model itself is saved to disk in pickle format under the name 'autofeat_model.pkl'. Feature engineering can only be performed on numeric features, and the target column should also be numeric.
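The input requirements above (numeric columns, no missing values) can be checked before the data reaches the component. The following is a minimal sketch with pandas, not part of the component itself: the helper name `prepare_inputs`, the column names, and the choice of median imputation are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def prepare_inputs(train, test, target):
    """Validate dtypes and fill missing feature values (hypothetical helper)."""
    columns = list(train.columns)
    # Features and target must all be numeric.
    non_numeric = [c for c in columns if not is_numeric_dtype(train[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    if train[target].isna().any():
        raise ValueError("target column contains missing values")

    feature_cols = [c for c in columns if c != target]
    # Assumed strategy: impute with medians learned on the train data only,
    # so that no information from the test data leaks into the fill values.
    medians = train[feature_cols].median()
    train_filled = train.copy()
    test_filled = test.copy()
    train_filled[feature_cols] = train[feature_cols].fillna(medians)
    test_filled[feature_cols] = test[feature_cols].fillna(medians)
    return train_filled, test_filled


# Tiny illustrative frames; "x1", "x2" and "y" are made-up column names.
train = pd.DataFrame({"x1": [1.0, None, 3.0], "x2": [10.0, 20.0, 30.0], "y": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"x1": [None, 5.0], "x2": [None, 40.0], "y": [4.0, 5.0]})
train_filled, test_filled = prepare_inputs(train, test, target="y")
```

Any other imputation method works equally well; the only requirement stated by the component is that no missing values remain.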
Feature generation takes time because a feature selection process is also involved. The number of feature generation steps is an important parameter that determines how many features are created: the more steps, the more features, and the greater the risk of overfitting. Outputs from the component are the train and test data with the newly created features, together with the autofeat model built on the train data.
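To give a feel for why the number of steps matters, here is a toy count of candidate features in plain Python. It is not autofeat's actual algorithm: the transformation sets `UNARY` and `BINARY` and the expansion rule in `one_step` are simplified assumptions, and no feature selection is applied, so the numbers only illustrate how quickly the candidate pool can grow.

```python
from itertools import combinations

# Illustrative transformation sets (assumed, much smaller than a real library's).
UNARY = ("log", "sqrt")
BINARY = ("*",)


def one_step(features):
    """One toy expansion step: transform every feature, then combine all pairs."""
    transformed = {f"{op}({f})" for f in features for op in UNARY}
    pool = sorted(set(features) | transformed)
    combined = {f"({a}{op}{b})" for a, b in combinations(pool, 2) for op in BINARY}
    return set(pool) | combined


def count_candidates(n_features, steps):
    """Number of symbolic candidate features after the given number of steps."""
    features = {f"x{i + 1}" for i in range(n_features)}
    for _ in range(steps):
        features = one_step(features)
    return len(features)


# Candidate counts for 3 input features after 0, 1 and 2 steps.
counts = [count_candidates(3, steps) for steps in range(3)]
```

Even in this reduced setting the pool grows from 3 to 45 candidates after a single step and far beyond that after the second, which is why more steps mean longer run times and a higher risk of overfitting.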
Given the model output, you can also use the component 'Autofeat Apply' to generate features on test data.
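Applying a saved model amounts to loading the pickle file and calling the model on new data. The sketch below shows that round trip with the standard `pickle` module. `ToySquareFeatures` is a hypothetical stand-in for a fitted feature generator, not the autofeat model, and the helper names `save_model` and `apply_saved_model` are likewise assumptions; only the file name 'autofeat_model.pkl' comes from the description above.

```python
import os
import pickle
import tempfile


class ToySquareFeatures:
    """Hypothetical stand-in for a fitted feature generator (not autofeat)."""

    def fit(self, rows):
        # Remember the expected number of input columns.
        self.n_columns_ = len(rows[0])
        return self

    def transform(self, rows):
        # Append the square of every value as a new feature.
        return [list(r) + [v * v for v in r] for r in rows]


def save_model(model, path):
    """Write the fitted model to disk in pickle format."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def apply_saved_model(path, rows):
    """Load a pickled model and use it to generate features for new rows."""
    with open(path, "rb") as fh:
        model = pickle.load(fh)
    return model.transform(rows)


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "autofeat_model.pkl")
    save_model(ToySquareFeatures().fit([[1.0, 2.0]]), model_path)
    test_out = apply_saved_model(model_path, [[3.0, 4.0]])
```

Pickle files can execute arbitrary code when loaded, so only load model files from a source you trust.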
The component uses the Python autofeat library along with numpy and pandas. For more about the 'autofeat' library, see the paper at https://arxiv.org/pdf/1901.07329.pdf or the GitHub site at https://github.com/cod3licious/autofeat .
The autofeat project is Copyright (c) 2016 by its authors and released under the MIT License (https://github.com/cod3licious/autofeat/blob/master/LICENSE).
- trainData (train data). Type: Table. Feed here the data that will be used to train the feature generator. Normalized data is preferable. Missing values must be filled in beforehand. The data should also include the target column.
- testData (test data). Type: Table. Feed here the test data. Normalized data is preferable. Missing values must be filled in beforehand. The data should also include the target column.