This Component creates an interactive view to filter and select columns for your model based on how relevant each column is to the specified ground truth. It also captures the user-specified columns by means of integrated deployment.
SET COLUMN RELEVANCE FILTER
Column Relevance is an overall metric summarizing the metrics below. Use the slider to select the input features based on their Overall Column Relevance.
The additional metrics calculated automatically and used to determine Overall Column Relevance include:
- ID/Noise Test: measures how likely the column is a representation used to identify each row in your table. Row identifiers are uninformative for your model and should be removed.
- Constant Value Test: measures how often the column contains the exact same value. Columns with just a constant value carry no information, so you should avoid using them.
- Missing Value Test: measures the percentage of missing values in a column over the entire dataset. You should remove features whose percentage of missing values is too high.
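The three tests above can be illustrated with a minimal sketch. The Component's exact formulas are not given here, so the definitions below are assumptions made for illustration: `missing` is the fraction of missing cells, `constant` is the share of the most frequent value among the non-missing cells, and `id_noise` is the ratio of distinct values to non-missing cells. The function name `column_tests` is hypothetical.

```python
from collections import Counter


def column_tests(values):
    """Return illustrative scores in [0, 1] for one column.

    None marks a missing value. These definitions are assumptions,
    not the Component's actual formulas.
    """
    n = len(values)
    if n == 0:
        return {"missing": 0.0, "constant": 0.0, "id_noise": 0.0}

    present = [v for v in values if v is not None]
    missing = 1 - len(present) / n
    if not present:
        # Every cell is missing: the other two tests are undefined, report 0.
        return {"missing": missing, "constant": 0.0, "id_noise": 0.0}

    counts = Counter(present)
    # Share of the single most frequent value (1.0 means fully constant).
    constant = counts.most_common(1)[0][1] / len(present)
    # Share of distinct values (1.0 means every row is unique, like an ID).
    id_noise = len(counts) / len(present)
    return {"missing": missing, "constant": constant, "id_noise": id_noise}
```

Under these assumed definitions, a column of row numbers scores 1.0 on `id_noise`, while a column holding a single repeated value scores 1.0 on `constant`.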
By using the slider, columns can be excluded from model training based on their column relevance. Furthermore, you can use the linear correlation between each column and the column to predict to refine your input set.
- Correlation with Target: measures the linear correlation with the column the model will predict: Income. Keep in mind whether a feature is highly or poorly correlated. A high correlation (close to +100% or -100%) helps the model achieve good performance, unless the column has too many unique values (e.g. a high ID/Noise Test). If the correlation is low (close to 0%), you might exclude the feature in exchange for faster model training. Be aware that a very highly correlated column may itself be derived from the target column.
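As a rough illustration of the Correlation with Target metric, the sketch below computes the Pearson linear correlation between a numeric column and a numeric target. It assumes both are numeric with no missing values; how the Component handles nominal columns or a nominal target is not described here. The function name `pearson` is hypothetical, and returning 0.0 for a constant column is a convention chosen for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0 by convention.
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Multiplying the result by 100 gives the percentage scale used above, where values near +100% or -100% indicate a strong linear relationship and values near 0% indicate a weak one.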
MANUALLY SELECT COLUMNS
You can use the Column Relevance Filter, but you don't have to. Alternatively, you can remove individual columns manually in the Data Explorer table in the lower part of this page. This table allows you to explore both numeric and nominal columns. Clicking on a column name provides additional information about the data in that column, for example statistics and a histogram showing its distribution. Remember that the final set of excluded columns is the union of the columns excluded by the Column Relevance Filter and the columns excluded under Manually Select Columns.
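The way the two selection mechanisms combine can be sketched as a set union. This is a minimal illustration, assuming that the slider excludes columns whose Overall Column Relevance falls below the chosen threshold; the names `final_columns`, `relevance`, and `threshold` are hypothetical and not part of the Component.

```python
def final_columns(all_columns, relevance, threshold, manually_excluded):
    """Return the columns kept after applying both exclusion mechanisms.

    relevance maps column name -> score in [0, 1] (assumed scale).
    """
    # Columns dropped by the slider (assumed rule: score below threshold).
    filtered_out = {c for c in all_columns if relevance[c] < threshold}
    # The final exclusion set is the union of both sources.
    excluded = filtered_out | set(manually_excluded)
    # Preserve the original column order for the remaining columns.
    return [c for c in all_columns if c not in excluded]
```

For example, with columns `age`, `id`, and `zip`, a threshold that filters out `id`, and a manual exclusion of `zip`, only `age` remains.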
- Type: Table. Input Data: Data with input feature columns and ground truth for your model.