This is a typical predictive analytics problem with both categorical and numeric variables. It is a regression problem, and we model it with a random forest. At the data exploration stage, we check whether discretizing one or more of the numeric columns improves performance. We also apply a square-root (sqrt) transform to the skewed target variable to make its distribution more symmetrical; after the prediction stage, we square the predicted output to return it to the original scale.
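The target transform and its inverse can be sketched outside KNIME as follows. This is a minimal illustration in plain Python, not the workflow's own nodes: the function names and the sales figures are made up for the example, and the skewness helper is only there to show the effect of the transform.

```python
import math


def to_sqrt_scale(y):
    """Square-root transform for a non-negative, right-skewed target."""
    return [math.sqrt(v) for v in y]


def from_sqrt_scale(y_sqrt_pred):
    """Undo the transform: square predictions made on the sqrt scale."""
    return [p ** 2 for p in y_sqrt_pred]


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed sales figures (illustration only).
sales = [100.0, 400.0, 900.0, 1600.0, 2500.0, 10000.0]

sales_sqrt = to_sqrt_scale(sales)       # the model would be trained on this scale
restored = from_sqrt_scale(sales_sqrt)  # predictions are squared back afterwards

print("skewness before:", skewness(sales))
print("skewness after: ", skewness(sales_sqrt))
```

In a real run, the values passed to `from_sqrt_scale` would be the model's predictions rather than the transformed training target; RMSE should then be computed on the original scale so that scores stay comparable with an untransformed model.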
To reduce uncertainty in the results, we loop over partitioning, missing-value imputation, modeling, predicting, and scoring multiple times. We then calculate a confidence interval for the mean RMSE.
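The repeated loop and the confidence interval can be sketched in plain Python as below. Several things here are assumptions made for illustration and are not taken from the workflow: the synthetic data, the 70/30 split, median imputation, 30 repetitions, the 95% level, and the normal approximation for the interval. The model is a one-feature least-squares fit standing in for the random forest so that the example needs only the standard library.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    """Root mean squared error of paired values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def run_once(rows, rng, train_fraction=0.7):
    """One repetition: partition, impute, fit, predict, score."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]

    # Impute missing x with the median of the training partition only.
    fill = statistics.median(x for x, _ in train if x is not None)
    train = [(fill if x is None else x, y) for x, y in train]
    test = [(fill if x is None else x, y) for x, y in test]

    # Stand-in model: one-feature least squares (NOT a random forest).
    mean_x = statistics.mean(x for x, _ in train)
    mean_y = statistics.mean(y for _, y in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
        (x - mean_x) ** 2 for x, _ in train
    )
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * x for x, _ in test]
    return rmse([y for _, y in test], predicted)


def mean_confidence_interval(scores, confidence=0.95):
    """Mean of the scores with a normal-approximation confidence interval."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Synthetic data (illustration only): y depends on x, about 10% of x missing.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = rng.uniform(0.0, 10.0)
    y = 2.0 * x + rng.gauss(0.0, 1.0)
    rows.append((None if rng.random() < 0.1 else x, y))

scores = [run_once(rows, rng) for _ in range(30)]
mean_rmse, low, high = mean_confidence_interval(scores)
print(f"mean RMSE = {mean_rmse:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With only a handful of repetitions, a Student's t critical value would give a somewhat wider and more appropriate interval than the normal quantile used here.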
Workflow
Predict BigMart Sales using randomForest--II
Used extensions & nodes
Created with KNIME Analytics Platform version 4.0.2