This workflow lets you experiment with several sampling techniques:
- simple random sampling
- stratified random sampling (Partitioning node)
- undersampling (Equal Size Sampling node)
- oversampling (Bootstrap Sampling node and SMOTE node)
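For readers who want to see the ideas outside KNIME, the sampling techniques listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how the KNIME nodes are implemented: the function names, the 95/5 toy data, and the 80/20 split are all hypothetical, and `smote_like` only mimics the core SMOTE idea of interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random
from collections import Counter, defaultdict


def group_by_class(rows, label_of):
    """Collect rows into one list per class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def simple_random_split(rows, train_fraction, rng):
    """Shuffle all rows and cut once, ignoring class labels."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def stratified_split(rows, label_of, train_fraction, rng):
    """Split each class separately so both parts keep the class ratio."""
    train, test = [], []
    for members in group_by_class(rows, label_of).values():
        part_train, part_test = simple_random_split(members, train_fraction, rng)
        train += part_train
        test += part_test
    return train, test


def undersample(rows, label_of, rng):
    """Randomly drop rows until every class is as small as the rarest one."""
    groups = group_by_class(rows, label_of)
    target = min(len(members) for members in groups.values())
    return [row for members in groups.values() for row in rng.sample(members, target)]


def oversample_bootstrap(rows, label_of, rng):
    """Draw with replacement until every class is as large as the biggest one."""
    groups = group_by_class(rows, label_of)
    target = max(len(members) for members in groups.values())
    out = []
    for members in groups.values():
        out += members + rng.choices(members, k=target - len(members))
    return out


def smote_like(minority_points, n_new, k, rng):
    """Create synthetic points on the line between a minority point and
    one of its k nearest minority neighbours (simplified SMOTE idea)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        others = [p for p in minority_points if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical imbalanced toy data: 95 "no" rows and 5 "yes" rows,
# each row being ((feature_1, feature_2), label).
rng = random.Random(42)
rows = [((rng.gauss(0, 1), rng.gauss(0, 1)), "no") for _ in range(95)]
rows += [((rng.gauss(2, 1), rng.gauss(2, 1)), "yes") for _ in range(5)]


def label_of(row):
    return row[1]


train, test = stratified_split(rows, label_of, 0.8, rng)
under = undersample(train, label_of, rng)
over = oversample_bootstrap(train, label_of, rng)
minority = [features for features, label in train if label == "yes"]
synthetic = smote_like(minority, n_new=10, k=3, rng=rng)

print("stratified train:", Counter(map(label_of, train)))
print("undersampled:    ", Counter(map(label_of, under)))
print("oversampled:     ", Counter(map(label_of, over)))
print("synthetic points:", len(synthetic))
```

In this sketch, resampling is applied to the training part only, so the test part keeps the original class ratio and gives a more realistic picture of model performance.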
The workflow uses the Kaggle Stroke Prediction Dataset, which contains 5110 rows with 11 clinical features such as body mass index, smoking status, age, gender, and glucose level. The task is to predict stroke (yes/no), which makes this a binary classification problem. We chose to build a Random Forest model.
Used extensions & nodes
Created with KNIME Analytics Platform version 4.4.2