Machine Learning Meta Collection (with KNIME)
This meta collection is about machine learning. It links to examples demonstrating several types of machine learning, mostly with KNIME, and to resources on how to learn machine learning (again mostly with KNIME). It is not a complete collection of ML methods and algorithms, and it is far from answering all questions or covering all topics. Think of it as a quick practical overview of some aspects, always with a focus on Minimal Viable Examples you could try at home. Please note that these examples do not substitute for a deeper understanding of your business problems and the various statistical implications to consider when using such models. In other words: terms and conditions *do* apply.
--------------- Learning Machine Learning (with KNIME) ---------
How to learn machine learning with KNIME
https://forum.knime.com/t/knime-based-machine-learning-course/21876/2?u=mlauber71
[L1-DS] - KNIME Analytics Platform for Data Scientists: Basics
Lesson 4. Machine Learning & Data Export
https://www.knime.com/self-paced-course/l1-ds-knime-analytics-platform-for-data-scientists-basics/lesson4?u=mlauber71
-----------------------------------------------------------------
Links to types of prediction models
https://forum.knime.com/t/how-to-find-the-optimal-process-parameter-based-on-quality-defects/20846/6?u=mlauber71
-----
1) Models for binary classifications - 0/1 or Yes/No Targets
https://forum.knime.com/t/looking-for-options-to-evaluate-a-decision-tree/11384/2?u=mlauber71
Understand metrics like AUC and Gini (and use H2O.ai)
https://forum.knime.com/t/random-forest-model-not-working/12738/3?u=mlauber71
https://forum.knime.com/t/help-choosing-analytics-algorithm/11404/3?u=mlauber71
11 Important Model Evaluation Metrics for Machine Learning Everyone should know
https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/
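To make the AUC and Gini metrics mentioned above more concrete, here is a minimal sketch in plain Python. It is not taken from the linked threads; the function names and the toy data are assumptions for illustration. AUC is computed as the share of (positive, negative) pairs that the model ranks correctly, and Gini is derived from it as 2 * AUC - 1.

```python
def auc_score(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient as commonly derived from AUC."""
    return 2.0 * auc - 1.0


# Hypothetical toy data: 0/1 target and predicted scores for class 1.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = auc_score(y_true, y_score)   # 3 of 4 pairs are ranked correctly
gini = gini_from_auc(auc)
print(f"AUC={auc:.2f} Gini={gini:.2f}")
```

The pairwise loop is quadratic and only meant for small examples; for real data a library implementation would be the more sensible choice.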
-----
2) Model for Multiclass Targets (and explanation of Log Loss statistics)
https://forum.knime.com/t/any-advice-to-improve-the-performance-of-a-classification-model/12801/10?u=mlauber71
https://forum.knime.com/t/metrics-in-multiclass-classification/11193/3?u=mlauber71
Score Documents with multiple Classes?
https://forum.knime.com/t/urgent-what-is-wrong-with-my-decision-tree-predictor-for-new-data/13292/10?u=mlauber71
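As a rough illustration of the Log Loss statistic mentioned above, the following plain-Python sketch averages the negative log of the probability a model assigned to the true class. The function name, the class labels and the probabilities are made up for this example and do not come from the linked threads.

```python
import math


def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probabilities: one dict per row, mapping class label -> predicted probability.
    eps clips probabilities away from 0 and 1 so the logarithm stays finite.
    """
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical three-class example.
y_true = ["a", "b", "c"]
probabilities = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.8, "c": 0.1},
    {"a": 0.2, "b": 0.3, "c": 0.5},
]

loss = multiclass_log_loss(y_true, probabilities)

# Reference point: always predicting 1/3 per class gives a loss of ln(3).
uniform = [{"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}] * 3
baseline = multiclass_log_loss(y_true, uniform)
print(f"log loss={loss:.4f} uniform baseline={baseline:.4f}")
```

Lower is better: a model that is confidently wrong is punished much harder than one that hedges, which is why Log Loss is often reported alongside accuracy for multiclass targets.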
-----
3) Regression models (numeric Target)
https://forum.knime.com/t/predictive-analytics-for-sales/12858/3?u=mlauber71
https://forum.knime.com/t/forecasting-sales-per-customer-for-the-next-360-days/13221/4?u=mlauber71
https://forum.knime.com/t/evaluate-a-linear-regression-model/13305/2?u=mlauber71
https://forum.knime.com/t/how-to-identify-the-top-100-features-selected-from-mlp-model/11371/2?u=mlauber71
Regression collection (Time Series)
https://forum.knime.com/t/prediction-based-on-multi-variables/20184/5?u=mlauber71
predict how many future visitors a restaurant will receive (with H2O.ai)
https://www.knime.com/blog/solving-a-kaggle-challenge-using-the-combined-power-of-knime-analytics-platform-h2o?u=mlauber71
------------------------------------------------------------
PMML Models with numeric scores
https://forum.knime.com/t/export-pmml-that-outputs-class-probabilities/13244/2?u=mlauber71
-----------------------------------------------------------------
Data preparation steps
[preparation] Techniques for Dimensionality Reduction
https://hub.knime.com/knime/spaces/Examples/latest/04_Analytics/01_Preprocessing/02_Techniques_for_Dimensionality_Reduction/02_Techniques_for_Dimensionality_Reduction~7PBv1kGifxCng2qo
[preparation] Three New Techniques for Data Dimensionality Reduction in Machine Learning
https://www.knime.com/blog/three-new-techniques-for-data-dimensionality-reduction-in-machine-learning
[preparation] use R's vtreat to automatically prepare data for classification and regression tasks
https://forum.knime.com/t/is-artificial-intelligence-used-for-data-cleansing-techniques-used-by-knime/36209/6?u=mlauber71
[preparation] Spark Label Encoding, remove highly correlated variables - prepare the data in local Big Data environment
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_h2o_automl_spark/s_401_spark_label_encoder~mF4g6HTMX7J4m27Q
prepare data in a big data environment:
- label encode string variables
- transform numbers into Double format (Spark ML likes that)
- remove highly correlated data
- remove NaN variables
- remove continuous variables
- optional: normalize the data
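Several of the steps listed above can be sketched in a few lines. Note that this is plain Python on a small in-memory table, not Spark, and it is not the linked workflow: the function names, the 0.95 correlation threshold and the toy columns are assumptions chosen for illustration. It label encodes string columns, casts numbers to floats, drops columns containing NaN, drops the later column of each highly correlated pair, and optionally min-max normalizes.

```python
import math


def label_encode(values):
    """Map each distinct string to a numeric code (sorted for reproducibility)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [float(mapping[v]) for v in values]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if one column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prepare(table, corr_threshold=0.95, normalize=False):
    """table: dict mapping column name -> list of values."""
    prepared = {}
    for name, values in table.items():
        if all(isinstance(v, str) for v in values):
            prepared[name] = label_encode(values)          # label encode strings
        else:
            prepared[name] = [float(v) for v in values]    # cast numbers to float
    # Drop columns that contain NaN.
    prepared = {n: v for n, v in prepared.items()
                if not any(math.isnan(x) for x in v)}
    # Keep a column only if it is not highly correlated with one already kept.
    kept = []
    for name in prepared:
        if all(abs(pearson(prepared[name], prepared[k])) < corr_threshold
               for k in kept):
            kept.append(name)
    prepared = {n: prepared[n] for n in kept}
    if normalize:
        prepared = {n: min_max_normalize(v) for n, v in prepared.items()}
    return prepared


# Hypothetical toy table.
table = {
    "color": ["red", "blue", "red", "green"],
    "size_cm": [1, 2, 3, 4],
    "size_mm": [10, 20, 30, 40],            # perfectly correlated with size_cm
    "broken": [1.0, float("nan"), 3.0, 4.0],
}

result = prepare(table)
normalized = prepare(table, normalize=True)
print(list(result))
```

In a real setting the encoding and the list of kept columns would have to be stored and re-applied to new data, otherwise training and scoring data end up with different columns.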
-----------------------------------------------------------------
How to handle missing values
Basic missing value handling
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/04_Transformation/01_Handling_Missing_Values
some more advanced approaches to missing values
https://hub.knime.com/knime/spaces/Education/latest/Courses/L4-ML%20Introduction%20to%20Machine%20Learning%20Algorithms/Session_4/02_Solutions/02_Missing_Value_Handling_solution
Multiple Imputation for Missing Values
https://hub.knime.com/kathrin/spaces/Missing%20Value%20Imputation/latest/Mulitple%20Imputation%20for%20Missing%20Values
Comparing Missing Value Handling Methods
https://hub.knime.com/kathrin/spaces/Missing%20Value%20Imputation/latest/Comparing%20Missing%20Value%20Handling%20Methods
Employ R's Amelia package to replace missing values
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_r_amelia/m_001_missing_values_amelia
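For orientation, the simplest form of missing value handling is to replace gaps with a column statistic. The sketch below is a plain-Python illustration with an assumed function name and made-up data; it is not what the linked workflows or the Amelia package do, which use more elaborate approaches such as multiple imputation.

```python
import math
import statistics


def impute(values, strategy="mean"):
    """Replace None/NaN entries with the mean or median of the observed values."""
    def is_missing(v):
        return v is None or math.isnan(v)

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


# Hypothetical column with two gaps and one outlier.
ages = [20.0, None, 30.0, float("nan"), 100.0]

mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
print(mean_filled)
print(median_filled)
```

The outlier pulls the mean fill value well above the median one, which is one reason why comparing several handling methods on your own data is worthwhile.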
-----------------------------------------------------------------
about unbalanced Targets
https://forum.knime.com/t/xgboost-predictor/23960/5?u=mlauber71
about unbalanced data and evaluation metrics (AUCPR)
https://forum.knime.com/t/problem-with-unbalanced-data-with-examples-attached/26227/4?u=mlauber71
another thread about how to handle imbalanced data
https://forum.knime.com/t/knime-fraud-detection-autoencoder/28859/17?u=mlauber71
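To illustrate why AUCPR is often suggested for unbalanced targets, here is a minimal plain-Python sketch that computes average precision, a common estimate of the area under the precision-recall curve. The function name and the toy data are assumptions, and the sketch assumes all scores are distinct (ties are not handled specially).

```python
def average_precision(y_true, y_score):
    """Average of the precision values at the rank of each positive example."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank rows from highest to lowest score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank   # precision at this recall step
    return ap / total_pos


# Hypothetical unbalanced data: 2 positives among 10 rows.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01]

ap = average_precision(y_true, y_score)
baseline = sum(y_true) / len(y_true)   # expected precision of random ranking
print(f"average precision={ap:.3f} baseline={baseline:.1f}")
```

Unlike AUC, the reference level here is the share of positives in the data, so the metric is best read relative to that baseline.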
--------------- KNIME and H2O.ai ----------
H2O.ai models and KNIME in general
https://www.knime.com/nodeguide/analytics/h2o-machine-learning?u=mlauber71
simple example how to use H2O.ai models in a Big Data environment
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_h2o_sparkling_water?u=mlauber71
H2O.ai AutoML in KNIME for classification problems
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923?u=mlauber71
H2O.ai AutoML in KNIME for regression problems
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-regression-problems/20924?u=mlauber71
„Sparkling Predictions and Encoded Labels – Developing and Deploying Predictive Models on a Big Data Cluster with KNIME, Spark and H2O.ai“
(talk in German, slides in English)
https://www.youtube.com/watch?v=k8MsxzwEVrk&t=4335s
--------------- KNIME and Python ----------
use Python and KNIME to make a random forest (quick basic example)
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_python_iris?u=mlauber71
Python Installation (the very short story)
https://forum.knime.com/t/problem-with-setting-a-python-deep-learning-environment/19477/2?u=mlauber71
https://forum.knime.com/t/installing-a-new-library-in-python/25365/4?u=mlauber71
Python KNIME official installation
https://docs.knime.com/2020-07/python_installation_guide/index.html?u=mlauber71
Python and Deep Learning
https://docs.knime.com/latest/deep_learning_installation_guide/index.html?u=mlauber71
Python and Anaconda versions / Python and Keras
https://forum.knime.com/t/python-extension-not-recognizing-anaconda-environment-in-knime-3-7/12978/3?u=mlauber71
https://forum.knime.com/t/python-extension-not-recognizing-anaconda-environment-in-knime-3-7/12978/9?u=mlauber71
--------------- Special ----------
Rule Induction with Weka Rule Nodes and Yacaree Associator
https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_rule_induction_weka_hotspot_and_yacaree_rules?u=mlauber71
Not strictly a KNIME thing but very helpful books and blogs about ML and Python
https://machinelearningmastery.com/
Clustering Algorithms (small collection in KNIME)
https://forum.knime.com/t/ml-techniques-which-one-can-i-use-to-predict-sales-in-a-particular-country/28783/5?u=mlauber71
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 4.7.8