This workflow trains a classification model, using a Random Forest algorithm, to approve or reject loan requests. The dataset used is the German Credit Data Set, provided by the UCI Machine Learning Repository (University of California, Irvine). It contains loan requests together with information about each applicant, such as age, marital status, checking account balance and employment. These attributes are the input features of the model. The target variable is 1 if the loan request was approved ( = credit-worthy applicant) and 2 if it was rejected ( = risky applicant). The workflow implements the following steps:
1. reads the data
2. cleans the data (here only the renaming of the target column, column20, is required)
3. partitions the data: 80% into a training set and 20% into a test set
4. trains a random forest on the training set to predict the target variable (1 = credit-worthy applicant, 2 = risky applicant)
5. saves the trained model so it can be reused for deployment in production
6. runs the model on the test set
7. calculates the confusion matrix and other metrics to evaluate model performance
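To make steps 3 and 7 more concrete, here is a minimal Python sketch of an 80/20 partition and of a confusion matrix with two derived metrics. It is not the workflow's implementation, which uses KNIME nodes. The function names (partition, confusion_matrix, accuracy, recall), the fixed seed and the plain random shuffle are assumptions made for illustration; the workflow may sample differently, for example with stratification.

```python
import random
from collections import Counter


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into (train, test).

    Assumption: a plain random split with a fixed seed for repeatability.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels=(1, 2)):
    """Return a matrix where rows are actual classes and columns are predictions.

    Labels follow the dataset encoding: 1 = credit-worthy, 2 = risky.
    """
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def recall(matrix, index):
    """Share of the actual class at `index` that was predicted correctly."""
    row_total = sum(matrix[index])
    return matrix[index][index] / row_total if row_total else 0.0


if __name__ == "__main__":
    # Made-up labels, only to show how the functions fit together.
    actual = [1, 1, 1, 2, 2, 1, 2, 1, 1, 2]
    predicted = [1, 1, 2, 2, 1, 1, 2, 1, 1, 2]
    matrix = confusion_matrix(actual, predicted)
    print(matrix)
    print(accuracy(matrix), recall(matrix, 1))
```

In a credit-scoring setting, recall on the risky class (index 1 here) is often worth reading alongside accuracy, because approving a risky applicant and rejecting a credit-worthy one usually carry different costs.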
Read more about credit scoring on the KNIME Blog: https://www.knime.com/blog/how-to-do-credit-scoring