Fraud Detection: Distribution Method Training
This workflow uses the Distribution Method to check for fraud. The Distribution Method for classification is particularly useful when the majority of the data is expected to follow a certain pattern or distribution, so that values far from that pattern can be flagged. For credit card transactions, we can use it to help determine whether a transaction is potentially fraudulent. We start by reading the training data from a sample dataset. The table is preprocessed to convert the class labels "0" or "1" to "good" or "fraud". Next, the data undergoes Z-score normalization, which standardizes each column to a mean of zero and a standard deviation of one, making it easier to compare columns on different scales. The Z-score normalization model is exported for later use in deployment. We then analyze the data to check distributions and apply filters to isolate a single column (V5) and to exclude outliers beyond the 95% confidence interval. In the last step, we mark the outliers and score the model on correctly and incorrectly identified transactions. The model score can be viewed using the 'Scorer' node.
The steps we perform are shown below:
1. Read Training Data
2. Data Preprocessing
3. Normalize Data
4. Save Model
5. Filter and Isolate
6. Mark Outliers and Score
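For readers who want to see the logic of steps 2 to 6 outside KNIME, the following is a minimal Python sketch, not the workflow's actual implementation. The sample values, the function names, the mapping of "0" to "good" and "1" to "fraud", the use of the population standard deviation, and the 1.96 cutoff (roughly the central 95% of a normal distribution) are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical training data: one feature column (standing in for V5)
# and its class labels. These values are invented for illustration.
v5 = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 0.15, 8.0]
raw_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Step 2: preprocessing. Assumed mapping: 0 -> "good", 1 -> "fraud".
labels = ["fraud" if y == 1 else "good" for y in raw_labels]


def fit_zscore(values):
    """Learn the normalization model: the mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std; an assumption of this sketch
    if sigma == 0:
        raise ValueError("cannot normalize a constant column")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Apply a previously fitted model, as would be done at deployment."""
    return [(v - mu) / sigma for v in values]


def mark_outliers(z_values, threshold=1.96):
    """Flag values outside +/- threshold standard deviations as fraud."""
    return ["fraud" if abs(z) > threshold else "good" for z in z_values]


def accuracy(predicted, actual):
    """Fraction of transactions that were identified correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Steps 3 and 4: fit the model and keep (mu, sigma) for later reuse.
mu, sigma = fit_zscore(v5)
z_scores = apply_zscore(v5, mu, sigma)

# Steps 5 and 6: mark outliers on the isolated column and score the result.
predicted = mark_outliers(z_scores)
score = accuracy(predicted, labels)
print(predicted)
print(score)
```

Keeping `fit_zscore` separate from `apply_zscore` mirrors the idea of exporting the normalization model: the mean and standard deviation learned on the training data are reused unchanged when new transactions are scored.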
Workflow
Fraud_Detection_Distribution_Training
Created with KNIME Analytics Platform version 5.2.3