Fraud Detection: Distribution Method - Training

This workflow uses the Distribution Method to check for fraud. The Distribution Method for classification is particularly useful when the majority of the data is expected to follow a certain pattern or distribution, so that records falling far from it stand out. For credit card transactions, we can use this to flag potentially fraudulent transactions. We start by reading the training data from a sample dataset. The table is preprocessed to convert the class labels "0" and "1" to "good" and "fraud", respectively. Next, the data undergoes Z-score normalization, which rescales each column to a mean of zero and a standard deviation of one, making columns with different scales easier to compare. The Z-score normalization model is exported for later use in deployment. We then analyze the data to check its distributions and apply filters to isolate a single column (V5) and to exclude outliers beyond the 95% confidence interval. In the last step, we mark the outliers and score the model on correctly and incorrectly identified transactions. The model score can be viewed using the 'Scorer' node.

The steps we perform are shown below:

Read Training Data
Data Preprocessing
Normalize Data
Save Model
Filter and Isolate
Mark Outliers and Score
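
For readers who want to see the logic outside the workflow, the steps above can be approximated in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the workflow's actual implementation: the function names, the sample V5 values, the use of the population standard deviation, and the cutoff of 1.96 (the bounds of the central 95% of a standard normal distribution) are all illustrative choices that do not come from the workflow itself.

```python
from statistics import mean, pstdev

# Assumed mapping for the preprocessing step: "0" -> "good", "1" -> "fraud".
LABELS = {"0": "good", "1": "fraud"}


def fit_zscore(values):
    """Learn the normalization parameters (mean, standard deviation).

    Returning them separately mirrors the "Save Model" step: the same
    parameters can be stored and reapplied to new data at deployment.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population std; an assumption of this sketch
    if sigma == 0:
        raise ValueError("column is constant; z-score is undefined")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Rescale values using previously fitted parameters."""
    return [(v - mu) / sigma for v in values]


def mark_outliers(z_values, threshold=1.96):
    """Label a value "fraud" when its |z| exceeds the threshold."""
    return ["fraud" if abs(z) > threshold else "good" for z in z_values]


def score(actual, predicted):
    """Return confusion counts keyed by (actual, predicted) and the accuracy."""
    counts = {}
    for a, p in zip(actual, predicted):
        counts[(a, p)] = counts.get((a, p), 0) + 1
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return counts, correct / len(actual)


# Hypothetical stand-in for column V5: 19 ordinary values and one extreme one.
v5 = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1,
      0.2, -0.2, 0.1, 0.0, -0.1, 0.3, -0.3, 0.2, -0.2, 10.0]
raw_classes = ["0"] * 19 + ["1"]

actual = [LABELS[c] for c in raw_classes]      # Data Preprocessing
mu, sigma = fit_zscore(v5)                     # Normalize Data / Save Model
z = apply_zscore(v5, mu, sigma)
predicted = mark_outliers(z)                   # Mark Outliers
confusion, accuracy = score(actual, predicted)  # Score
print(confusion, accuracy)
```

Keeping the fitted mean and standard deviation separate from the data is the design point worth noting: at deployment, new transactions should be normalized with the parameters learned during training rather than with statistics recomputed on the incoming data.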