Credit Card Fraud Detection
- Farzaneh Hashemi
- Aug 11, 2023
- 3 min read
Updated: Oct 6, 2023
This is a machine learning exercise to predict whether a transaction is fraud or not fraud (classification). I’m going to be using Python and Jupyter notebooks as well as various libraries throughout this project. Most of the libraries I’ve already installed, so the installation won’t be included in my code.
Here’s a link to the code:
And here’s a link to the dataset:
I’m going to start by importing pandas and reading the file. I’ll take a look at some of the data.
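Roughly, that step looks like this. The file name creditcard.csv is an assumption on my part; use whatever your downloaded file is called:

```python
import pandas as pd

# Load the dataset (file name assumed) and peek at the first few rows.
df = pd.read_csv("creditcard.csv")
print(df.shape)
df.head()
```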

Most of the data just looks like random numbers because the columns are anonymized indicators of private information, so the real data is not shared in its original format. We have a column called ‘Class’ which has the value 1 or 0 depending on whether a transaction is fraud or not fraud.

I’m creating histograms of each column so I can get a better sense of the distribution of the data:
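A minimal sketch of the plotting step, using pandas’ built-in histogram helper (bin count and figure size are my assumptions):

```python
import matplotlib.pyplot as plt

# One histogram per column to inspect each distribution.
df.hist(bins=50, figsize=(20, 20))
plt.tight_layout()
plt.show()
```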


As we can see in the histograms for the V27, V28 and Amount columns, the data is mostly concentrated in one narrow range with some extreme outliers, and we want to reduce the influence of those outliers. So I’m going to use RobustScaler from the sklearn.preprocessing library, which scales using the median and interquartile range so that outliers have much less effect. I’m also going to rescale the data in the Time column so that it ranges from 0 to 1.
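A sketch of the scaling step. Applying RobustScaler only to the Amount column is an assumption on my part; the V columns could be scaled the same way:

```python
from sklearn.preprocessing import RobustScaler

# RobustScaler centers on the median and scales by the IQR, so extreme
# values have far less influence than with standard scaling.
scaler = RobustScaler()
df[["Amount"]] = scaler.fit_transform(df[["Amount"]])

# Min-max scale Time into the range [0, 1].
df["Time"] = (df["Time"] - df["Time"].min()) / (df["Time"].max() - df["Time"].min())
```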
The new Time column:

Next I’m going to split the data into sets for training, testing and validating, as well as transform the dataset into a numpy array:
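Something along these lines, assuming ‘Class’ is the last column and an 80/10/10 split (the exact proportions in the notebook may differ):

```python
# Shuffle, convert to a numpy array, and slice into train/test/validation.
data = df.sample(frac=1, random_state=1).to_numpy()

n = len(data)
train = data[: int(0.8 * n)]
test = data[int(0.8 * n) : int(0.9 * n)]
val = data[int(0.9 * n) :]

# 'Class' is the last column; everything before it is features.
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]
X_val, y_val = val[:, :-1], val[:, -1]
```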

The first model I’m going to try is Logistic Regression. I’m using the sklearn library to do so.
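The fit itself is short (default settings assumed):

```python
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()  # default max_iter=100 may warn about convergence
log_reg.fit(X_train, y_train)
```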

Our model doesn’t converge completely, but it’s also not too far from converging. I’m going to look at the Classification Report.
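Generating the report with sklearn (that the validation set was used here is my assumption):

```python
from sklearn.metrics import classification_report

# Precision, recall and F1 per class, plus overall accuracy.
print(classification_report(y_val, log_reg.predict(X_val)))
```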

I’m going to give a brief description of these terms:
True Positives: fraud transactions correctly classified as fraud
False Positives: non-fraud transactions incorrectly classified as fraud
True Negatives: non-fraud transactions correctly classified as non-fraud
False Negatives: fraud transactions incorrectly classified as non-fraud
Precision: True Positives/(True Positives + False Positives)
Recall: True Positives/(True Positives + False Negatives)
Precision: as the name says, this tells how precise (how sure) our model is when it flags a transaction as fraud, while recall is the share of actual fraud cases our model is able to detect.
F1-score: the harmonic mean of precision and recall, representing a balance between the two.
Precision/Recall Tradeoff: The more precise (selective) our model is, the fewer cases it will detect. For example, suppose we only flag transactions where the model is at least 95% sure they are fraud, and only 5 transactions clear that bar. If there are 5 more transactions the model scores at 90%, lowering the threshold (accepting lower precision) lets the model detect those additional cases too.
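To make the tradeoff concrete, here’s a small sketch that moves the decision threshold on the logistic regression’s predicted probabilities (the 0.9 threshold is arbitrary):

```python
# predict_proba gives the model's estimated probability that each
# transaction is fraud; the default predict() threshold is 0.5.
proba = log_reg.predict_proba(X_val)[:, 1]

# Raising the threshold makes the model more selective: higher precision,
# but fewer fraud cases caught (lower recall). Lowering it does the reverse.
for threshold in (0.5, 0.9):
    preds = (proba >= threshold).astype(int)
    print(f"threshold {threshold}: {preds.sum()} transactions flagged as fraud")
```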
And I’m going to give a brief explanation of what we’re looking at:
Our Accuracy is 1, but this isn’t a good indicator of how well our model works. Most of our dataset is non-fraud transactions, with only a small number of fraud transactions. So even if we automatically labeled every transaction as non-fraud, we would have a high accuracy and yet we wouldn’t be catching any of the transactions that are fraud. The Recall for Fraud in this case is 0.53, which means our model only detects 53% of the actual fraud transactions; the other 47% are predicted as non-fraud. So in terms of detecting fraud this model works horribly. The Precision for non-fraud transactions is 1, so almost 100% of the cases we predict as non-fraud really are non-fraud. This is because most of our data is non-fraud, so our model is trained well on predicting non-fraud transactions. For this project, I would say our goal is to have a high Fraud Recall so we can predict fraudulent transactions well; if we lose some accuracy by predicting some non-fraud transactions as fraud, that’s a better trade-off.
I’m going to try a few more models to see if I can get better results and compare:
Neural Net Model:
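As a stand-in sketch I’ll use sklearn’s MLPClassifier; the actual architecture (and library) used in the notebook may differ:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# A small multilayer perceptron; layer sizes are assumed.
nn = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=300, random_state=1)
nn.fit(X_train, y_train)
print(classification_report(y_val, nn.predict(X_val)))
```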

Random Forest Classifier:
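A sketch with assumed hyperparameters:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rf = RandomForestClassifier(max_depth=5, random_state=1)
rf.fit(X_train, y_train)
print(classification_report(y_val, rf.predict(X_val)))
```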

Gradient Boosting Classifier:
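Again with assumed hyperparameters:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

gb = GradientBoostingClassifier(n_estimators=50, random_state=1)
gb.fit(X_train, y_train)
print(classification_report(y_val, gb.predict(X_val)))
```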

Linear SVC:
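A sketch with default settings:

```python
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

svc = LinearSVC(random_state=1)
svc.fit(X_train, y_train)
print(classification_report(y_val, svc.predict(X_val)))
```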

We’re definitely starting to see better results. However, since our original dataset has very few fraud cases compared to non-fraud, if we use a sample where the data is more evenly distributed between fraud and non-fraud, we will most likely get better results.
So we’re going to work with a more balanced sample dataset, splitting our data for training, testing and validating as follows:
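One way to build such a sample is random undersampling: keep all the fraud rows and draw an equal number of non-fraud rows. The notebook may do this differently; this is just a sketch:

```python
import pandas as pd

# Keep all fraud rows and draw an equal number of non-fraud rows at random.
frauds = df[df["Class"] == 1]
non_frauds = df[df["Class"] == 0].sample(len(frauds), random_state=1)

balanced = pd.concat([frauds, non_frauds]).sample(frac=1, random_state=1).to_numpy()

# Same 80/10/10 train/test/validation split as before.
n = len(balanced)
b_train = balanced[: int(0.8 * n)]
b_test = balanced[int(0.8 * n) : int(0.9 * n)]
b_val = balanced[int(0.9 * n) :]

Xb_train, yb_train = b_train[:, :-1], b_train[:, -1]
Xb_test, yb_test = b_test[:, :-1], b_test[:, -1]
Xb_val, yb_val = b_val[:, :-1], b_val[:, -1]
```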

I’m going to see if I can get better results from the previous models with our new balanced dataset.
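Roughly, the re-fit looks like this, reusing the same (assumed) model settings as before on the balanced splits; the per-model results follow below:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

models = {
    "Logistic Regression": LogisticRegression(),
    "Neural Net": MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=300, random_state=1),
    "Random Forest": RandomForestClassifier(max_depth=5, random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
    "Linear SVC": LinearSVC(random_state=1),
}

# Fit each model on the balanced training set and report on the validation set.
for name, model in models.items():
    model.fit(Xb_train, yb_train)
    print(name)
    print(classification_report(yb_val, model.predict(Xb_val)))
```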
Logistic Regression:

Neural Net:

Random Forest Classifier:

Gradient Boosting Classifier:

Linear SVC:

Logistic Regression and the Neural Network perform best at predicting Fraud.