A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets
Keywords:
Imbalanced dataset, Ensemble Approach, Fraud detection, Stacking algorithm, Synthetic Minority Oversampling Technique (SMOTE)
Abstract
Machine learning (ML) has been widely explored for fraud detection in earlier studies. However, fraud detection remains a challenging problem. This is due to the imbalanced nature of fraud data, which causes most models to underperform in detecting the few fraud cases present. Undetected fraud cases also account for the loss of several millions of dollars annually. Thus, we propose an ensemble approach that stacks five classifiers: Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour. A Logistic Regression meta-classifier makes the final predictions, based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. For the first two datasets, the model was initially trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. For the third dataset, which clearly showed an imbalance, the model was trained only on the balanced, resampled data. The results show that the proposed model is highly competitive with extant models, producing an ROC AUC of 99% and scoring above 98% in all other metrics. The approach is recommended for detecting fraud cases in similar case studies.
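To illustrate the oversampling idea the abstract relies on, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core step of SMOTE. It is a simplified, standard-library-only illustration, not the authors' pipeline and not the imbalanced-learn implementation; the function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_like_oversample(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Hypothetical helper for illustration only; real SMOTE implementations
    add further details such as per-sample allocation of synthetic points.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours and a random point on the segment.
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data invented for illustration: 8 legitimate vs 3 fraudulent records.
legit = [(float(x), float(x % 3)) for x in range(8)]
fraud = [(10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]

# Add enough synthetic fraud points to match the majority class size.
new_fraud = smote_like_oversample(fraud, n_synthetic=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, resampling of this kind would be applied to the training split only, so that the test data keeps its original class distribution.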
Copyright (c) 2024 Idongesit E. Eteng, Udeze L. Chinedu, Ayei E. Ibor
This work is licensed under a Creative Commons Attribution 4.0 International License.