A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets

Authors

  • Idongesit E. Eteng, Department of Computer Science, University of Calabar, Calabar, Nigeria
  • Udeze L. Chinedu, Department of Computer Science and Creative Technologies, University of the West of England, Bristol, United Kingdom
  • Ayei E. Ibor, Department of Computer Science, University of Calabar, Calabar, Nigeria

Keywords:

Imbalanced dataset, Ensemble Approach, Fraud detection, Stacking algorithm, Synthetic Minority Oversampling Technique (SMOTE)

Abstract

In several earlier studies, machine learning (ML) has been widely explored for fraud detection. However, fraud detection remains a challenging problem because fraud data are highly imbalanced, which causes most models to underperform in detecting the few fraud cases. Undetected fraud also accounts for losses of millions of dollars annually. We therefore propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. On the first two datasets, the model was first trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. For the third dataset, which was clearly imbalanced, only a balanced, resampled version was used for training. The results show that the proposed model is highly competitive with extant models, producing an ROC AUC of 99% and scoring above 98% in all other metrics. The approach is recommended for detecting fraud in similar case studies.
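The stacking and resampling procedure described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification` with roughly 3% positive labels) rather than the three fraud datasets, hyperparameters are scikit-learn defaults or arbitrary choices, and the hypothetical helpers `smote` and `random_undersample` are hand-written NumPy stand-ins for the SMOTE and RandomUnderSampler classes of the imbalanced-learn library, used only to keep the example dependency-light. Resampling is applied to the training split only, so the test split keeps its original imbalance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def smote(X, y, minority=1, k=5, seed=0):
    """Hypothetical helper: basic SMOTE. Each synthetic sample lies on the
    segment between a random minority point and one of its k nearest
    minority neighbours; samples are added until the classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def random_undersample(X, y, minority=1, seed=0):
    """Hypothetical helper: drop random majority samples until balanced."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, keep])
    return X[idx], y[idx]


def build_stack(seed=0):
    """Five base learners named in the abstract, Logistic Regression on top.
    Scaling for SVM and k-NN is an assumption, not taken from the paper."""
    base_learners = [
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("gnb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ]
    # The meta-classifier is trained on out-of-fold base-learner outputs (cv=5).
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic stand-in for a fraud dataset: about 3% positive ("fraud") labels.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

results = {}
for name, resample in [("none", None), ("smote", smote),
                       ("undersample", random_undersample)]:
    # Resample the training split only; the test split stays imbalanced.
    X_fit, y_fit = (X_tr, y_tr) if resample is None else resample(X_tr, y_tr)
    model = build_stack().fit(X_fit, y_fit)
    results[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:<12} ROC AUC = {results[name]:.3f}")
```

In a real project, imbalanced-learn's `SMOTE` and `RandomUnderSampler` would replace the hand-written helpers; its `Pipeline` applies samplers only during `fit`, so held-out data are never resampled at prediction time.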

Figure: Pipeline of the stacking approach

Published

2025-02-01

How to Cite

A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets. (2025). Journal of the Nigerian Society of Physical Sciences, 7(1), 2066. https://doi.org/10.46481/jnsps.2025.2066

Section

Computer Science
