The effect of imbalance data mitigation techniques on cardiovascular disease prediction
Keywords:
Imbalance dataset, Cardiovascular disease prediction, SMOTE-TOMEK, Machine learning, Overfitting and Underfitting
Abstract
Class imbalance is a common challenge in medical datasets and can adversely affect the performance of machine learning models. This paper examines how several data imbalance mitigation techniques affect the performance of cardiovascular disease (CVD) prediction. The techniques were applied to a real-life CVD dataset of 1000 patient records with 14 features, obtained from the University of Abuja Teaching Hospital, Nigeria. The data balancing techniques used are random under-sampling, Synthetic Minority Over-sampling Technique (SMOTE), SMOTE with Edited Nearest Neighbours (SMOTE-ENN), and SMOTE combined with Tomek Links under-sampling (SMOTE-TOMEK). Each technique was evaluated with seven machine learning models: Random Forest, XGBoost, LightGBM, Gradient Boosting, K-Nearest Neighbours, Decision Tree, and Support Vector Machine. The evaluation metrics were precision, recall, F1-score, accuracy, and area under the receiver operating characteristic curve (ROC-AUC). Learning curve plots were also used to show how the different data balancing techniques affect overfitting and underfitting. The results showed that applying data balancing techniques significantly enhances the performance of machine learning models in heart disease prediction and effectively addresses overfitting and underfitting. SMOTE-TOMEK yielded the best-balanced fit as well as the highest precision, recall, F1-score, accuracy of 92%, and ROC-AUC of 96% on the Light Gradient Boosting Machine (LightGBM) model. These results underscore the critical role of data balancing in predictive modelling for heart disease and highlight the effectiveness of specific techniques and models in achieving accurate, more reliable, and generalised predictions.
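The over-sampling idea behind SMOTE, which the abstract names but does not spell out, can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the function name `smote`, its parameters, and the toy data are all hypothetical, and the Tomek Links cleaning step of SMOTE-TOMEK is not included. In practice a study like this would more likely rely on a library such as imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature data (hypothetical, not taken from the study).
majority = [(float(x), float(y)) for x in range(6) for y in range(2)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5), (12.0, 11.0)]

# Generate enough synthetic points to match the majority class size.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Resampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.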
Copyright (c) 2025 Raphael Ozighor Enihe, Rajesh Prasad, Francisca Nonyelum Ogwueleka, Fatimah Binta Abdullahi

This work is licensed under a Creative Commons Attribution 4.0 International License.

