The effect of imbalance data mitigation techniques on cardiovascular disease prediction
Keywords:
Imbalance dataset, Cardiovascular disease prediction, SMOTE-TOMEK, Machine learning, Overfitting and Underfitting
Abstract
Class imbalance is a common challenge in medical datasets and can adversely affect the performance of machine learning models. This paper examines how several data imbalance mitigation techniques affect the performance of cardiovascular disease (CVD) prediction. The techniques were applied to a real-life CVD dataset of 1000 patient records with 14 features obtained from the University of Abuja Teaching Hospital, Nigeria. The data balancing techniques used are random under-sampling, the Synthetic Minority Over-sampling Technique (SMOTE), SMOTE combined with Edited Nearest Neighbours (SMOTE-ENN), and SMOTE combined with Tomek Links under-sampling (SMOTE-TOMEK). Each technique was evaluated on seven machine learning models: Random Forest, XGBoost, LightGBM, Gradient Boosting, K-Nearest Neighbours, Decision Tree, and Support Vector Machine. The evaluation metrics are precision, recall, F1-score, accuracy, and the area under the receiver operating characteristic curve (ROC-AUC). Learning curve plots were also used to show the impact of each data balancing technique on overfitting and underfitting. The results show that data balancing significantly enhances the performance of machine learning models in heart disease prediction and effectively addresses overfitting and underfitting. SMOTE-TOMEK yielded the best-balanced fit as well as the highest precision, recall, F1-score, accuracy of 92%, and ROC-AUC of 96% on the Light Gradient Boosting Machine (LightGBM) model. These results underscore the critical role of data balancing in predictive modelling for heart disease and highlight the effectiveness of specific techniques and models in achieving accurate, more reliable, and generalised predictions.
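To make the resampling idea concrete, the following is a minimal, dependency-free sketch of two of the techniques named in the abstract: random under-sampling and SMOTE-style interpolation. It is not the authors' implementation. The function names, the toy two-feature data, and the parameters `k` and `seed` are assumptions made for illustration only, and the Tomek Links and ENN cleaning steps are not shown.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, minority, k=3, seed=0):
    """Add synthetic minority rows until the minority class matches the
    largest class. Each synthetic row is a random point on the segment
    between a minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # Rank the other minority rows by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data (not from the study): 8 majority rows (label 0)
# and 3 minority rows (label 1), two numeric features each.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2],
     [1.3, 2.0], [0.8, 1.8], [1.0, 2.3], [1.4, 2.1],
     [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y = [0] * 8 + [1] * 3

X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_oversample(X, y, minority=1, k=2)

print("original:     ", dict(Counter(y)))
print("under-sampled:", dict(Counter(y_under)))
print("SMOTE:        ", dict(Counter(y_smote)))
```

In practice, resampling is normally applied to the training split only, so that the test set keeps the original class distribution. The abstract does not state which software was used; the imbalanced-learn package offers library versions of these techniques (for example `SMOTETomek` and `SMOTEENN` in `imblearn.combine`).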
Copyright (c) 2025 Raphael Ozighor Enihe, Rajesh Prasad, Francisca Nonyelum Ogwueleka, Fatimah Binta Abdullahi

This work is licensed under a Creative Commons Attribution 4.0 International License.

