The effect of imbalance data mitigation techniques on cardiovascular disease prediction
Keywords:
Imbalance dataset, Cardiovascular disease prediction, SMOTE-TOMEK, Machine learning, Overfitting and Underfitting
Abstract
Class imbalance is a common challenge in medical datasets and can adversely affect the performance of machine learning models. This paper examines how several data imbalance mitigation techniques affect cardiovascular disease (CVD) prediction. The techniques were applied to a real-life CVD dataset of 1000 patient records with 14 features obtained from the University of Abuja Teaching Hospital, Nigeria. They include random under-sampling, the Synthetic Minority Over-sampling Technique (SMOTE), SMOTE combined with Edited Nearest Neighbours (SMOTE-ENN), and SMOTE combined with Tomek Links under-sampling (SMOTE-TOMEK). Each technique was then evaluated with seven machine learning models: Random Forest, XGBoost, LightGBM, Gradient Boosting, K-Nearest Neighbours, Decision Tree, and Support Vector Machine. The evaluation metrics were precision, recall, F1-score, accuracy, and the area under the receiver operating characteristic curve (ROC-AUC). Learning curve plots were also used to show how the different data balancing techniques affect overfitting and underfitting. The results show that data balancing significantly enhances the performance of machine learning models in heart disease prediction and effectively addresses overfitting and underfitting. SMOTE-TOMEK yielded the best-balanced fit as well as the highest precision, recall, F1-score, accuracy of 92%, and ROC-AUC of 96% on the Light Gradient Boosting Machine (LightGBM) model. These results underscore the critical role of data balancing in predictive modelling for heart disease and highlight the effectiveness of specific techniques and models in achieving more accurate, reliable, and generalisable predictions.
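To make the SMOTE-TOMEK idea concrete, the following is a minimal, standard-library-only sketch of the two steps it combines: SMOTE-style interpolation between minority samples, followed by removal of Tomek links (opposite-class pairs that are each other's nearest neighbour). It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, the parameter `k`, and the choice to drop both members of each link are all hypothetical. A real study would more likely use a tested library such as imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and belong to different classes."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity, then drop Tomek links.
    Illustrative choice: both members of each link are removed."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k, seed)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_bal, y_bal) for i in pair}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]


# Toy data (hypothetical): six majority points near the origin,
# two minority points near (2, 2).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0),
     (2.0, 2.0), (2.2, 2.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_tomek(X, y, k=1)
print(y_bal.count(0), y_bal.count(1))
```

On this toy data the two classes are well separated, so no Tomek links are found and the output is simply the original points plus four synthetic minority points. When classes overlap, the cleaning step removes borderline pairs, which is the behaviour the abstract associates with a better-balanced fit.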
Copyright (c) 2025 Raphael Ozighor Enihe, Rajesh Prasad, Francisca Nonyelum Ogwueleka, Fatimah Binta Abdullahi

This work is licensed under a Creative Commons Attribution 4.0 International License.

