Ensemble machine learning algorithm for cost-effective and timely detection of diabetes in Maiduguri, Borno State

Emmanuel Gbenga Dada; Aishatu Ibrahim Birma; Abdulkarim Abbas Gora

doi:10.46481/jnsps.2024.2175

Authors

Emmanuel Gbenga Dada
Department of Mathematics and Computer Science, Faculty of Science, Borno State University, Maiduguri; Department of Computer Science, Faculty of Physical Sciences, University of Maiduguri, Maiduguri, Nigeria
https://orcid.org/0000-0002-1132-5447
Aishatu Ibrahim Birma
Department of Mathematics and Computer Science, Faculty of Science, Borno State University, Maiduguri
Abdulkarim Abbas Gora
Department of Mathematics and Computer Science, Faculty of Science, Borno State University, Maiduguri

Keywords:

Ensemble learning, Diabetes, Weighted average ensemble, Random forests, Light gradient boosting machine

Abstract

Diabetes is a serious medical condition that impairs the body's ability to produce or properly regulate insulin, leading to inadequate carbohydrate metabolism and elevated blood glucose levels. From 2000 to 2019, diabetes-related mortality rates rose by 3%, and in 2019 alone diabetes was responsible for nearly 2 million deaths. This research introduces an improved weighted average ensemble learning (WAEL) model for detecting diabetes. The improved WAEL model addresses overfitting by integrating multiple models that have each learned different aspects of the data. It combines five feature spaces and uses the grey wolf optimisation (GWO) algorithm to find the optimal weight combination; GWO allows the weights of models that are particularly sensitive to noise to be reduced. The improved WAEL achieved an accuracy of 98.90%, followed by the light gradient boosting machine (LGBM) at 85.00% and random forest (RF) at 81.00%. In identifying diabetes, the improved WAEL ensemble outperformed the five individual models on accuracy, precision, recall, and F1-score. The proposed model is therefore a promising alternative tool for healthcare professionals in the early detection of diabetes.
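
The idea of weighting base models with GWO can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness function (accuracy at a 0.5 threshold), the population size, the iteration count, the uniform-weight starting wolf, and the names `weighted_average`, `accuracy`, and `gwo_weights` are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def weighted_average(probs, weights):
    """Blend per-model positive-class probabilities.

    probs: shape (n_models, n_samples); weights: shape (n_models,).
    """
    w = np.abs(np.asarray(weights, dtype=float))
    w = w / (w.sum() + 1e-12)  # normalise so the weights sum to (about) 1
    return w @ np.asarray(probs, dtype=float)

def accuracy(probs, weights, y):
    """Accuracy of the blended prediction at a 0.5 threshold (assumed fitness)."""
    return float(np.mean((weighted_average(probs, weights) >= 0.5) == y))

def gwo_weights(probs, y, n_wolves=20, n_iter=50, seed=0):
    """Search for ensemble weights with a basic grey wolf optimiser.

    Returns (normalised weights, best fitness seen during the search).
    """
    rng = np.random.default_rng(seed)
    dim = probs.shape[0]
    wolves = rng.random((n_wolves, dim))
    wolves[0] = 1.0 / dim  # assumption: start one wolf at uniform weights
    best_w, best_fit = wolves[0].copy(), -1.0
    for t in range(n_iter):
        fitness = np.array([accuracy(probs, w, y) for w in wolves])
        order = np.argsort(-fitness)
        if fitness[order[0]] > best_fit:  # remember the best wolf so far
            best_fit = float(fitness[order[0]])
            best_w = wolves[order[0]].copy()
        alpha, beta, delta = wolves[order[:3]]  # the three leading wolves
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        new = np.zeros_like(wolves)
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random(wolves.shape) - a
            C = 2.0 * rng.random(wolves.shape)
            D = np.abs(C * leader - wolves)
            new += leader - A * D
        # average the three leader-guided moves and keep weights in [0, 1]
        wolves = np.clip(new / 3.0, 0.0, 1.0)
    return best_w / (best_w.sum() + 1e-12), best_fit

# Synthetic stand-in for base-model outputs: three models of increasing noise.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
probs = np.array([np.clip(y + rng.normal(0.0, s, y.size), 0.0, 1.0)
                  for s in (0.2, 0.4, 0.8)])
weights, fit = gwo_weights(probs, y)
print("weights:", np.round(weights, 3), "accuracy:", fit)
```

Because one wolf starts at uniform weights and the best wolf is remembered across iterations, the returned fitness is never below that of a plain average on the same data. In practice the weights would be tuned on a validation split rather than on the data used to report accuracy.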
