A feature selection and scoring scheme for dimensionality reduction in a machine learning task
Keywords:
Algorithm, Dataset, Dimensionality reduction, Feature selection
Abstract
Selecting important features is vital in machine learning tasks that involve high-dimensional datasets with many features. It reduces the dimensionality of a dataset and improves model performance. Most feature selection techniques are restricted in the kinds of dataset they can be applied to. This study proposes a feature selection technique based on the statistical lift measure to select important features from a dataset. The proposed technique is a generic approach that can be used on any binary classification dataset. It was tested on a lung cancer dataset and a happiness classification dataset. Its effectiveness in selecting important feature subsets was evaluated and compared with three existing techniques, namely Chi-Square, Pearson Correlation and Information Gain. Both the proposed and the existing techniques were evaluated on five machine learning models using four standard evaluation metrics: accuracy, precision, recall and F1-score. On the lung cancer dataset, the proposed technique yielded predictive accuracies of 0.919, 0.935, 0.919, 0.935 and 0.935 for logistic regression, decision tree, AdaBoost, gradient boosting and random forest respectively. On the happiness classification dataset, it yielded predictive accuracies of 0.758, 0.689, 0.724, 0.655 and 0.689 for random forest, k-nearest neighbor, decision tree, gradient boosting and CatBoost respectively. In both cases the proposed technique determined the most important feature subset and outperformed the existing techniques.
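The abstract does not state the exact scoring formula, so the following is only a minimal sketch of how a lift-based feature score could look for binary features and a binary label. It assumes the standard association-rule definition, lift = P(feature = 1, label = 1) / (P(feature = 1) · P(label = 1)), and it assumes features are ranked by how far their lift lies from 1 (the value expected under independence). The function names `lift_scores` and `select_top_k` and the toy feature names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def lift_scores(rows: Sequence[Dict[str, int]], labels: Sequence[int]) -> Dict[str, float]:
    """Return the lift of (feature == 1) with respect to (label == 1) for each feature.

    Assumed definition (not confirmed by the paper):
        lift = P(feature=1, label=1) / (P(feature=1) * P(label=1))
    """
    n = len(rows)
    if n == 0 or n != len(labels):
        raise ValueError("rows and labels must be non-empty and of equal length")

    p_pos = sum(labels) / n  # P(label = 1)
    scores: Dict[str, float] = {}
    for name in rows[0]:
        p_feat = sum(r[name] for r in rows) / n  # P(feature = 1)
        p_joint = sum(
            1 for r, y in zip(rows, labels) if r[name] == 1 and y == 1
        ) / n  # P(feature = 1, label = 1)
        denom = p_feat * p_pos
        # Lift is undefined when either marginal is zero; score such features as 0.
        scores[name] = p_joint / denom if denom > 0 else 0.0
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k features whose lift is furthest from 1 (an assumed ranking rule)."""
    ranked = sorted(scores, key=lambda name: abs(scores[name] - 1.0), reverse=True)
    return ranked[:k]


# Toy data with hypothetical feature names: "smoking" tracks the label, "noise" does not.
rows = [
    {"smoking": 1, "noise": 1},
    {"smoking": 1, "noise": 0},
    {"smoking": 0, "noise": 1},
    {"smoking": 0, "noise": 0},
]
labels = [1, 1, 0, 0]

scores = lift_scores(rows, labels)
selected = select_top_k(scores, k=1)
```

In this toy example the feature that co-occurs with the positive class receives a lift of 2, while the unrelated feature receives a lift of 1 and is dropped. Non-binary features would need to be discretized first, and how the paper handles that step is not stated in the abstract.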
Copyright (c) 2024 Philemon Uten Emmoh, Christopher Ifeanyi Eke, Timothy Moses

This work is licensed under a Creative Commons Attribution 4.0 International License.

