Robust hybrid algorithms for regularization and variable selection in QSAR studies
Keywords:
High dimension, QSAR, Multicollinearity, Outliers, Sparse least trimmed squares, Random forest
Abstract
This study introduces a robust hybrid sparse learning approach for regularization and variable selection. The approach comprises two steps. In the first step, we split the original dataset into training and test sets and standardize the training data using its mean and standard deviation. We then fit either the LASSO or the sparse least trimmed squares (sparse LTS) algorithm to the training set and retain the variables with non-zero coefficients as the features of a new, reduced dataset. In the second step, the new dataset is divided into training and test sets. The training set is further divided into k folds and evaluated using a combination of Random Forest, Ridge, Lasso, and Support Vector Regression machine learning algorithms. We introduce novel hybrid methods and compare their performance against existing techniques. To validate the proposed methods, we conduct a comprehensive simulation study and apply them to a real-life QSAR analysis. The results show that the proposed estimators outperform the existing techniques, with the SLTS+LASSO hybrid standing out in particular. In summary, the two-step robust hybrid sparse learning approach offers effective regularization and variable selection and is applicable to a wide range of real-world problems.
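The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses LASSO (`LassoCV`) for the first step because sparse LTS is not available in that library, runs on synthetic data, and scores each second-step learner separately by mean squared error. The function names, split ratio, and hyperparameters are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LassoCV, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def select_features(X, y, seed=0):
    """Step 1: fit LASSO on a standardized training split, keep non-zero coefficients."""
    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=seed)
    # Standardize with the training mean and standard deviation only.
    scaler = StandardScaler().fit(X_train)
    # Assumption: LASSO stands in for the first-step selector (sparse LTS is the alternative).
    lasso = LassoCV(cv=5, random_state=seed).fit(scaler.transform(X_train), y_train)
    return np.flatnonzero(lasso.coef_)


def evaluate_learners(X_sel, y, k=5, seed=0):
    """Step 2: split the reduced data, then score each learner by k-fold CV."""
    X_train, X_test, y_train, y_test = train_test_split(
        X_sel, y, test_size=0.2, random_state=seed
    )
    # Hyperparameters below are illustrative defaults, not values from the paper.
    learners = {
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "Ridge": Ridge(alpha=1.0),
        "Lasso": Lasso(alpha=0.1),
        "SVR": SVR(),
    }
    scores = {}
    for name, model in learners.items():
        # k-fold cross-validated MSE on the training part of the reduced dataset.
        cv_mse = -cross_val_score(
            model, X_train, y_train, cv=k, scoring="neg_mean_squared_error"
        ).mean()
        # Held-out MSE after refitting on the whole training part.
        residuals = model.fit(X_train, y_train).predict(X_test) - y_test
        scores[name] = (cv_mse, float(np.mean(residuals**2)))
    return scores


# Synthetic high-dimensional data standing in for a QSAR descriptor matrix.
X, y = make_regression(
    n_samples=120, n_features=50, n_informative=5, noise=5.0, random_state=0
)
selected = select_features(X, y)
scores = evaluate_learners(X[:, selected], y)
for name, (cv_mse, test_mse) in scores.items():
    print(f"{name}: CV MSE = {cv_mse:.2f}, test MSE = {test_mse:.2f}")
```

Replacing `LassoCV` with a sparse LTS fit in `select_features` would give the robust variant of the first step; the second step would remain unchanged.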
Copyright (c) 2023 Adewale F. Lukman, Christian N. Nwaeme

This work is licensed under a Creative Commons Attribution 4.0 International License.

