Optimizing precision farming: enhancing machine learning efficiency with robust regression techniques in high-dimensional data

Authors

  • Nour Hamad Abu Afouna, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
  • Majid Khan Majahar Ali, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia

Keywords:

Lasso, Ridge, M-estimation, MM-estimation, Robust regression

Abstract

Smart precision farming leverages IoT, cloud computing, and big data to optimize agricultural productivity, lower costs, and promote sustainability through digitalization and intelligent methodologies. However, it faces challenges such as managing complex variables, addressing multicollinearity, handling outliers, ensuring model robustness, and improving accuracy, particularly with small to medium-sized datasets. Overcoming these obstacles requires reducing retraining time and model complexity, which is essential for improving a machine learning algorithm’s performance, scalability, and efficiency, especially on large or high-dimensional datasets. In a recent study involving 435 drying parameters and 1,914 observations, two regularization methods, Ridge and Lasso, were used as variable selection techniques, and their impact was compared before and after addressing heterogeneity in the highly ranked variables (50, 100, 150, 200, 250, 300). Robust regression methods, namely S, M, MM, M-Hampel, M-Huber, M-Tukey, MM-bisquare, MM-Hampel, and MM-Huber, were also applied. The results showed that the robust methods, when applied to Ridge and Lasso, achieved the highest efficiency, with the smallest MAPE, MSE, and SSE values and the highest R² values, both before and after accounting for heterogeneity. The best models were Ridge with MM-bisquare before heterogeneity, Ridge with MM after heterogeneity, Lasso with MM before heterogeneity, and Lasso with MM-Hampel after heterogeneity.
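To make the robust-regression idea and the four reported metrics concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes a single predictor, a Huber M-estimator fitted by iteratively reweighted least squares, a MAD-based residual scale, and the conventional tuning constant 1.345. The Ridge/Lasso selection step is omitted, and all function names and the toy data are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_weight(r, c=1.345):
    """Huber weight: 1 inside the cutoff c, c/|r| for larger scaled residuals."""
    return 1.0 if abs(r) <= c else c / abs(r)


def huber_m_fit(x, y, n_iter=50):
    """Huber M-estimation via iteratively reweighted least squares (IRLS)."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary LS
    for _ in range(n_iter):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = median(res)
        # Robust scale estimate: normalized median absolute deviation (MAD).
        scale = median([abs(r - med) for r in res]) / 0.6745
        if scale == 0:
            break  # degenerate case: at least half the residuals coincide
        w = [huber_weight(r / scale) for r in res]
        a, b = weighted_line_fit(x, y, w)
    return a, b


def metrics(y_true, y_pred):
    """SSE, MSE, MAPE (in percent) and R^2; MAPE assumes no zero targets."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mape = 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    mean_y = sum(y_true) / n
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return {"SSE": sse, "MSE": sse / n, "MAPE": mape, "R2": 1.0 - sse / sst}


# Toy data (hypothetical): y = 2 + 3x with small alternating noise,
# plus one gross outlier added to the last observation.
x = [float(i) for i in range(1, 11)]
clean = [2.0 + 3.0 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y = clean[:-1] + [clean[-1] + 40.0]

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_rob, b_rob = huber_m_fit(x, y)

# Score both fits against the uncontaminated targets.
ols_scores = metrics(clean, [a_ols + b_ols * xi for xi in x])
rob_scores = metrics(clean, [a_rob + b_rob * xi for xi in x])
print("OLS  :", round(b_ols, 3), ols_scores)
print("Huber:", round(b_rob, 3), rob_scores)
```

In this sketch the outlier is down-weighted rather than removed, so the robust slope should stay closer to the underlying value of 3 than the ordinary least-squares slope. A redescending weight function such as Tukey's bisquare, or an MM-type estimator initialized from an S-estimate, would replace `huber_weight` and the starting fit in a fuller implementation.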

Published

2025-02-01

How to Cite

Optimizing precision farming: enhancing machine learning efficiency with robust regression techniques in high-dimensional data. (2025). Journal of the Nigerian Society of Physical Sciences, 7(1), 2314. https://doi.org/10.46481/jnsps.2025.2314
