Regularization Effects in Deep Learning Architecture


  • Muhammad Dahiru Liman, Department of Computer Science, Federal University of Lafia, Nasarawa, Nigeria
  • Salamatu Ibrahim Osanga, Department of Computer Science, Federal University of Lafia, Nasarawa, Nigeria
  • Esther Samuel Alu, Department of Computer Science, Nasarawa State University Keffi, Nasarawa, Nigeria
  • Sa'adu Zakariya, Department of Computer Science, Federal University of Lafia, Nasarawa, Nigeria


Deep learning, Regularization, Overfitting, Size, Epoch, Dropout, Weight Decay, Augmentation


This research examines the impact of three widely used regularization techniques (data augmentation, weight decay, and dropout) on mitigating overfitting, as well as various combinations of these methods. Using a Convolutional Neural Network (CNN), the study assesses the performance of these strategies on two distinct datasets: a flower dataset and the CIFAR-10 dataset. The findings reveal that dropout outperforms weight decay and augmentation on both datasets, and that a hybrid of dropout and augmentation surpasses the other two-method combinations. Notably, integrating weight decay with dropout and augmentation yields the best performance among all tested combinations. Analyses were also conducted with respect to dataset size and convergence time (measured in epochs). Dropout consistently showed superior performance across all dataset sizes, the combination of dropout and augmentation was the most effective pairing at every size, and the triad of weight decay, dropout, and augmentation excelled over the other combinations. The epoch-based analysis indicated that the effectiveness of certain techniques scaled with dataset size, with varying results.
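The three regularizers compared in the study can be illustrated in isolation. The following is a minimal NumPy sketch (not the paper's actual implementation, whose architecture and hyperparameters are not given here): inverted dropout on an activation map, an SGD step with L2 weight decay, and a simple horizontal-flip data augmentation for an image batch. The function names and hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p and
    rescale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

def sgd_step_weight_decay(w, grad, lr=0.01, wd=1e-4):
    """One SGD update with L2 weight decay: w <- w - lr * (grad + wd * w).
    The wd*w term shrinks weights toward zero every step."""
    return w - lr * (grad + wd * w)

def augment_flip(batch):
    """Data augmentation by random horizontal flip of an image batch
    with shape (N, H, W, C); each image is flipped with probability 0.5."""
    flip = rng.random(batch.shape[0]) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]
    return out
```

In a training loop these would be combined exactly as the study does: augmented batches are fed to the CNN, dropout is applied to hidden activations during training only, and the weight-decay term is folded into every optimizer step.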



Figure: CNN Model Architecture.



How to Cite

Regularization Effects in Deep Learning Architecture. (2024). Journal of the Nigerian Society of Physical Sciences, 6(2), 1911.


