Ensemble feature selection using weighted concatenated voting for text classification

Authors

  • Oluwaseun IGE School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia | Universal Basic Education Commission, Wuse Zone 4, Abuja, 900284, Nigeria.
  • Keng Hoon Gan School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia

Keywords:

Feature Selection, Text Classification, Dimensionality Reduction, Univariate Filter Methods

Abstract

With the growing prevalence of high-dimensional data, filter feature selection techniques have been favoured for selecting relevant features because of their improved generalization, faster training times, dimensionality reduction, reduced risk of overfitting, and improved model performance. However, the most widely used feature selection methods are unstable: a given method may choose different subsets of features that in turn produce different classification accuracies. An appropriate hybrid that harnesses locally relevant features through the discriminative power of filter methods for improved text classification is lacking in past literature. In this paper, we propose a novel multi-univariate hybrid feature selection method (MUNIFES) for enhanced discriminative power between the features and the target class. The proposed method uses a multi-iterative process to select the best feature set from each univariate feature selection method. MUNIFES employs an ensemble of the multi-filter discriminative strengths of the Chi-Square (Chi2), Analysis of Variance (ANOVA), and Information Gain (InfoGain) methods to select optimal feature subsets. To evaluate the proposed method, several experiments were performed on the 20newsgroup dataset and its variant (17newsgroup) with 10 classifiers (including ensemble, classification and optimization algorithms, and an Artificial Neural Network (ANN)), and the results were compared with state-of-the-art feature selection methods. MUNIFES achieved better classification accuracy than the compared methods.
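The core idea described in the abstract, scoring features with several univariate filters and keeping those that multiple filters agree on, can be sketched as follows. This is an illustrative simplification, not the authors' exact MUNIFES algorithm: the filter scorers are scikit-learn's `chi2`, `f_classif`, and `mutual_info_classif` (the last as a stand-in for InfoGain), and the `k_per_filter` and `min_votes` parameters are assumptions for the sketch.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

def ensemble_select(X, y, k_per_filter=10, min_votes=2):
    """Rank features with three univariate filters (Chi2, ANOVA F,
    mutual information) and keep those voted for by at least
    `min_votes` filters. Illustrative sketch only."""
    scores = [
        chi2(X, y)[0],                              # Chi-square statistic (X must be non-negative)
        f_classif(X, y)[0],                         # ANOVA F-value
        mutual_info_classif(X, y, random_state=0),  # information-gain proxy
    ]
    votes = np.zeros(X.shape[1], dtype=int)
    for s in scores:
        top = np.argsort(s)[::-1][:k_per_filter]    # each filter's top-k features
        votes[top] += 1
    return np.flatnonzero(votes >= min_votes)       # features with enough votes

# Tiny synthetic demo: columns 0 and 1 carry the class signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.random((200, 20))
X[:, 0] += y
X[:, 1] += 0.8 * y
selected = ensemble_select(X, y, k_per_filter=5, min_votes=2)
```

In a text-classification setting, `X` would be a (non-negative) term-frequency or TF-IDF matrix; the voting step is what makes the selection more stable than any single filter on its own.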


Published

2024-03-06

How to Cite

Ensemble feature selection using weighted concatenated voting for text classification. (2024). Journal of the Nigerian Society of Physical Sciences, 6(1), 1823. https://doi.org/10.46481/jnsps.2024.1823

Issue

Section

Computer Science
