Effective tweets classification for disaster crisis based on ensemble of classifiers

Authors

  • Christopher Ifeanyi Eke, Department of Computer Science, Faculty of Computing, Federal University of Lafia, P.M.B 146, Lafia, Nasarawa State, Nigeria
  • Kholoud Maswadi, Department of Management Information Systems, Jazan University, Jazan 45142, Saudi Arabia
  • Musa Phiri, School of Engineering and Technology, Mulungushi University, PO Box 80415, Kabwe, Zambia
  • Mulenga Mwege, School of Engineering and Technology, Mulungushi University, PO Box 80415, Kabwe, Zambia
  • Mohammad Imran, Department of Information Technology, Balochistan University of Information Technology, Engineering and Management Sciences, Airport Road, Baleli, Quetta, Pakistan
  • Dekera Kenneth Kwaghtyo, Department of Computer Science, Faculty of Computing, Federal University of Lafia, P.M.B 146, Lafia, Nasarawa State, Nigeria
  • Akeremale Olusola Collins, Department of Mathematics, Faculty of Science, Federal University of Lafia, P.M.B 146, Lafia, Nasarawa State, Nigeria

Keywords:

Disaster Crisis Management, Social Media Analytics, Twitter, Machine Learning Classifiers, Ensemble Methods, Feature Extraction

Abstract

Social media analytics has gained significant recognition in disaster management. Social media platforms, particularly Twitter, have become an invaluable channel for disseminating information during disasters, offering real-time updates on events, crisis reports, and casualty information. The volume of posts can be overwhelming, however, and much of the content is irrelevant. To address this challenge, researchers use machine learning (ML) classifiers to categorize disaster-related tweets automatically. Although effective, these classifiers face issues such as overfitting and class imbalance. This study proposes an ensemble-based approach that integrates a variety of linguistic and word embedding features, including Parts-of-Speech (POS) tags, hashtags, Term Frequency-Inverse Document Frequency (TF-IDF), GloVe, Word2Vec, and BERT. A range of supervised learning algorithms, such as Decision Trees, Logistic Regression, Support Vector Machines, and Random Forests, were evaluated both individually and as part of ensemble methods such as AdaBoost, Bagging, and Random Subspace. The results show that combining TF-IDF with word embeddings and using the AdaBoost ensemble model yields the best performance, with a classification accuracy of 98.92%. This is a notable improvement over conventional standalone classifiers and highlights the advantage of ensemble methods in enhancing model robustness and reducing overfitting. The proposed approach demonstrates both high predictive capacity and scalability for real-time tweet filtering during emergencies. Beyond demonstrating the efficacy of ensemble methods for disaster tweet classification, the study offers insights for improving social media-based crisis response and establishes a foundation for future research, particularly in multilingual and multi-disaster scenarios.
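
To make the pipeline described in the abstract concrete, the following is a minimal sketch of TF-IDF features fed to an AdaBoost ensemble. It is not the authors' implementation: the toy tweets, the labels, the function names (fit_tfidf, transform_tfidf, adaboost_fit, and so on), the smoothed IDF formula, the choice of one-feature decision stumps as weak learners, and the number of boosting rounds are all assumptions made for illustration. The word embedding features (GloVe, Word2Vec, BERT) that the study combines with TF-IDF are omitted here.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from training texts."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform_tfidf(docs, vocab, idf):
    """Map each text to a dense TF-IDF vector; unseen words are ignored."""
    rows = []
    for d in docs:
        toks = d.lower().split()
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty texts
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return rows

def stump_predict(row, j, thr, pol):
    """Weak learner: predict pol if feature j exceeds thr, otherwise -pol."""
    return pol if row[j] > thr else -pol

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 = disaster-related, -1 = not disaster-related.
tweets = [
    "flood rescue needed", "flood damage reported",
    "earthquake rescue underway", "earthquake damage severe",
    "great coffee today", "great movie tonight",
    "nice coffee break", "nice movie night",
]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

vocab, idf = fit_tfidf(tweets)
X_train = transform_tfidf(tweets, vocab, idf)
model = adaboost_fit(X_train, labels, rounds=3)

train_preds = [adaboost_predict(model, row) for row in X_train]
train_accuracy = sum(p == y for p, y in zip(train_preds, labels)) / len(labels)

new_tweets = ["flood damage in the city", "coffee and a movie"]
new_preds = [adaboost_predict(model, r)
             for r in transform_tfidf(new_tweets, vocab, idf)]
print(train_accuracy, new_preds)
```

No single word separates the two classes in this toy set, so each stump alone misclassifies some tweets; the reweighting step is what lets later stumps focus on the examples earlier ones got wrong. The numbers this sketch prints say nothing about the 98.92% accuracy reported above, which was obtained on the study's own data and feature set.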

Published

2025-08-01

How to Cite

Effective tweets classification for disaster crisis based on ensemble of classifiers. (2025). Journal of the Nigerian Society of Physical Sciences, 7(3), 2675. https://doi.org/10.46481/jnsps.2025.2675

Issue

Section

Computer Science