Performance Study of N-grams in the Analysis of Sentiments

O. E. Ojo; A. Gelbukh; H. Calvo; O. O. Adebanji

doi:10.46481/jnsps.2021.201

Authors

O. E. Ojo
Instituto Politécnico Nacional, Natural Language and Text Processing Laboratory, Centro de Investigacion en Computación, CDMX, Mexico
A. Gelbukh
Instituto Politécnico Nacional, Natural Language and Text Processing Laboratory, Centro de Investigacion en Computación, CDMX, Mexico
H. Calvo
Instituto Politécnico Nacional, Natural Language and Text Processing Laboratory, Centro de Investigacion en Computación, CDMX, Mexico
O. O. Adebanji
Instituto Politécnico Nacional, Natural Language and Text Processing Laboratory, Centro de Investigacion en Computación, CDMX, Mexico

Keywords:

n-grams, economic texts, machine learning, deep learning, sentiment analysis

Abstract

In this work, we study the use of n-grams for sentiment classification with different machine learning and deep learning methods. We apply this approach, which combines existing techniques, to the problem of predicting sequence tags in order to understand the advantages and difficulties of using unigrams, bigrams and trigrams to analyse economic texts. Our study aims to fill the gap by evaluating the performance of these n-gram features on different texts in the economic domain using nine sentiment analysis techniques. We show that comparing the performance of these features across different datasets and multiple learning techniques yields useful insights. The evaluation assesses the precision, recall, F1-score and accuracy of the output of each of the machine learning algorithms used. The methods were tested on the Amazon, IMDB, Reuters, and Yelp economic review datasets, and our comprehensive experiments show the effectiveness of n-grams in the analysis of sentiments.
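
To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes simple lowercased whitespace tokenisation and binary sentiment labels, and the function names `word_ngrams`, `ngram_features` and `binary_metrics` are hypothetical. It shows how unigram, bigram and trigram counts can be extracted from a text and how precision, recall, F1-score and accuracy are computed from predicted labels.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text as a list of tuples.

    Assumption: lowercased whitespace tokenisation; the tokenisation
    used in the paper may differ.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count all n-grams of order 1..max_n in a single bag of n-grams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical example sentence and labels, for illustration only.
    print(word_ngrams("profit rose sharply", 2))
    print(ngram_features("profit rose sharply"))
    print(binary_metrics([1, 0, 1, 1], [1, 1, 0, 1]))
```

In a full pipeline, such n-gram counts would typically be turned into feature vectors and passed to a classifier; that step is omitted here to keep the sketch self-contained.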
