Performance Study of N-grams in the Analysis of Sentiments

In this work, we investigated the use of n-grams to classify sentiments with different machine learning and deep learning methods. We applied this approach, which combines existing techniques, to the problem of predicting sequence tags in order to understand the advantages and problems of using unigrams, bigrams and trigrams to analyse economic texts. Our study aims to fill this gap by evaluating the performance of these n-gram features on different texts in the economic domain using nine sentiment analysis techniques. We show that comparing the performance of these features across different datasets and multiple learning techniques yields useful insights. The evaluation assesses the precision, recall, F1-score and accuracy of the output of the machine learning algorithms considered. The methods were tested on the Amazon, IMDb, Reuters, and Yelp economic review datasets, and our comprehensive experiments show the effectiveness of n-grams in the analysis of sentiments.
DOI:10.46481/jnsps.2021.201


Introduction
Machine learning and deep learning architectures are the backbone of many Natural Language Processing (NLP) research works. To address a variety of tasks, including sentiment analysis, several machine learning and deep learning architectures have been proposed. Uncommon and unfamiliar words used in information and knowledge exchange can influence different aspects of life, including marketing, education and governance. As an integral part of the internet, digital media platforms facilitate meaningful information and knowledge exchange among network users. The collection of data and reviews, with diverse views and opinions about events, is gaining impact, is fast becoming an attraction for researchers, and generates significant computational challenges. Effective, wide-ranging mining of information from text helps to discover knowledge of vital significance. Computers can detect, interpret and produce the sentiments (or tags) of a text, thereby improving the operations of government and private companies, recognizing possible threats, minimizing crime, and improving public services.
The main objective of this research is to observe the efficacy of unigrams, bigrams and trigrams as features of a word sequence and to predict a tag for sentiment classification. The target is to extract the words or phrases behind the tags and to use machine learning and deep learning methods to classify the data whilst measuring the accuracy of the classification. By tagging n-grams drawn from people's opinions, we are able to see how strongly they indicate a sentiment. Because n-grams contribute to conceptual characterization, we apply machine learning and deep learning models to the text.
In machine learning, two major techniques are adopted: supervised methods [1,2,3,4] and unsupervised methods [5]. Supervised methods have a training dataset with manually defined tags, and they learn the characteristics that match the tags from the training data. Gelbukh and Kolesnikova [2,6] developed methods that allowed the automatic sorting of word combinations into pre-established categories relating to automated collocation classification. Gambino and Calvo [7] considered the use of text-learned opinions using NLP techniques to distinguish tags. On the other hand, unsupervised systems are more flexible across different kinds of texts and domains. Different machine learning algorithms [6,2,4] and neural network models [1] have been used in past works to learn how to predict the sentiments of text [3,8]. Pre-trained models are very helpful in classifying text and in other NLP activities.
To extract the keywords behind the text's feelings, the models will be pre-trained on the training data and the prediction accuracy of the models will be measured, recorded and compared. The remaining part of the paper is organized as follows: Section 2 deals with the background and relevant works, Section 3 describes the approach used in this study, while Section 4 shows the features we experimented with. Sections 5 and 6 provide information about the machine learning and deep learning algorithms used and our experimental findings, with the discussion of results in Section 7. Section 8 concludes the work.

Background and Related Work
Different works have been carried out in the field of sentiment analysis [1,7,3,4]. Social media and other digital media platforms, as an accepted means of communication, have flourished, thereby aiding intelligence gathering and information dissemination. Machine learning techniques have shown good results in analysing sentiments in text [6,2,4] and in other tasks such as part-of-speech (PoS) tagging [9] and named entity recognition (NER) [10]. Linear statistical models, such as conditional random fields (CRF) and hidden Markov models (HMM), are NLP approaches used for sequence tagging with a long history of excellent performance. However, adapting these models to new tasks in new domains or languages is challenging.
The combination of categorical grammar, annotation, acquisition of lexicons and semantic networks was used by Pekka et al. [11] to analyze the sentiments of the text and to define the tags of the text. They investigated how overall phrase-structure information and domain-specific language usage could aid in the detection of semantic orientations in financial and economic news.
In [12], the use of syntactic n-grams (Sn-grams) to incorporate syntactic knowledge into machine learning algorithms proved successful. Sn-grams were utilized as a baseline for authorship identification, replacing standard n-grams of words, POS tags, and characters. [13] used text-CNN for extraction of text features with an LSTM architecture in addition to the unimodal input functions. In a multimodal sentiment analysis task, they explored and analyzed the performance of three deep-learning-based architectures and recorded their results.
On a social media data baseline, [14] explored the efficiency of deep neural network models of different complexity based on character n-grams. The training was done with augmented data and pseudo-labeled samples, and the accuracy result was enhanced.
[15] also used classifiers to predict sentence tags using an objective function to infer similarity between sentences. A new objective function was used to train many classifiers to make predictions at the instance level, promoting smoothness of inferred instance-level labels while keeping group-level label constraints in place.

Approach
Machine learning and deep learning algorithms recognize patterns and regularities in data and use the learned patterns to predict new observations. We pre-processed the text data before applying the different learning algorithms to it. We tokenized the data into words and n-grams and generated a vocabulary of all the distinct n-grams that occurred in the document. The data features were then rescaled using the term frequency-inverse document frequency (tf-idf) technique. We used supervised learning and compared the results. For our work, we chose to filter out uncommon, non-informative terms. For sentiment analysis, these models have been extensively tested and have provided accurate results when working with various dataset types. The words were patterned for parsing such that every n-gram consists of n terms and is tagged accordingly. The accuracy of these methods often differs widely in validation, with validation sets ranging from small samples to a wide array of tagged data.
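The tokenization, vocabulary and tf-idf steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`word_ngrams`, `build_vocabulary`, `tfidf_matrix`), the whitespace tokenizer, the unsmoothed weighting tf × ln(N / df) and the toy sentences are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as space-joined strings."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, n):
    """Map every distinct n-gram in the corpus to a column index."""
    distinct = sorted({g for doc in docs for g in word_ngrams(doc, n)})
    return {gram: idx for idx, gram in enumerate(distinct)}


def tfidf_matrix(docs, n):
    """Return (vocabulary, rows); rows[d][j] is the tf-idf weight of n-gram j in doc d.

    Assumption: tf is the raw count and idf is ln(N / df), with no smoothing
    or length normalisation.
    """
    vocab = build_vocabulary(docs, n)
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    # Iterating a Counter yields its keys, so each n-gram counts once per document.
    doc_freq = Counter(g for c in counts for g in c)
    rows = []
    for c in counts:
        row = [0.0] * len(vocab)
        for gram, tf in c.items():
            row[vocab[gram]] = tf * math.log(len(docs) / doc_freq[gram])
        rows.append(row)
    return vocab, rows


# Hypothetical toy corpus, not taken from the datasets used in the paper.
docs = ["profit rose sharply", "profit fell sharply", "sales rose"]
uni_vocab, uni_rows = tfidf_matrix(docs, 1)
bi_vocab, bi_rows = tfidf_matrix(docs, 2)
print(word_ngrams(docs[0], 2))
print(len(uni_vocab), len(bi_vocab))
```

In this toy corpus every bigram occurs in only one sentence, whereas several unigrams are shared between sentences, which shows how the feature space becomes sparser as n grows.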

Experiments
The data used for this analysis consist of a collection of four related economic and financial market reviews selected from multiple texts that have been tagged with positive, negative and neutral classes. These four datasets, extracted from different digital media platforms, were selected because they contain explicit economic sentiments from which the machine and deep learning algorithms can learn. We used the Reuters dataset in Pekka et al. [11], containing subjective sentences from economic reviews, and the IMDb, Amazon, and Yelp datasets in Kotzias et al. [15], which contain text sentences from reviews of products, movies, and restaurants. The first dataset contains reviews and tags for products sold on amazon.com, while the second dataset contains the sentiment dataset for IMDb movie reviews. The third and fourth datasets are collections of texts about economic and restaurant reviews respectively. The texts were split into training and testing data. Using the training set, the machine and deep learning algorithms were trained to understand, extract and evaluate subjective information from the data with n-grams as features.
After fitting the models to the training data, we used them to predict the tags of the test data. We translated the data into numeric form and used the training set to train the algorithms, while the test set was used to evaluate the performance of the machine and deep learning models. The algorithms learnt from the training data, with the features and tags passed as parameters. The models predicted the outcomes, and the precision, recall, accuracy and F1-score were obtained using the n-gram features within each model. To keep a list of the word vectors, we transformed the text array into a TF-IDF feature matrix and created a vocabulary. A machine and/or deep learning algorithm can then be applied directly to the encoded vectors. The different meanings of the text were classified and evaluated, and the results were compared with each other. The n-grams offered an indication of the words that could affect the tags of the text. We extracted the n-gram distributions (unigrams, bigrams, and trigrams) for use in the different models, thereby making the learning algorithms more intelligent for proper prediction. We applied the machine and deep learning algorithms to the text for classification, and the accuracy scores for all models used in the experiment were calculated.
Table 3. Precision, recall, F1-score and accuracy of the classifiers trained on the third dataset.
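The fit-then-predict flow described in this section can be illustrated with a small self-contained sketch. It is only an assumption-laden example: a multinomial naive Bayes classifier with add-one smoothing stands in for the learning algorithms, raw n-gram counts stand in for the TF-IDF features, and the class `NaiveBayesTagger`, the helper `ngram_features` and the toy sentences are hypothetical and not taken from the study.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, max_n=1):
    """Count all word n-grams of order 1..max_n in `text`."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, tags, max_n=1):
        self.max_n = max_n
        self.tag_counts = Counter(tags)
        self.gram_counts = defaultdict(Counter)
        for text, tag in zip(texts, tags):
            self.gram_counts[tag].update(ngram_features(text, max_n))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def predict(self, text):
        feats = ngram_features(text, self.max_n)
        total_docs = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag in sorted(self.tag_counts):
            counts = self.gram_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.tag_counts[tag] / total_docs)  # log prior
            for gram, k in feats.items():
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += k * math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical toy data, not drawn from the paper's datasets.
train_texts = ["profit rose sharply", "strong sales growth",
               "profit fell sharply", "weak sales and losses"]
train_tags = ["positive", "positive", "negative", "negative"]
test_texts = ["sales growth rose", "losses and weak profit"]

# Train on unigrams and bigrams, then predict tags for the held-out texts.
model = NaiveBayesTagger().fit(train_texts, train_tags, max_n=2)
predicted = [model.predict(t) for t in test_texts]
print(predicted)
```

Changing `max_n` switches between unigram-only features and unigram-plus-bigram (or trigram) features, which is one simple way to compare n-gram orders with the same classifier.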

Results and Discussion
In this study, we present a performance review of an n-gram based evaluation of a sequence labeling task using different learning algorithms. We introduced machine learning and deep learning techniques to analyze the sentiments in the data for better and faster decision making, and we were able to compare the output of the techniques implemented, thus adding to the state-of-the-art literature on sentiment analysis tasks. These algorithms were applied to the datasets to predict the tags and to classify the texts accordingly using the n-gram features.
The performance of the n-grams in the different machine and deep learning approaches was calculated using the overall accuracy measurement. For a comparative performance evaluation of each system in terms of predicting the tags correctly, we present the precision, recall, accuracy and F1-score for the nine methods used. The macro-averaged F1, recall, precision and accuracy scores of the various models used are shown in Tables 1-4. The findings indicate that the SVM and the MLP models generally improved the effectiveness of the classification. The results also reveal that the DTC, GBC, KNN and XGB models failed to perform well in the classification task. In the comparative analysis, applying the different machine learning methods and n-gram approaches to the datasets allowed the results to be compared directly, and the effectiveness of the n-gram features was recorded.
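For reference, macro-averaged scores of the kind reported in the tables can be computed as in the sketch below: precision, recall and F1 are calculated per class and then averaged with equal weight, while accuracy is the fraction of correctly predicted tags. The function name `macro_scores` and the example labels are hypothetical, and the zero-division convention (a score of 0.0 when a class has no predictions or no true instances) is an assumption.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Assumption: undefined ratios are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical gold and predicted tags for six sentences.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
scores = macro_scores(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Macro averaging treats the positive, negative and neutral classes equally regardless of how many sentences each contains, so a model that ignores a small class is penalised more than it would be under plain accuracy.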
The n-gram features gave a very good performance for all learning algorithms, with the unigrams performing better than the bigrams and trigrams in the classification task. The SVM, LRM, RFC, NBA and MLP models were the most reliable for all of the n-gram features. On the first dataset (see Figure 1), RFC, MLP and SVM had the maximum scores among the models. The SVM, LRM and MLP models gave the highest scores for all n-gram features on the second dataset (see Figure 2). On the third dataset, NBA, SVM and MLP had the highest results (see Figure 3), while on the fourth dataset, SVM, LRM and MLP performed best on the test data (see Figure 4). The models used with the feature classification techniques show the effectiveness of n-grams for sentiment tagging and indicate the most reliable methods of classification.

Conclusion
An important problem in the analysis of sentiments is being able to determine the contextual labels or tags of words and phrases. We addressed this problem in this study by applying various machine learning and deep learning approaches to produce the labels or tags of economic and financial review texts using n-grams as features. Modeling was performed by applying different pre-processing techniques to the texts, converting the text into vectors, and applying various machine learning and deep learning techniques to the different datasets. The use of multiple classifiers in this analysis led to a better evaluation efficiency than any individual classifier. The findings recorded in this study suggest that the support vector machine and multi-layer perceptron neural networks were the best options for achieving successful results, because they efficiently and effectively classify the sentiment tags behind the sentences in the text. The unigram model, which is a low-level n-gram representation, has a greater predictive potential than the bigram and trigram models. While high-level n-gram representations account for the complexities of human language, their use in predicting consumers' choices is less efficient than low-level n-gram representations in these economic reviews.