Please use this identifier to cite or link to this item:
http://repository.aaup.edu/jspui/handle/123456789/2240
Title: | On Developing a Machine Learning Context-sensitive Sentence Level Sentiment Analyzer (Master's Thesis) |
Authors: | Rabaia, Shatha Mahmoud (AAUP, Palestinian) |
Keywords: | Data Acquisition and Cleaning, Tokenization and Feature Extraction, Experimental Setup and Evaluation |
Issue Date: | 2022 |
Publisher: | AAUP |
Abstract: | Sentiment Analysis (SA) has become one of the most reliable tools for helping organizations better understand how users perceive the products and services they offer. In particular, with the ever-increasing volume of user-generated sentiment reviews on the Web, SA tools have become indispensable. The use of such tools in real-world application domains serves not only organizations but also individuals who want to learn how other users or customers perceive the products and services they use. Over the past few years, a growing number of SA techniques have been proposed, each with its own strengths and weaknesses, as demonstrated by the accuracy of the results they produce. Lexicon-based and machine learning approaches have been among the most frequently employed techniques for this purpose. To address the limitations of these techniques, newer neural network models have been proposed in an attempt to automate the feature learning process and enrich the learned features with contextual word embeddings to identify their semantic orientations. However, the high accuracy of such models requires that the input sentences fall within the same domain as the training data. In this thesis, we experimentally evaluate six sentiment classification methods, namely Support Vector Machines (SVM), Naive Bayes (NB), Logistic Regression, Random Forests, Feedforward Neural Networks, and Convolutional Neural Networks. In addition to experimentally evaluating the first four conventional methods, we also measure the impact of incorporating lexical semantic knowledge captured by WordNet to expand the original words in sentences using a variety of NLP pipelines. 
The latter two methods are based on neural networks with automated feature processing and the ability to enrich learned features with contextual word embeddings. Besides measuring their quality, we experimentally investigate the impact of exploiting GloVe word embeddings to enrich the feature vectors extracted from sentiment sentences. In the conducted experiments, we used four real-world datasets that comprise 1,600,000 tweets, 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. With regard to the first five methods, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy rates. With this coupling, the accuracy of the SVM classifier improved to 90.43%, while the four other classifiers reached 86.83%, 90.11%, 86.20%, and 88.01%, respectively. For the neural network-based methods (FNN and CNN), findings indicate that using larger dimensions of GloVe word embeddings increases the sentiment classification accuracy. In particular, results demonstrate that the CNN with a larger feature map, a smaller filter size, and the ReLU activation function in the convolutional layer achieved an accuracy of 90.57% on the IMDB dataset, and 82.02% and 78.14% on the Twitter and sentiment sentences datasets, respectively. |
Description: | Master's degree in Computer Science |
URI: | http://repository.aaup.edu/jspui/handle/123456789/2240 |
Appears in Collections: | Master Theses and Ph.D. Dissertations |
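As an illustration of the conventional pipeline summarized in the abstract (n-gram features fed to a classifier such as Naive Bayes), a minimal sketch in plain Python follows. It is not the thesis implementation: the names `ngrams` and `NaiveBayesSentiment`, the whitespace tokenizer, the add-one smoothing, and the toy sentences are all assumptions made for illustration, and the lemmatization and WordNet expansion steps are omitted.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore n-grams unseen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data invented for illustration only (not from the thesis datasets).
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull and terrible film",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
prediction_pos = model.predict("a great and moving story")
prediction_neg = model.predict("terrible and boring")
```

On this toy data the first sentence is classified as `pos` and the second as `neg`. A real experiment would replace the whitespace tokenizer with a proper NLP pipeline and evaluate on held-out data.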
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
شذا ربايعة.pdf | | 3.39 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.