On the Combination of NLP and Extrinsic Semantic Resources for Developing an Arabic-English Sentiment Analyzer (Master's Thesis)

Mesqali, Enas (AAUP, Palestinian)

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/3127

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mesqali, Enas (AAUP, Palestinian)	-
dc.date.accessioned	2025-02-05T07:44:25Z	-
dc.date.available	2025-02-05T07:44:25Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://repository.aaup.edu/jspui/handle/123456789/3127	-
dc.description	Master's degree in Computer Science	en_US
dc.description.abstract	Recent advances in Natural Language Processing (NLP) have significantly improved the ability to process, analyze, and understand the sentiments expressed in user-generated reviews of products and services, and this growing interest has spurred considerable research in sentiment analysis. In this study, we explore sentiment analysis with a specific focus on the Arabic language. Leveraging both traditional pre-processing techniques and machine learning algorithms, we propose a comprehensive four-stage sentiment analysis model. Its primary objective is to harness English-language resources and techniques and to gauge their impact on classifier accuracy when applied to Arabic sentences. Through a series of experiments on Arabic datasets and their English translations, we assess the effectiveness of various pre-processing methods and four machine learning classifiers: Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Support Vector Machines (SVM). Notably, the SVM classifier consistently outperformed the others, achieving the highest accuracy in most scenarios, especially when Lemmatization and Stemming were combined. Furthermore, we examine how translating the datasets and incorporating synonyms affect sentiment analysis accuracy. While translating the datasets from Arabic to English and vice versa did not significantly change accuracy, adding synonyms from the English datasets to the Arabic sentiment analysis experiments produced mixed results. This underscores the intricacies of language-specific nuances and the difficulty of capturing sentiment effectively across languages. When comparing our study with previous research that used the ASTD dataset, several key differences and similarities emerge.
Previous studies explored a range of classifiers, including SVM, NB, LR, CNN, and RNTN, with accuracies between 85% and 90% for traditional features such as n-grams and TF-IDF and for word embeddings such as Word2Vec. However, lower accuracies were also reported: 58.5% for the RNTN algorithm and 51.7% for SVM. Other research focused on deep learning models such as CNN and LSTM, which yielded accuracies of 64.3% and 64.75%, respectively. In contrast, our study highlights the importance of specific pre-processing techniques, showing that lemmatization and stemming can significantly improve the performance of machine learning classifiers such as SVM, which reached an accuracy of up to 80%. Overall, our study reflects the evolving landscape of sentiment analysis research and shows how techniques can be adapted to language-specific challenges and nuances. These findings contribute to the broader understanding of sentiment analysis methodologies and underscore the importance of considering linguistic differences in sentiment analysis tasks. Finally, we recommend that future research expand the Arabic dataset and explore advanced deep learning models to capture more complex patterns. Refining linguistic tools specific to Arabic could further improve sentiment analysis accuracy. These steps aim to better address language-specific challenges and to contribute to more effective sentiment analysis methodologies.	en_US
dc.publisher	AAUP	en_US
dc.subject	Data Collection, Computer Science, Classification techniques	en_US
dc.title	On the Combination of NLP and Extrinsic Semantic Resources for Developing an Arabic-English Sentiment Analyzer (Master's Thesis)	en_US
dc.title.alternative	Towards Integrating Natural Language Processing Techniques and External Semantic Resources to Build an Opinion Analysis System	en_US
dc.type	Thesis	en_US
Appears in Collections:	Master Theses and Ph.D. Dissertations
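The abstract describes the model only at a high level: pre-processing, optional synonym expansion, then a classifier. As a minimal sketch, assuming whitespace tokenization, a toy suffix-stripping `stem` function standing in for the lemmatization and stemming stage, and a small hypothetical English dataset and synonym table (`TRAIN`, `SYNONYMS`, as if after translation), the following standard-library Python trains a multinomial Naïve Bayes classifier, one of the four classifiers the abstract compares. It is not the thesis's implementation; the thesis reports SVM as the strongest classifier, and a real Arabic pipeline would use a dedicated Arabic stemmer or lemmatizer and a dataset such as ASTD.

```python
"""Toy sentiment pipeline: pre-processing, optional synonym expansion, Naive Bayes.

Hypothetical sketch only; not the thesis implementation.
"""
import math
from collections import Counter, defaultdict

# Hypothetical toy suffix list standing in for a real stemmer/lemmatizer.
SUFFIXES = ("ing", "ed", "ly", "s")

# Hypothetical synonym table (a real system would draw on a lexical resource).
SYNONYMS = {"horrible": ("terrible",), "superb": ("great",)}

# Hypothetical labelled reviews (English, as if after translation).
TRAIN = [
    ("The food was great and the staff were friendly", "pos"),
    ("Loved the service, great experience", "pos"),
    ("Terrible food and rude staff", "neg"),
    ("The service was awful, never again", "neg"),
]


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, synonyms=None):
    """Lowercase, split on whitespace, strip punctuation, expand, then stem."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if synonyms:
        # Append each token's synonyms after the original tokens.
        expansions = [s for t in tokens for s in synonyms.get(t, ())]
        tokens = tokens + expansions
    return [stem(t) for t in tokens]


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # unseen words are ignored
                    score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = [preprocess(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    model = NaiveBayes().fit(docs, labels)
    for review in ("great friendly service", "rude staff, awful food"):
        print(review, "->", model.predict(preprocess(review)))
    # "horrible" is unseen in training; its synonym "terrible" is not.
    print("horrible ->", model.predict(preprocess("horrible", SYNONYMS)))
```

Naïve Bayes is used here only because it fits in a few lines of standard-library code. Swapping in LR, RF, or SVM would normally mean using a library such as scikit-learn, and the synonym step illustrates just one way an external lexical resource could feed the classifier.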

Files in This Item:

File	Description	Size	Format
ايناس مسقلة.pdf		2.34 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository