Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/3780
| Title: | A Hybrid CNN-LSTM Framework for Enhanced Arabic Sentiment Analysis: Investigating Emoji Encoding and Preprocessing Strategies (Master's Thesis) |
| Other Titles: | A hybrid framework based on convolutional neural networks and long short-term memory networks for enhancing sentiment analysis in Arabic: a study of emoji encoding and preprocessing strategies. |
| Authors: | Alawneh, Hussam Fawzi Abed (AAUP, Palestinian) |
| Keywords: | Social media, models, Memory (CNN-LSTM), model performance, Hybrid models, ASTC Data, Data Science, Business |
| Issue Date: | 2025 |
| Publisher: | AAUP |
| Abstract: | Social media users often express emotions, ideas, and thoughts through text in posts and tweets, which can be used to determine the text's polarity as positive or negative, a process known as sentiment analysis. Sentiment analysis has become critical for various real-world domains, including politics, tourism, e-commerce, education, and health. However, although sentiment analysis approaches perform well with English text, they face notable drawbacks when dealing with Arabic text. The morphological complexity inherent in the Arabic language poses challenges for building robust models, making it difficult to understand public sentiment and subsequently make informed decisions. In response to these challenges, effective data preprocessing and deep learning techniques are employed to overcome the complexity of the Arabic language and provide insightful sentiment predictions. This thesis evaluates a combined Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) framework with different data preprocessing techniques for Arabic Sentiment Analysis (ASA) using the Arabic Sentiment Twitter Corpus (ASTC) dataset. Three experiments with eight distinct preprocessing configurations were conducted to evaluate the effect of data preprocessing on Arabic sentiment analysis, in particular the effect of encoding emojis and translating them into their real and emotional meanings. Emoji meanings were collected from four websites that specialize in defining the meaning of emojis in social media, which resulted in a new dataset of emoji meanings called the "Emoji Meaning" dataset. Furthermore, the CNN-LSTM parameters were optimized using the Keras Tuner during the 5-fold cross-validation process. The proposed model with emoji translation into Arabic text obtained the highest accuracy (91.85%) when non-Arabic words were kept, punctuation was removed, and the Snowball stemmer and Keras embedding were used. This approach yields competitive results compared to other state-of-the-art approaches, shows that emoji encoding enriches text by accurately reflecting emotions, and investigates the effect of data preprocessing on model performance. This allows the hybrid model to achieve results comparable to other studies that use the same ASTC dataset, thereby improving sentiment analysis accuracy. |
| Description: | Master \ Data Science and Business Analytics |
| URI: | http://repository.aaup.edu/jspui/handle/123456789/3780 |
| Appears in Collections: | Master Theses and Ph.D. Dissertations |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| حازم علاونة.pdf | | 2.12 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.