A Hybrid CNN-LSTM Framework for Enhanced Arabic Sentiment Analysis: Investigating Emoji Encoding and Preprocessing Strategies (Master's Thesis)

Alawneh, Hussam Fawzi Abed$AAUP$Palestinian

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/3780

Full metadata record

DC Field	Value	Language
dc.contributor.author	Alawneh, Hussam Fawzi Abed$AAUP$Palestinian	-
dc.date.accessioned	2026-02-23T11:39:19Z	-
dc.date.available	2026-02-23T11:39:19Z	-
dc.date.issued	2025	-
dc.identifier.uri	http://repository.aaup.edu/jspui/handle/123456789/3780	-
dc.description	Master \ Data Science and Business Analytics	en_US
dc.description.abstract	Social media users often express emotions, ideas, and thoughts in posts and tweets, and this text can be classified by polarity as positive or negative, a process known as sentiment analysis. Sentiment analysis has become critical in many real-world domains, including politics, tourism, e-commerce, education, and health. However, although sentiment analysis approaches perform well on English text, they face notable drawbacks on Arabic text. The morphological complexity of the Arabic language makes it hard to build robust models, and therefore hard to understand public sentiment and make informed decisions. In response to these challenges, effective data preprocessing and deep learning techniques are employed to handle the complexity of Arabic and provide insightful sentiment predictions. This thesis evaluates a combined Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) framework with different data preprocessing techniques for Arabic Sentiment Analysis (ASA) using the Arabic Sentiment Twitter Corpus (ASTC) dataset. Three experiments with eight distinct preprocessing configurations were conducted to evaluate the effect of data preprocessing on Arabic sentiment analysis, in particular the effect of encoding emojis and translating them into their real and emotional meanings. Emoji meanings were collected from four websites that specialize in defining the meaning of emojis on social media, which produced a new dataset called the “Emoji Meaning” dataset. Furthermore, the CNN-LSTM parameters were optimized using Keras Tuner during 5-fold cross-validation. The proposed model with emoji translation into Arabic text obtained the highest accuracy (91.85%) when keeping non-Arabic words, removing punctuation, using the Snowball stemmer, and using Keras embedding. This approach yields competitive results compared to other state-of-the-art approaches, shows that emoji encoding enriches text by accurately reflecting emotions, and demonstrates the effect of data preprocessing on model performance. The hybrid model thus achieves results comparable to other studies that use the same ASTC dataset, improving sentiment analysis accuracy.	en_US
dc.publisher	AAUP	en_US
dc.subject	Social media, models, Memory (CNN-LSTM), model performance, Hybrid models, ASTC Data, Data Science, Business	en_US
dc.title	A Hybrid CNN-LSTM Framework for Enhanced Arabic Sentiment Analysis: Investigating Emoji Encoding and Preprocessing Strategies (Master's Thesis)	en_US
dc.title.alternative	إطار هجين قائم على الشبكات العصبية الالتفافية و شبكات الذاكرة طويلة المدى قصيرة الأجل لتعزيز تحليل المشاعر في اللغة العربية: دراسة استراتيجيات ترميز الرموز التعبيرية و المعالجة المسبقة.	en_US
dc.type	Thesis	en_US
Appears in Collections:	Master Theses and Ph.D. Dissertations
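
The abstract describes translating emojis into Arabic text, keeping non-Arabic words, and removing punctuation before the text reaches the model. A minimal sketch of that kind of preprocessing step is given below, purely as an illustration. The mapping `EMOJI_MEANINGS` and the function names are hypothetical and are not taken from the thesis or its "Emoji Meaning" dataset. Stemming and embedding are left out to keep the example dependency-free.

```python
import string

# Hypothetical emoji-to-Arabic mapping for illustration only; the thesis's
# "Emoji Meaning" dataset is not reproduced here.
EMOJI_MEANINGS = {
    "😂": "ضحك",   # laughter
    "😍": "حب",    # love
    "😢": "حزن",   # sadness
}

# ASCII punctuation plus a few common Arabic punctuation marks.
PUNCTUATION = string.punctuation + "،؛؟"
_PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def translate_emojis(text, meanings=EMOJI_MEANINGS):
    """Replace each known emoji with its Arabic meaning, padded by spaces."""
    for emoji, meaning in meanings.items():
        text = text.replace(emoji, f" {meaning} ")
    return text


def preprocess(text):
    """Translate emojis, replace punctuation with spaces, collapse whitespace.

    Non-Arabic words are kept as they are.
    """
    text = translate_emojis(text)
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("الفيلم رائع جدا!!! 😂"))
```

Replacing the emoji with a word, instead of deleting it, keeps the emotional cue in the token sequence that a downstream embedding layer would see.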

Files in This Item:

File	Description	Size	Format
حازم علاونة.pdf		2.12 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository