Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques

Maree, Mohammed (AAUP, Palestinian); Eleyat, Mujahed (AAUP, Palestinian); Mesqali, Enas (AAUP, Palestinian)

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/1868

Full metadata record

DC Field	Value	Language
dc.contributor.author	Maree, Mohammed$AAUP$Palestinian	-
dc.contributor.author	Eleyat, Mujahed$AAUP$Palestinian	-
dc.contributor.author	Mesqali, Enas$AAUP$Palestinian	-
dc.date.accessioned	2024-07-22T07:28:06Z	-
dc.date.available	2024-07-22T07:28:06Z	-
dc.date.issued	2024-03	-
dc.identifier.citation	Mohammed Maree, Mujahed Eleyat, Enas Mesqali, "Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques", The International Arab Journal of Information Technology (IAJIT), Volume 21, Number 02, pp. 257-270, March 2024, doi: 10.34028/21/2/8.	en_US
dc.identifier.issn	1683-3198	-
dc.identifier.uri	https://iajit.org/paper/4989/Optimizing-Machine-Learning-based-Sentiment-Analysis-Accuracy-in-Bilingual-Sentences-via-Preprocessing-Techniques	-
dc.identifier.uri	http://repository.aaup.edu/jspui/handle/123456789/1868	-
dc.description.abstract	With recent advances in Natural Language Processing (NLP) technologies, it is becoming increasingly feasible to process, analyze, and understand the sentiments users express in reviews of the products and services they use. Despite these improvements, little attention has been given to multilingual sentiment analysis. In this article, a framework is presented for sentiment analysis in Arabic and English using two Arabic datasets (ASTD and AJGT) along with their English translations. Preprocessing techniques, including n-gram tokenization, Arabic-specific stop-word removal, punctuation removal, repeated-character removal, part-of-speech tagging, stemming, and lemmatization, are applied. Four machine learning classifiers are employed: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). We also highlight existing research on Arabic and English sentiment analysis and the techniques employed in each. Furthermore, the impact of each preprocessing step on accuracy is investigated for both Arabic and English through separate experiments. On the ASTD dataset, the classifiers perform similarly, with SVM achieving the highest accuracy of 70%. In contrast, accuracy varied across classifiers on the AJGT dataset, where NB achieved the best accuracy of approximately 87%. Experiments on the English translations of the Arabic datasets showed no significant differences, although some features performed slightly better on the Arabic datasets.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Zarqa University	en_US
dc.relation.ispartofseries	The International Arab Journal of Information Technology (IAJIT);Volume 21, Number 02	-
dc.subject	Machine learning	en_US
dc.subject	Bilingual sentiment analysis	en_US
dc.subject	NLP	en_US
dc.subject	Sentiment datasets	en_US
dc.title	Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques	en_US
dc.type	Article	en_US
Appears in Collections:	Faculty & Staff Scientific Research publications
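The abstract names several preprocessing steps without showing what they do to a review. The sketch below illustrates three of them (punctuation removal, repeated-character removal, and Arabic stop-word removal) followed by word uni- and bigram extraction, in plain Python. It is not the authors' implementation: the stop-word list, the punctuation set, and the rule of collapsing runs of three or more identical characters are assumptions chosen for illustration. Part-of-speech tagging, stemming, and lemmatization are omitted because they require external NLP tools.

```python
import re
import string
from collections import Counter

# Hypothetical, illustrative Arabic stop-word list (not the paper's list):
# في (in), من (from), على (on), عن (about), إلى (to), هذا / هذه (this).
ARABIC_STOP_WORDS = {"في", "من", "على", "عن", "إلى", "هذا", "هذه"}

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCTUATION = set(string.punctuation) | {"،", "؛", "؟"}


def remove_punctuation(text: str) -> str:
    """Replace each punctuation character with a space."""
    return "".join(" " if ch in PUNCTUATION else ch for ch in text)


def collapse_repeats(text: str, keep: int = 1) -> str:
    """Shrink any run of 3+ identical characters to `keep` copies.

    keep=1 suits elongated Arabic words; keep=2 preserves English
    double letters (e.g. "goooood" -> "good").
    """
    return re.sub(r"(.)\1{2,}", lambda m: m.group(1) * keep, text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in ARABIC_STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous word n-grams, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str, max_n: int = 2) -> Counter:
    """Clean a review and count its word n-grams for n = 1..max_n."""
    text = text.lower()  # only affects Latin-script characters
    text = collapse_repeats(remove_punctuation(text))
    tokens = remove_stop_words(text.split())
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


if __name__ == "__main__":
    print(preprocess("الفيلم رااائع جدا!!! في رأيي"))
```

For the sample review, the elongated "رااائع" is normalized to "رائع", the exclamation marks and the stop word "في" are dropped, and seven features remain (four unigrams and three bigrams). In a typical setup, count features like these would then be vectorized and passed to classifiers such as LR, RF, NB, or SVM; running each step separately, as the abstract describes, lets its effect on accuracy be measured in isolation.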

Files in This Item:

File	Description	Size	Format
Optimizing-Machine-Learning-based-Sentiment-Analysis-Accuracy-in-Bilingual-Sentences-via-Preprocessing-Techniques.pdf		1.6 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository