Optimized Machine Learning Based Feature Selection Methods for Sentiment Classification (Master's Thesis)

Basheer, Mohammad$AAUP$Palestinian

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/2900

Full metadata record

DC Field	Value	Language
dc.contributor.author	Basheer, Mohammad$AAUP$Palestinian	-
dc.date.accessioned	2024-10-28T10:11:47Z	-
dc.date.available	2024-10-28T10:11:47Z	-
dc.date.issued	2017	-
dc.identifier.uri	http://repository.aaup.edu/jspui/handle/123456789/2900	-
dc.description	Master's degree in Computer Science	en_US
dc.description.abstract	Analyzing sentiment and opinions has become crucial, especially since institutions, governments, and private-sector companies have become very interested in knowing what people think about certain events or products. The volume of data on the web is enormous and growing rapidly, and processing and analyzing data at this scale is hard and costly; as a result, existing sentiment analysis solutions suffer from deficiencies such as high dimensionality and low accuracy. Selecting relevant features, meaning those that yield high classification accuracy, is an open research problem and not an easy task. The goal of this thesis is therefore to classify text-based opinions into positive and negative sentiment effectively by selecting a relevant feature subset. To this end, we present an approach that uses machine learning and evolutionary optimization algorithms to select an effective feature subset, in four methods. First, the feature subset is chosen with a machine learning algorithm: a support vector machine (SVM) produces, after training, a weight vector whose values represent the importance of each term for classification. Second, an evolutionary algorithm (a genetic algorithm, GA) optimizes the feature subset generated by the first method in order to improve sentiment classification. The third method hybridizes the machine-learning-based feature subset generated by the first method with a statistics-based feature subset produced by a correlation feature selection method. The fourth method, called the optimized hybrid method, applies the GA on top of the feature subset produced by the third method. Two well-known, publicly available sentiment analysis datasets were used to test and validate the proposed approaches.
The first is polarity dataset v2.0 (D1) and the second is polarity dataset v1.0 (D2); a third dataset (D3) combines D1 and D2. Sentiment classification performance is evaluated using accuracy, recall, precision, and F-measure. The results achieved in this research outperform those reported in existing studies. With the first method, which uses SVM-based (machine learning) feature weighting, we achieved an accuracy of 98.79%, and the second method, which adds evolutionary optimization, improved this to 99.21%. When we merged the features obtained by machine learning weighting (weight by SVM) with those obtained by the statistical method (weight by correlation), classification accuracy reached 99.46%, better than both previous methods. Finally, optimizing the hybrid feature subset (the fourth method) raised accuracy to 99.71% while also reducing the size of the feature subset. Text processing produces a large feature set, and these results were obtained with feature subsets extracted from that set. The subsets contain the features most relevant for classification, and they are smaller than those used in existing works, which reduces the computation time required for classification.	en_US
dc.publisher	AAUP	en_US
dc.subject	data preprocessing, optimized hybrid methods, datasets	en_US
dc.title	Optimized Machine Learning Based Feature Selection Methods for Sentiment Classification (Master's Thesis)	en_US
dc.type	Thesis	en_US
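The abstract describes ranking terms by the magnitude of a trained SVM's weight vector, ranking them by correlation with the class label, merging the two subsets (the hybrid method), and scoring with accuracy, precision, recall, and F-measure. The following is a minimal Python/NumPy sketch of those ideas under stated assumptions, not the thesis's implementation: the linear SVM is trained with a Pegasos-style subgradient solver without a bias term, the correlation weight is taken to be the absolute Pearson correlation of each term column with the ±1 label, and the hybrid subset is assumed to be the union of the two top-k lists. The function names `train_linear_svm`, `correlation_weights`, `top_k`, `hybrid_subset`, and `binary_metrics`, and the synthetic data, are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularised hinge loss (no bias term, for brevity).
    X: (n_docs, n_terms) term matrix; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * (X[i] @ w)         # margin before the update
            w *= 1.0 - eta * lam               # step on the regulariser
            if margin < 1.0:                   # hinge loss active: move w toward y*x
                w += eta * y[i] * X[i]
    return w


def correlation_weights(X, y):
    """Absolute Pearson correlation between each term column and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Xc.T @ yc) / denom
    return np.nan_to_num(np.abs(r))            # constant columns get weight 0


def top_k(weights, k):
    """Indices of the k largest weights."""
    return set(np.argsort(-weights)[:k].tolist())


def hybrid_subset(X, y, k):
    """Union of the top-k terms by |SVM weight| and by |correlation|."""
    svm_rank = top_k(np.abs(train_linear_svm(X, y)), k)
    corr_rank = top_k(correlation_weights(X, y), k)
    return sorted(svm_rank | corr_rank)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a binary labelling."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f1}


# Hypothetical synthetic data: 200 "documents", 10 "terms", two informative.
rng = np.random.default_rng(42)
y = rng.choice([-1, 1], size=200)
X = 0.5 * rng.normal(size=(200, 10))
X[:, 0] += y            # term associated with the positive class
X[:, 3] -= 0.8 * y      # term associated with the negative class
print(hybrid_subset(X, y, k=2))
print(binary_metrics([1, 1, 1, -1], [1, 1, -1, -1]))
```

Taking the union rather than the intersection keeps any term that either criterion ranks highly; whether the thesis merged its two subsets exactly this way is an assumption.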
Appears in Collections:	Master Theses and Ph.D. Dissertations
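The second and fourth methods apply a genetic algorithm (GA) to an existing feature subset. The sketch below assumes a common GA design that may differ from the thesis: each chromosome is a boolean mask over the candidate features (for example, the columns kept by the SVM or hybrid step), parents are chosen by size-2 tournament, offspring come from uniform crossover and bit-flip mutation, and the best individual is carried over unchanged (elitism). The fitness function is a stand-in: the accuracy of a nearest-centroid classifier on a validation split minus a small penalty per selected feature, which rewards both accuracy and smaller subsets. The functions `nearest_centroid_accuracy`, `fitness`, and `ga_select`, their parameters, and the data are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(X_tr, y_tr, X_val, y_val):
    """Stand-in classifier: assign each validation row to the closer class centroid."""
    c_pos = X_tr[y_tr == 1].mean(axis=0)
    c_neg = X_tr[y_tr == -1].mean(axis=0)
    d_pos = ((X_val - c_pos) ** 2).sum(axis=1)
    d_neg = ((X_val - c_neg) ** 2).sum(axis=1)
    pred = np.where(d_pos <= d_neg, 1, -1)
    return float(np.mean(pred == y_val))


def fitness(mask, X_tr, y_tr, X_val, y_val, size_penalty=0.001):
    """Validation accuracy on the selected columns, minus a per-feature penalty."""
    if not mask.any():
        return 0.0
    acc = nearest_centroid_accuracy(X_tr[:, mask], y_tr, X_val[:, mask], y_val)
    return acc - size_penalty * int(mask.sum())


def ga_select(X_tr, y_tr, X_val, y_val, pop_size=30, generations=40,
              p_mut=0.05, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pop = rng.random((pop_size, d)) < 0.5          # random initial masks

    def evaluate(population):
        return np.array([fitness(m, X_tr, y_tr, X_val, y_val) for m in population])

    scores = evaluate(pop)
    for _ in range(generations):
        elite = pop[np.argmax(scores)].copy()
        # Size-2 tournament selection: keep the fitter of two random individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover: each gene comes from the parent or its neighbour.
        partners = np.roll(parents, 1, axis=0)
        children = np.where(rng.random((pop_size, d)) < 0.5, partners, parents)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, d)) < p_mut
        children[0] = elite                        # elitism
        pop = children
        scores = evaluate(pop)
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])


# Hypothetical data: 8 candidate features, only column 0 is informative.
rng = np.random.default_rng(7)
y = rng.choice([-1, 1], size=300)
X = rng.normal(size=(300, 8))
X[:, 0] += 2.0 * y
mask, score = ga_select(X[:200], y[:200], X[200:], y[200:])
print(np.flatnonzero(mask), round(score, 3))
```

Because the fitness is measured on the validation split, a final accuracy figure should come from a separate held-out test set rather than from `score`.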

Files in This Item:

File	Description	Size	Format
محمد بشير.pdf		1.88 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository