Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/2900
Full metadata record
DC Field | Value | Language
dc.contributor.author | Basheer, Mohammad$AAUP$Palestinian | -
dc.date.accessioned | 2024-10-28T10:11:47Z | -
dc.date.available | 2024-10-28T10:11:47Z | -
dc.date.issued | 2017 | -
dc.identifier.uri | http://repository.aaup.edu/jspui/handle/123456789/2900 | -
dc.description | Master's degree in Computer Science | en_US
dc.description.abstract | Analyzing sentiment and opinions has become crucial, especially since institutions, governments, and private-sector companies have become very interested in knowing what people think about particular events or products. The volume of data on the web is enormous and growing rapidly, and processing and analyzing data at this scale is hard and costly; as a result, existing sentiment analysis solutions suffer from deficiencies such as high dimensionality and low accuracy. Selecting relevant features that yield high classification accuracy is not an easy task and remains an active research problem. The goal of this thesis is therefore to classify text-based opinions effectively into positive and negative sentiment by selecting a relevant feature subset. To this end, we present an approach that uses machine learning and evolutionary optimization algorithms to select an effective feature subset through four methods. First, the feature subset is chosen with a machine learning algorithm: after training, a support vector machine (SVM) produces a weight vector whose values represent the importance of each term for classification. Second, an evolutionary algorithm (a genetic algorithm, GA) optimizes the feature subset generated by the first method in order to improve sentiment classification. The third method hybridizes the machine-learning-based feature subset generated by the first method with a statistics-based feature subset produced by a correlation feature selection method. The fourth method, called the optimized hybrid method, applies the evolutionary optimization (GA) to the feature subset that results from the third method. Two well-known, publicly available sentiment analysis datasets were used to test and validate the proposed approaches.
The first is polarity dataset v2.0 (D1) and the second is polarity dataset v1.0 (D2); a third dataset (D3) combines D1 and D2. Classification performance is evaluated using accuracy, precision, recall, and F-measure, and the results outperform those reported in existing studies. With the first method, which uses machine learning for feature weighting, accuracy reached 98.79%; the second method improved this to 99.21% using evolutionary optimization. Merging the features obtained by SVM-based weighting (weight by SVM) with those obtained by the statistical weight-by-correlation method raised the accuracy to 99.46%, better than both previous methods. Optimizing this hybrid feature subset further increased the accuracy to 99.71% while reducing the feature subset size. Text processing produces a large feature set, and all of these results were obtained with feature subsets extracted from it. These subsets contain the features most relevant for classification, and they are smaller than the subsets used in existing work, which reduces the computation time required for classification. | en_US
dc.publisher | AAUP | en_US
dc.subject | data preprocessing, optimized hybrid methods, datasets | en_US
dc.title | Optimized Machine Learning Based Feature Selection Methods for Sentiment Classification (Master's thesis) | en_US
dc.type | Thesis | en_US
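The abstract describes the hybrid (third) method only in outline. The Python sketch below illustrates one plausible reading of it under stated assumptions: the SVM weight vector is taken as already computed (for example, the coefficients of a trained linear SVM; the numbers used here are made up), the statistical score is each feature's Pearson correlation with the label, and "merging" the two subsets is taken to mean their union. The helper names `pearson`, `top_k`, `hybrid_subset`, and `evaluate` are hypothetical, the genetic-algorithm step of the second and fourth methods is not shown, and `evaluate` computes the four metrics named in the abstract.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between one feature column and the labels.

    Returns 0.0 when either column is constant (correlation undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def top_k(scores, k):
    """Indices of the k scores with the largest magnitude (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda j: (-abs(scores[j]), j))
    return set(order[:k])


def hybrid_subset(svm_weights, X, y, k):
    """Union of the top-k features by |SVM weight| and the top-k by |correlation with y|.

    ASSUMPTION: svm_weights is a precomputed linear-SVM weight vector, and
    "merging" the two subsets means taking their union.
    """
    n_features = len(X[0])
    corr = [pearson([row[j] for row in X], y) for j in range(n_features)]
    return sorted(top_k(svm_weights, k) | top_k(corr, k))


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


if __name__ == "__main__":
    # Toy binary document-term matrix (hypothetical): columns are the terms
    # "great", "awful", "movie", "plot"; labels are +1 (positive) / -1 (negative).
    X = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1],
         [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]]
    y = [1, 1, 1, -1, -1, -1]
    weights = [0.9, -0.2, 0.05, 0.6]  # made-up SVM weights, not from the thesis
    print(hybrid_subset(weights, X, y, k=2))  # [0, 1, 3]
    print(evaluate(y, [1, 1, -1, -1, -1, 1]))
```

With k = 2, the made-up SVM weights alone keep features 0 and 3 and correlation alone keeps features 0 and 1, so the union returns three features. Whether the thesis merges the subsets by union, intersection, or a fixed-size combination is not stated in the abstract; a union keeps any term either criterion considers relevant, at the cost of a somewhat larger subset.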
Appears in Collections: Master Theses and Ph.D. Dissertations

Files in This Item:
File | Description | Size | Format
محمد بشير.pdf | | 1.88 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
