Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/2646
Title: Multi-Channel Classifier for Analyzing External Influences and Factors on Cryptocurrency Price Using Machine Learning Techniques (Master's Thesis)
Authors: Hamayel, Mohammad Joudeh Atallah (AAUP, Palestinian)
Keywords: Cryptocurrency introduction, Cryptocurrency price prediction, Data fusion
Issue Date: 2023
Publisher: AAUP
Abstract: Trading in cryptocurrency markets is one of the main avenues investors use to seek large profits. Multiple techniques are used, such as speculation, which exploits fluctuations in a cryptocurrency's price. This kind of investment is very risky, so investors (players) are very careful when deciding to buy or sell in cryptocurrency markets. These markets attract researchers, investors, and the general public, who study them, invest in them, or take them up as a challenge in order to understand their behavior and nature. This thesis investigates the internal and external effects on cryptocurrency markets in order to predict the price. In general, the factors and influences are social media (Twitter), news articles, cost of production, demand, supply, and gold price. All of these are used to create a comprehensive, robust, and reliable environment; the best feature is then extracted from each influence (usually called a channel), and these best features, usually called the improved chromosome, form the model. To achieve this goal, the thesis targets four cryptocurrencies: Bitcoin (BTC), Ethereum (ETH), Litecoin (LTC), and Dogecoin (DOGE). These are considered the most popular cryptocurrencies, and all of them were tested using the same channels and conditions. For the Twitter and news articles channels, I used two types of sentiment analysis: VADER and Harvard IV-4. The outcomes indicate that VADER, a lexicon- and rule-based sentiment analysis tool, achieves better results than Harvard IV-4 for the Twitter channel, and the opposite holds for the news articles channel. Therefore, the VADER outcomes represent the Twitter channel, and the Harvard IV-4 outcomes represent the news articles channel.
The data collection and preparation process was the most difficult and complex stage of this thesis. For example, more than 220 million records were collected from Twitter to represent the Twitter channel, and working with this volume of data was a major challenge. Text preprocessing was carried out in multiple stages, such as removing duplicates, handling null values, removing stop words, removing punctuation, and data completion. To manage this long and complex process, I proposed a method called Divide, Clean, and Combine (DCC). Handling non-English tweets was a further challenge. To handle them, I proposed a translation process that is complex and takes a long preprocessing time. Although translation may have a negative effect in some scenarios, the results indicate that the model built from the Twitter channel with both English and non-English (translated) tweets is more accurate than the model that depends on English tweets only. The technique the thesis relies on is data fusion. It collects data from multiple sources and chooses the representative feature from each channel; these features are fused into a single dataset that is used to train and test a Long Short-Term Memory (LSTM) model. The simulation results show that fusion improves the results compared with using a single channel. The evaluation was conducted using the mean absolute squared error (MASE). In general, the results were satisfactory for each targeted cryptocurrency: the final results were 5.62%, 3.48%, and 2.88% for BTC, ETH, and LTC respectively, while for DOGE data fusion had no effect on the result.
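The Divide, Clean, and Combine idea can be illustrated with a minimal Python sketch. The abstract does not give the details of DCC, so everything here is an assumption made for illustration: the function names, the chunk size, the tiny stop-word list, and the sample tweets are hypothetical, and only the cleaning stages named above (duplicates, null values, stop words, punctuation) are covered.

```python
import string
from itertools import islice

# Tiny illustrative stop-word list (an assumption; the thesis does not specify its list).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}


def divide(records, chunk_size):
    """Divide: yield successive chunks of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def clean_text(text):
    """Lowercase one tweet, strip punctuation, and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.split() if w not in STOP_WORDS)


def clean(chunk):
    """Clean: drop null records, normalise text, and drop records left empty."""
    cleaned = (clean_text(t) for t in chunk if t is not None)
    return [t for t in cleaned if t]


def combine(cleaned_chunks):
    """Combine: merge chunks, removing duplicates across chunks and keeping order."""
    seen = set()
    result = []
    for chunk in cleaned_chunks:
        for t in chunk:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


def dcc(records, chunk_size=2):
    """Run the hypothetical Divide, Clean, Combine pipeline on an iterable of tweets."""
    return combine(clean(c) for c in divide(records, chunk_size))


# Made-up sample input, not data from the thesis.
tweets = ["Bitcoin is going UP!", None, "bitcoin is going up", "", "The price of ETH..."]
print(dcc(tweets))
```

Processing in chunks keeps memory use bounded, which is the likely motivation for dividing a collection of this size before cleaning it.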
In future work, I will implement more algorithms and compare them, try to add more channels and study their effects on the model proposed in this thesis, and train the model on newly collected data.
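The data fusion step described in the abstract, where one representative feature per channel is merged into a single dataset, might look roughly like the following Python sketch. It assumes the channels are aligned by calendar date and joined only on dates present in every channel; the abstract does not state the alignment rule, and the function name, channel names, and numeric values are hypothetical placeholders rather than thesis data.

```python
def fuse_channels(channels):
    """Join per-channel daily features on the dates shared by every channel.

    channels maps a channel name to a {date: feature_value} dictionary.
    Returns {date: {channel_name: feature_value}} ordered by date.
    """
    common_dates = set.intersection(*(set(series) for series in channels.values()))
    names = sorted(channels)
    return {
        date: {name: channels[name][date] for name in names}
        for date in sorted(common_dates)
    }


# Placeholder values for illustration only (not results from the thesis).
channels = {
    "twitter_vader": {"2021-01-01": 0.42, "2021-01-02": -0.10},
    "news_harvard_iv4": {"2021-01-01": 0.05, "2021-01-02": 0.20, "2021-01-03": 0.11},
    "gold_price": {"2021-01-01": 1898.0, "2021-01-02": 1901.5},
}

fused = fuse_channels(channels)
for date, features in fused.items():
    print(date, features)
```

Each fused row could then serve as one time step of input to a sequence model such as an LSTM; the inner join on dates is a design choice of this sketch that avoids having to impute missing channel values.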
Description: Master's degree in Data Science and Business Analytics
URI: http://repository.aaup.edu/jspui/handle/123456789/2646
Appears in Collections:Master Theses and Ph.D. Dissertations

Files in This Item:
File: محمد حمايل.pdf
Size: 6.01 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
