Please use this identifier to cite or link to this item:
http://repository.aaup.edu/jspui/handle/123456789/2984
Title: | Density-Based Approach for Information Retrieval Documents Ranking in Embedding Space (Master’s Thesis) |
Authors: | Bitawi, Loai (AAUP, Palestinian) |
Keywords: | business analysis, datasets, document matching problem |
Issue Date: | 2021 |
Publisher: | AAUP |
Abstract: | Search engines play an essential role in meeting everyday information needs, so the algorithms inside them merit continued optimization. The literature shows that neural networks and the vector space model (VSM) are the best-performing models for information retrieval (IR) systems, but existing studies have not examined density-based algorithms for document matching. This research explores that untapped area by applying a density-based algorithm to document-query matching. The proposed model represents document tokens using BERT embeddings, so each document becomes a cloud of points. Matching uses a hybrid approach that improves recall and reduces runtime: the BM25 ranking model first ranks the whole dataset, and only the top K documents are passed to the density-based algorithm. The density-based model is inspired by the Local Outlier Factor (LOF) anomaly detection algorithm and determines whether a query point is an outlier relative to the document cloud. Because the LOF score measures the degree of outlierness, the scores of the individual points are combined into a score for the document with respect to a given query, and these document scores are then used to rank the documents. The density algorithm also applies TF-IDF term weighting, which emphasizes the most important words in each document and reduces the effect of words that are irrelevant to the document topic. The proposed algorithm is tested against the state-of-the-art mean-of-embeddings model, and the results show that it outperforms that model on several datasets and testing scenarios. The algorithm shows high potential for further investigation and optimization that could yield a better document-query matching model for IR systems. |
Description: | Master’s degree in Data Science and Business Analytics |
URI: | http://repository.aaup.edu/jspui/handle/123456789/2984 |
Appears in Collections: | Master Theses and Ph.D. Dissertations |
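The re-ranking stage described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses a simplified LOF-style ratio (the query point's mean distance to its k nearest document points, divided by the mean k-nearest-neighbour distance among those points) rather than the full LOF definition, it assumes the candidate documents have already been selected by a BM25 first pass, and the function names, the toy 2-D "embeddings", and the per-query-token `weights` argument standing in for TF-IDF weighting are all hypothetical. The abstract does not state exactly where the thesis applies the weights.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_mean_distance(index, cloud, k):
    """Mean distance from cloud[index] to its k nearest other points in the cloud."""
    dists = sorted(
        euclidean(cloud[index], other)
        for i, other in enumerate(cloud)
        if i != index
    )
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def outlier_score(query_point, cloud, k=2):
    """Simplified LOF-style ratio (an assumption, not the full LOF formula).

    Values near or below 1 mean the query point lies in a region about as
    dense as the document cloud; larger values mean it is more of an outlier.
    Each cloud must contain at least two points.
    """
    nearest = sorted(
        range(len(cloud)), key=lambda i: euclidean(query_point, cloud[i])
    )[:k]
    query_dist = sum(euclidean(query_point, cloud[i]) for i in nearest) / len(nearest)
    neighbour_dist = sum(knn_mean_distance(i, cloud, k) for i in nearest) / len(nearest)
    return query_dist / (neighbour_dist + 1e-12)  # epsilon avoids division by zero

def document_score(query_points, cloud, weights=None, k=2):
    """Weighted mean outlier score of the query tokens; lower means a better match."""
    if weights is None:
        weights = [1.0] * len(query_points)  # hypothetical stand-in for TF-IDF weights
    weighted = sum(w * outlier_score(q, cloud, k) for q, w in zip(query_points, weights))
    return weighted / sum(weights)

def rank(query_points, candidates, weights=None, k=2):
    """Order candidate document ids from best match (lowest score) to worst."""
    return sorted(
        candidates,
        key=lambda doc_id: document_score(query_points, candidates[doc_id], weights, k),
    )

# Toy candidates, assumed to be the top K documents from a BM25 first pass.
candidates = {
    "doc_b": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)],
    "doc_a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
}
query = [(0.5, 0.5)]
print(rank(query, candidates))
```

In a real pipeline the 2-D tuples would be replaced by BERT token embeddings, and a library implementation of LOF could be substituted for the simplified ratio used here.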
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
لؤي بيتاوي.pdf | | 2.18 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.