Please use this identifier to cite or link to this item:
http://repository.aaup.edu/jspui/handle/123456789/2605
Title: | Predicting Financial Figures using OCR and Machine Learning: Automated Extraction from Arabic Annual Reports (Master's Thesis) |
Authors: | Derieh, Batool Nidal Mohammad (AAUP; Palestinian) |
Keywords: | annual reports, financial data extraction, table detection, optical character recognition, OCR, image processing, text mining, natural language processing, scanned documents, page classification, Tesseract, OpenCV |
Issue Date: | 2023 |
Publisher: | AAUP |
Abstract: | This thesis explores the application of Optical Character Recognition (OCR) and machine learning techniques to extract key financial figures from scanned Arabic annual reports and to predict future net profit values. The data source is the annual reports of the Palestinian banks listed in the financial market (six banks, covering the years 2012 to 2021). The key figure values were extracted through a series of preprocessing steps followed by a two-step rule-based extraction approach, with Tesseract and OpenCV as supporting tools. The research achieves an extraction accuracy of 85% for the Net Profit key figure and an R² of 91.9% for net profit forecasting using the XGBoost algorithm. The thesis addresses challenges related to Arabic script complexity, document layout, table structure, and image quality by combining OCR, Natural Language Processing (NLP), and Machine Learning (ML) technologies. The OCR approach can serve as a knowledge base for related or more complex data extraction use cases, and the trained ML models provide a proof of concept for predicting net profit when a high volume of data is available, offering valuable insights for financial analysts, investors, and decision-makers. |
Description: | Master's degree in Data Science and Business Analytics |
URI: | http://repository.aaup.edu/jspui/handle/123456789/2605 |
Appears in Collections: | Master Theses and Ph.D. Dissertations |
Files in This Item:
File | Description | Size | Format
---|---|---|---
بتول دريح.pdf | | 3.18 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.