Polyp Detection Using Vision Transformers (Master's Thesis)

Alawneh, Asma (AAUP; Palestinian)

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/3527

Title:	Polyp Detection Using Vision Transformers (Master's Thesis)
Other Titles:	Detection of Polyps Using Vision Transformers.
Authors:	Alawneh, Asma (AAUP; Palestinian)
Keywords:	Artificial Intelligence, Computer Vision, Polyps.
Issue Date:	2025
Publisher:	AAUP
Abstract:	Colorectal cancer is the second leading cause of cancer deaths worldwide, and early detection and removal of polyps is crucial to lowering mortality. Traditional manual analysis of colonoscopy images, however, is time-consuming and subjective, and can result in miss rates of 22-28%. This thesis investigates Vision Transformers for semantic segmentation of polyps in colonoscopy images and compares their performance with that of traditional Convolutional Neural Networks (CNNs). The implementation pipeline comprised data preparation (dataset splitting and online augmentation), followed by training and tuning of the segmentation models. Five widely used datasets (Kvasir, CVC-ClinicDB, CVC-ColonDB, EndoScene, and ETIS) were used for training and evaluation. CNN-based models (DeepLabV3 with a ResNet backbone, U-Net, and their ensembles) served as baselines, and state-of-the-art Vision Transformer models (SegFormer, UperNet with a Swin Transformer backbone, and their ensembles) were implemented for comparison. All models were tested on both seen and unseen datasets to assess how well they generalize to real-world clinical applications. During validation, the Ensemble_CNNs model achieved the highest mean Intersection over Union (mIoU), 0.8959, while the three Transformer-based models (Ensemble_Transformers, SegFormer, and Swin Transformer) achieved the second, third, and fourth highest mIoU scores of 0.8777, 0.8594, and 0.8412, respectively. Testing on seen datasets favored the Vision Transformers, with Ensemble_Transformers achieving the highest mIoU scores of 0.8795 on ClinicDB and 0.8478 on Kvasir. Among the remaining models, the Transformer-based ones achieved the highest performance.
On unseen datasets, all Transformer-based models displayed superior generalization ability: Ensemble_Transformers achieved the highest mIoU scores of 0.7047 on ColonDB, 0.8314 on EndoScene, and 0.6581 on ETIS, outperforming CNN-based models such as U-Net (0.5718, 0.7335, 0.4889) and DeepLabV3 (0.5077, 0.7335, 0.6581). These findings indicate that Vision Transformer-based models are more robust and generalize better in polyp segmentation on both seen and unseen datasets, making them suitable for clinical applications. By exploring an underexplored area of medical image analysis, this research advances automated diagnostic tools and contributes to the early detection and prevention of colorectal cancer.
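The mIoU scores quoted in the abstract can be illustrated with a small sketch. The abstract does not say how the thesis computes mIoU, so this assumes the common semantic-segmentation definition: the IoU of each class (background = 0, polyp = 1) averaged over the classes. The function names `class_iou` and `mean_iou` and the toy masks are hypothetical and are not taken from the thesis.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask: 1 = polyp, 0 = background


def class_iou(pred: Mask, target: Mask, cls: int) -> float:
    """Intersection over union of the pixels labelled `cls` in both masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            intersection += in_pred and in_target  # pixel is in both masks
            union += in_pred or in_target          # pixel is in either mask
    # Assumption: a class absent from both masks counts as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pred: Mask, target: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU over the given classes."""
    return sum(class_iou(pred, target, c) for c in classes) / len(classes)


# Toy 3x3 example: the prediction misses one polyp pixel.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

print(f"polyp IoU = {class_iou(pred, target, 1):.4f}")
print(f"mIoU      = {mean_iou(pred, target):.4f}")
```

In the toy example the polyp class has 3 shared pixels out of 4 in the union (IoU 0.75) and the background has 5 out of 6, so the mean is about 0.79. Other conventions exist, such as averaging the polyp IoU over images, and they can give different numbers for the same predictions.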
Description:	Master \ Data Science and Business Analytics
URI:	http://repository.aaup.edu/jspui/handle/123456789/3527
Appears in Collections:	Master Theses and Ph.D. Dissertations

Files in This Item:

File	Description	Size	Format
اسماء علاونة.pdf		1.73 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository