Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/2941
Full metadata record
DC Field                   Value                                                      Language
dc.contributor.author      Ashqar, Huthaifa (AAUP; Palestinian)                       -
dc.contributor.author      Al-Hadidi, Taqwa (Other; Other)                            -
dc.contributor.author      Elhenawy, Mohammed (Other; Other)                          -
dc.contributor.author      Khanfar, Nour (AAUP; Palestinian)                          -
dc.date.accessioned        2024-11-05T12:35:21Z                                       -
dc.date.available          2024-11-05T12:35:21Z                                       -
dc.date.issued             2024-10-10                                                 -
dc.identifier.citation     Ashqar, H.I.; Alhadidi, T.I.; Elhenawy, M.; Khanfar, N.O. Leveraging Multimodal Large Language Models (MLLMs) for Enhanced Object Detection and Scene Understanding in Thermal Images for Autonomous Driving Systems. Automation 2024, 5, 508–526. https://doi.org/10.3390/automation5040029    en_US
dc.identifier.issn         https://doi.org/10.3390/automation5040029                  -
dc.identifier.uri          http://repository.aaup.edu/jspui/handle/123456789/2941     -
dc.description.abstract    The integration of thermal imaging data with multimodal large language models (MLLMs) offers promising advancements for enhancing the safety and functionality of autonomous driving systems (ADS) and intelligent transportation systems (ITS). This study investigates the potential of MLLMs, specifically GPT-4 Vision Preview and Gemini 1.0 Pro Vision, for interpreting thermal images for applications in ADS and ITS. Two primary research questions are addressed: the capacity of these models to detect and enumerate objects within thermal images, and to determine whether pairs of image sources represent the same scene. Furthermore, we propose a framework for object detection and classification by integrating infrared (IR) and RGB images of the same scene without requiring localization data. This framework is particularly valuable for enhancing the detection and classification accuracy in environments where both IR and RGB cameras are essential. By employing zero-shot in-context learning for object detection and the chain-of-thought technique for scene discernment, this study demonstrates that MLLMs can recognize objects such as vehicles and individuals with promising results, even in the challenging domain of thermal imaging. The results indicate a high true positive rate for larger objects and moderate success in scene discernment, with a recall of 0.91 and a precision of 0.79 for similar scenes. The integration of IR and RGB images further enhances detection capabilities, achieving an average precision of 0.93 and an average recall of 0.56. This approach leverages the complementary strengths of each modality to compensate for individual limitations. This study highlights the potential of combining advanced AI methodologies with thermal imaging to enhance the accuracy and reliability of ADS, while identifying areas for improvement in model performance.    en_US
dc.language.iso            en_US                                                      en_US
dc.publisher               MDPI                                                       en_US
dc.title                   Leveraging Multimodal Large Language Models (MLLMs) for Enhanced Object Detection and Scene Understanding in Thermal Images for Autonomous Driving Systems    en_US
dc.type                    Article                                                    en_US
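The abstract above describes zero-shot in-context learning for object detection in thermal images using vision-capable chat models. A minimal sketch of what such a request might look like is given below, assuming an OpenAI-style chat message format; the prompt wording, function name, and payload structure are illustrative assumptions, not the authors' exact setup (only the model name "gpt-4-vision-preview" appears in the record).

```python
import base64


def build_thermal_detection_request(image_bytes: bytes,
                                    model: str = "gpt-4-vision-preview") -> dict:
    """Build a zero-shot object-detection request for one thermal image.

    NOTE: the prompt text and payload layout are assumptions sketching the
    zero-shot approach described in the abstract, not the paper's exact prompts.
    """
    # Vision chat APIs typically accept images as base64 data URLs.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "This is a thermal (infrared) road scene. "
                            "List each object class you can identify "
                            "(e.g. car, pedestrian, cyclist) and count how "
                            "many of each appear. Answer as 'label: count' "
                            "lines only."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

Because no in-context examples are included in the message, the model must rely on its pretrained knowledge alone, which is what makes the setup zero-shot; a fused IR+RGB variant would simply attach a second `image_url` entry for the RGB frame of the same scene.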
Appears in Collections:Faculty & Staff Scientific Research publications

Files in This Item:
File                       Description    Size       Format
automation-05-00029.pdf                   8.95 MB    Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
