Leveraging Explainable AI for Detecting Human-Written and LLMs-Generated Text: Cybersecurity and Plagiarism Applications (Master's Thesis)

Najjar, Ayat Awwad Musa (AAUP, Palestinian)

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/2069

Title:	Leveraging Explainable AI for Detecting Human-Written and LLMs-Generated Text: Cybersecurity and Plagiarism Applications (Master's Thesis)
Other Titles:	Machine-Learning-Based Detection of Cybersecurity Texts Written Using ChatGPT
Authors:	Najjar, Ayat Awwad Musa (AAUP, Palestinian)
Keywords:	ChatGPT, Claude, Google Bard, GPTZero, NLP, LLaMA, LLMs, Perplexity, Plagiarism, XAI
Issue Date:	Mar-2024
Publisher:	AAUP
Abstract:	The widespread use of texts generated by Artificial Intelligence (AI) in today's networked digital environment poses significant challenges to academic integrity and cybersecurity. This work focuses on building a reliable model for detecting texts generated by Large Language Models (LLMs), which have become prominent in Natural Language Processing (NLP). The study is divided into two parts. The first part explores cybersecurity and highlights the risks associated with the malicious use of AI-generated text. Protecting digital communication is crucial given the rise in automated social engineering attacks, fraudulent email campaigns, the spread of false information, and plagiarism-related threats to academic integrity. Our research develops Explainable AI (XAI) techniques designed to differentiate human-written texts from texts generated by ChatGPT (Chat Generative Pre-trained Transformer). We trained and evaluated several Machine Learning (ML) and Deep Learning (DL) algorithms: Random Forest (RF), Support Vector Machine (SVM), J48, Extreme Gradient Boosting (XGBoost), Deep Neural Networks (DNN), and Convolutional Neural Networks (CNN). This research also takes a fresh direction in ChatGPT-generated text detection by investigating how ML and DL techniques can detect such documents. The results show that XGBoost achieved the best accuracy, 83%. Our model outperformed GPTZero, with an accuracy of 40% compared with 38%; moreover, GPTZero was unable to classify 20 observations from the test dataset, whereas our model classified the complete test dataset. The second part of the study looks into the domain of academic integrity, emphasizing the obstacles faced by students who rely on LLMs for their assignments.
Because traditional plagiarism detection tools work by matching text against pre-existing online content, they frequently fail to recognize text created by LLMs, which typically has no existing source to match. Stressing the value of developing writing skills in educational contexts, we examined the possible impact of excessive dependence on AI tools, which could produce a generation of students lacking important expressive skills. For this part, we trained and evaluated several ML and DL algorithms: RF, XGBoost, and Recurrent Neural Networks (RNN). This research also takes a fresh direction in LLM-based plagiarism detection by investigating how ML, DL, and XAI techniques can detect such documents. We built an ML model that identifies texts produced by LLM tools in order to detect plagiarism, using ML and XAI techniques in two parts. The first part is a multi-class classification that differentiates between text from five LLM tools (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity) and human-written text; here, RF gave the best result, with 97% accuracy. The second part is a binary classification that distinguishes text generated by LLMs in general from text written by humans; here, all three algorithms (RF, XGBoost, and RNN) achieved 100% accuracy. Our model also outperformed GPTZero, achieving a True Positive (TP) rate of 100%. Notably, GPTZero was unable to classify 20 observations from the test dataset, whereas our model, again, classified the complete test dataset.
Description:	Master’s degree in Information Security
URI:	http://repository.aaup.edu/jspui/handle/123456789/2069
Appears in Collections:	Master Theses and Ph.D. Dissertations

Files in This Item:

File	Description	Size	Format
ايات نجار.pdf	Master’s degree in Information Security	2.62 MB	Adobe PDF

ARAB AMERICAN UNIVERSITY Repository