International Traffic Bypass Detection in Telecom Using Machine Learning and Resampling Techniques on Imbalanced Data (Master's Thesis)

AbuNaeem, Wael$AAUP$Palestinian

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/3793

Title:	International Traffic Bypass Detection in Telecom Using Machine Learning and Resampling Techniques on Imbalanced Data (Master's Thesis)
Other Titles:	Detecting bypass of international calls in the telecom sector using machine learning techniques and sample-balancing techniques on imbalanced data.
Authors:	AbuNaeem, Wael$AAUP$Palestinian
Keywords:	Telecom Fraud, International Traffic Bypass, SIMBOX Detection, Machine Learning, Support Vector Machine (SVM), Random Forest Classifier (RFC), Data Imbalance, Resampling, Oversampling, Event Detailed Records (EDRs)
Issue Date:	2026
Publisher:	AAUP
Abstract:	Fraud of various types is a major threat to telecom operators because of the large revenue losses it causes and its effect on operators' credibility, customer satisfaction, and performance. The subject of this study is International Traffic Bypass fraud, also known as SIMBOX fraud, a common and widely used type of fraud in the telecom industry that causes losses of billions of dollars yearly (Howell, 2021). In this type of fraud, fraudsters usually use a special device to bypass international traffic and gain revenue at the expense of telecom operators. The traditional approach to SIMBOX fraud detection uses an expert system with a predefined set of fixed rules, together with statistical analysis of traffic Event Detailed Records (EDRs) collected from source systems, mainly the Mobile Switching Center (MSC), during a given period. Based on the system rules and the results of the traffic analysis, including normal calling hours, call duration, the number of distinct called numbers, the number of distinct locations, etc., a subscriber is classified as a suspected fraudster or not. This traditional approach requires continuous updating of the rules and frequently generates false positives, which motivates the development of intelligent and efficient detection models based on Machine Learning (ML) algorithms. This research investigated the intricacies of employing machine learning techniques for SIMBOX detection, with a primary focus on the Support Vector Machine (SVM) and the Random Forest Classifier (RFC). A real telecom dataset was used for this purpose, composed of 23,017 cases, of which 22,289 are normal and 728 are fraudulent. The dataset was generated from a large raw data store containing millions of records. Classification accuracy was compared before and after applying various resampling techniques, which were used to address the imbalanced datasets common in this domain. This evaluation identified the most effective approach for managing data imbalance. The results show that oversampling is the best approach to the imbalance issue in both the RFC and SVM implementations. SVM accuracy is slightly better than RFC accuracy: SVM gave 99.71%, while RFC gave 99.66%. SVM also has a lower false positive rate (FPR) and takes one third of the time that RFC takes for model tuning, training, and testing.
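The class counts reported in the abstract can be used to illustrate why accuracy alone is hard to interpret on this dataset and what oversampling does to the class balance. The following is a minimal sketch, not the thesis's implementation: it assumes plain random oversampling with replacement (the abstract does not say which oversampling variant was used), and the function names `majority_baseline_accuracy`, `random_oversample`, and `false_positive_rate` are hypothetical helpers introduced here for illustration.

```python
import random

# Class counts reported in the abstract.
N_NORMAL = 22_289
N_FRAUD = 728


def majority_baseline_accuracy(n_majority: int, n_minority: int) -> float:
    """Accuracy of a trivial model that labels every case as the majority class."""
    return n_majority / (n_majority + n_minority)


def random_oversample(labels: list[int], minority: int = 1, seed: int = 0) -> list[int]:
    """Return row indices in which minority rows are repeated (sampled with
    replacement) until both classes have the same count.

    Assumption: simple random oversampling; the thesis may use another variant.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    deficit = max(len(majority_idx) - len(minority_idx), 0)
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    return majority_idx + minority_idx + extra


def false_positive_rate(y_true: list[int], y_pred: list[int]) -> float:
    """FPR = FP / (FP + TN), with 0 = normal and 1 = fraud."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)


# Label vector with the reported class sizes (0 = normal, 1 = fraud).
labels = [0] * N_NORMAL + [1] * N_FRAUD
balanced_idx = random_oversample(labels)
balanced_labels = [labels[i] for i in balanced_idx]

print(f"fraud share: {N_FRAUD / len(labels):.2%}")
print(f"majority baseline accuracy: {majority_baseline_accuracy(N_NORMAL, N_FRAUD):.2%}")
print(f"after oversampling: {balanced_labels.count(0)} normal, {balanced_labels.count(1)} fraud")
```

With these counts, fraudulent cases make up about 3.16% of the data, so a model that never flags fraud would already reach about 96.84% accuracy. This is why the reported accuracies are best read together with the FPR. In practice, oversampling is normally applied to the training split only, so that duplicated fraud cases do not leak into the test set.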
Description:	Master \ Data Science and Business Analytics
URI:	http://repository.aaup.edu/jspui/handle/123456789/3793
Appears in Collections:	Master Theses and Ph.D. Dissertations

Files in This Item:

File	Description	Size	Format
وائل ابو نعيم.pdf		8.26 MB	Adobe PDF


ARAB AMERICAN UNIVERSITY Repository