Please use this identifier to cite or link to this item:
http://repository.aaup.edu/jspui/handle/123456789/3793
| Title: | International Traffic Bypass Detection in Telecom Using Machine Learning and Resampling Techniques on Imbalanced Data (Master's Thesis) |
| Other Titles: | الكشف عن التجاوزات في اجراء المكالمات الدوليه في قطاع الاتصالات باستخدام تقنيات التعلم الالي وتقنيه موازنه العينات على البيانات غير المتوازنه. |
| Authors: | AbuNaeem, Wael$AAUP$Palestinian |
| Keywords: | Telecom Fraud, International Traffic Bypass, SIMBOX Detection, Machine Learning, Support Vector Machine (SVM), Random Forest Classifier (RFC), Data Imbalance, Resampling, Oversampling, Event Detailed Records (EDRs) |
| Issue Date: | 2026 |
| Publisher: | AAUP |
| Abstract: | Fraud of various types is a major threat to telecom operators because of the large revenue losses it causes and its effect on operators' credibility, customer satisfaction, and performance. One major type, and the subject of this study, is International Traffic Bypass fraud, also known as SIMBOX. SIMBOX is a common and widely used type of fraud in the telecom industry that causes losses of billions of dollars yearly (Howell, 2021). In this type of fraud, fraudsters usually use a special device to bypass international traffic and gain revenue at the expense of telecom operators. The traditional approach to SIMBOX fraud detection uses an expert system with a predefined set of fixed rules, together with statistical analysis of traffic Event Detailed Records (EDRs) collected from source systems, mainly the Mobile Switching Center (MSC), over a given period. Based on the system rules and the traffic analysis results, including normal calling hours, call duration, number of distinct called numbers, different locations, etc., the subscriber is classified as a suspected fraudster or not. This traditional approach requires continuous updating of the rules and frequently generates false positives, which motivates the development of intelligent and efficient detection models using Machine Learning (ML) algorithms. This research investigated the intricacies of employing machine learning techniques for SIMBOX detection, with a primary focus on Support Vector Machine (SVM) and Random Forest Classifier (RFC). A real telecom dataset was used, composed of 23,017 cases, of which 22,289 are normal and 728 are fraudulent. The dataset was generated from a big raw data store of millions of records. Classification accuracy was compared before and after applying various resampling techniques, which were used to address the imbalanced datasets common in this domain, and the most effective approach for managing data imbalance was identified through this evaluation. The results show that oversampling is the best approach to the imbalance issue in both the RF and SVM implementations. SVM accuracy is slightly better than RF: SVM achieved 99.71% while RF achieved 99.66%. SVM also has a lower false positive rate (FPR) and takes about one third of the time RF needs for model tuning, training, and testing. |
| Description: | Master \ Data Science and Business Analytics |
| URI: | http://repository.aaup.edu/jspui/handle/123456789/3793 |
| Appears in Collections: | Master Theses and Ph.D. Dissertations |
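Note on the class imbalance described in the abstract: the sketch below is illustrative only and is not taken from the thesis. It uses the class counts reported in the abstract (22,289 normal, 728 fraudulent) to show why raw accuracy is hard to interpret on such data and what balancing by oversampling does to the class counts. Plain random oversampling is an assumption here, since the abstract does not say which oversampling variant was used, and the labels are placeholders rather than real EDR data.

```python
import random
from collections import Counter

# Class counts reported in the abstract.
N_NORMAL = 22_289
N_FRAUD = 728


def majority_baseline_accuracy(n_majority: int, n_minority: int) -> float:
    """Accuracy of a trivial model that labels every case as the majority class."""
    return n_majority / (n_majority + n_minority)


def random_oversample(labels, seed=0):
    """Return sample indices in which every class is as frequent as the largest one.

    Minority-class indices are duplicated by sampling with replacement.
    This is plain random oversampling, an assumed stand-in for the
    (unspecified) oversampling technique used in the thesis.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(ids) for ids in by_class.values())
    resampled = []
    for ids in by_class.values():
        resampled.extend(ids)  # keep every original sample once
        resampled.extend(rng.choices(ids, k=target - len(ids)))  # pad with duplicates
    rng.shuffle(resampled)
    return resampled


# Placeholder labels only (0 = normal, 1 = fraud); no real EDR features are used.
labels = [0] * N_NORMAL + [1] * N_FRAUD
balanced_indices = random_oversample(labels)
balanced_counts = Counter(labels[i] for i in balanced_indices)
baseline = majority_baseline_accuracy(N_NORMAL, N_FRAUD)

print(f"fraud share before resampling: {N_FRAUD / len(labels):.2%}")
print(f"majority-class baseline accuracy: {baseline:.2%}")
print(f"class counts after oversampling: {dict(balanced_counts)}")
```

With these counts, fraud makes up about 3.16% of the cases, so a model that never flags fraud would already reach about 96.84% accuracy. This is why the false positive rate is reported alongside accuracy. In practice, oversampling is usually applied to the training split only, so that duplicated minority cases do not leak into the test set.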
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| وائل ابو نعيم.pdf | | 8.26 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.