Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning

Mafarja, Majdi; Thaher, Thaer (AAUP); Azmi Al-Betar, Mohammed; Too, Jingwei; A. Awadallah, Mohammed; Doush, Iyad Abu; Turabieh, Hamza

Please use this identifier to cite or link to this item: http://repository.aaup.edu/jspui/handle/123456789/1707

Title:	Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning
Authors:	Mafarja, Majdi; Thaher, Thaer (AAUP); Azmi Al-Betar, Mohammed; Too, Jingwei; A. Awadallah, Mohammed; Doush, Iyad Abu; Turabieh, Hamza
Keywords:	Software fault prediction; Machine learning; SMOTE; Dimension reduction; Meta-heuristics; Imbalanced data
Issue Date:	9-Feb-2023
Publisher:	Applied Intelligence / Springer
Citation:	Mafarja, M., Thaher, T., Al-Betar, M.A. et al. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. Appl Intell (2023). https://doi.org/10.1007/s10489-022-04427-x
Abstract:	Software Fault Prediction (SFP) is an important process for detecting faulty software components, such as faulty classes or modules, early in the software development life cycle. In this paper, a machine learning (ML) framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter, seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. The SFP datasets are sixteen datasets selected from the PROMISE repository, covering software projects of different sizes and complexity. A comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA achieves significantly superior or competitive results on several of the evaluated datasets. The algorithms' performance is compared in terms of accuracy, the number of features, and fitness function. The findings are also supported by the 2-tailed P-values of the Wilcoxon signed-rank statistical test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
URI:	http://repository.aaup.edu/jspui/handle/123456789/1707
DOI:	https://doi.org/10.1007/s10489-022-04427-x
Appears in Collections:	Faculty & Staff Scientific Research publications
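
The abstract compares feature selection methods by accuracy, number of features, and fitness function, but does not state the fitness formula. The sketch below illustrates one form commonly used in wrapper-based feature selection, a weighted sum of classification error and the ratio of selected features, together with an S-shaped transfer function for turning continuous optimizer positions into a binary feature mask. It is an assumption for illustration only, not the paper's SBEWOA: the names sigmoid_transfer, binarize, wrapper_fitness, and toy_error, and the weight alpha=0.99, are all hypothetical. In practice the error_rate hook would wrap a cross-validated classifier such as RF.

```python
import math
import random


def sigmoid_transfer(x):
    """S-shaped transfer function: map a real value to a probability in (0, 1)."""
    # Numerically stable form that avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Convert a continuous position vector into a 0/1 feature mask."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def wrapper_fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of error and selected-feature ratio (lower is better).

    error_rate is a hypothetical hook: a callable returning the classifier's
    error in [0, 1] when trained only on the features where mask is 1.
    alpha is an assumed weight, not a value taken from the paper.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # an empty subset cannot train a classifier: worst score
    return alpha * error_rate(mask) + (1.0 - alpha) * n_selected / len(mask)


def toy_error(mask):
    """Stand-in for a real classifier: only features 0 and 2 are informative."""
    missed = sum(1 for i in (0, 2) if mask[i] == 0)
    return 0.1 + 0.2 * missed


if __name__ == "__main__":
    rng = random.Random(0)
    print(binarize([2.0, -2.0, 2.0, -2.0], rng))
    print(wrapper_fitness([1, 0, 1, 0], toy_error))
    print(wrapper_fitness([1, 1, 1, 1], toy_error))
```

With this weighting, a mask that keeps only the two informative features scores lower (better) than the full feature set at the same error, which is the trade-off between accuracy and subset size that the abstract's three comparison criteria reflect.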

Files in This Item:

File	Description	Size	Format
paper-sample.pdf	paper_sample	171.16 kB	Adobe PDF
