COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR PREDICTING CYBERSECURITY ATTACK SUCCESS: A PERFORMANCE EVALUATION
Md Abu Sayed, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Badruddowza, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
Md Shohail Uddin Sarker, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
Abdullah Al Mamun, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
Norun Nabi, Master of Science in Information Technology (MSIT), Washington University of Science and Technology (WUST), Alexandria, Virginia, USA
Fuad Mahmud, Department of Information Assurance and Cybersecurity, Gannon University, USA
Md Khorshed Alam, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Md Tarek Hasan, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Md Rashed Buiya, Department of Computer Science, California State University, Dominguez Hills, USA
Mashaeikh Zaman Md. Eftakhar Choudhury, Master of Social Science in Security Studies, Bangladesh University of Professional (BUP), Dhaka, Bangladesh
Abstract
This study evaluates how effectively machine learning algorithms predict the success of cybersecurity attacks from historical attack data. We compared five widely used algorithms, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), and K-Nearest Neighbors (KNN), on accuracy, precision, recall, F1-Score, and AUC-ROC. Random Forest outperformed the other algorithms on every metric, achieving an accuracy of 90%, precision of 88%, recall of 85%, F1-Score of 86%, and AUC-ROC of 0.92. Gradient Boosting also performed strongly, with an accuracy of 88% and an AUC-ROC of 0.90, though it required more computational resources. Logistic Regression and SVM produced moderate results, while K-Nearest Neighbors was the least effective of the five. The comparison identifies Random Forest as the most effective model for predicting cybersecurity attack success, particularly in handling complex data and distinguishing between attack outcomes. These findings can inform cybersecurity strategy and the selection of machine learning models for threat prediction.
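As an illustration of how the threshold-based metrics named in the abstract relate to one another, the following is a minimal Python sketch, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; the only figures taken from the abstract are the reported Random Forest precision (88%) and recall (85%), which the last lines use to check that the reported F1-Score of 86% is consistent with them.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, where "positive" stands for a successful attack.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts, used only to exercise the function.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Consistency check on the abstract's Random Forest figures:
# precision 0.88 and recall 0.85 give an F1 of about 0.865.
rf_f1 = f1_score(0.88, 0.85)
print(round(rf_f1, 2))  # prints 0.86
```

AUC-ROC is omitted from the sketch because it depends on the ranking of predicted scores rather than on a single confusion matrix.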
Keywords
Machine Learning, Cybersecurity, Random Forest
Copyright License
Copyright (c) 2024 Md Abu Sayed, Badruddowza, Md Shohail Uddin Sarker, Abdullah Al Mamun, Norun Nabi, Fuad Mahmud, Md Khorshed Alam, Md Tarek Hasan, Md Rashed Buiya, Mashaeikh Zaman Md. Eftakhar Choudhury
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.