EVALUATING MACHINE LEARNING ALGORITHMS FOR BREAST CANCER DETECTION: A STUDY ON ACCURACY AND PREDICTIVE PERFORMANCE
Md Al-Imran, College of Graduate and Professional Studies, Trine University, USA
Salma Akter, Department of Public Administration, Gannon University, Erie, PA, USA
Md Abu Sufian Mozumder, College of Business, Westcliff University, Irvine, California, USA
Rowsan Jahan Bhuiyan, Master of Science in Information Technology, Washington University of Science and Technology, USA
Tauhedur Rahman, Dahlkemper School of Business, Gannon University, USA
Md Jamil Ahmmed, Department of Information Technology Project Management, Business Analytics, St. Francis College, USA
Md Nazmul Hossain Mir, Master of Science in Information Technology, Washington University of Science and Technology, USA
Md Amit Hasan, Master of Science in Information Technology, Washington University of Science and Technology, USA
Ashim Chandra Das, Master of Science in Information Technology, Washington University of Science and Technology, USA
Md. Emran Hossen, Department of Science in Biomedical Engineering, Gannon University, USA
Abstract
This study evaluates five machine learning algorithms for breast cancer detection on the Breast Cancer Wisconsin Diagnostic dataset: Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (C4.5), and k-Nearest Neighbors (KNN). Pre-processing and model evaluation were implemented with Scikit-learn in Python. SVM achieved the highest accuracy, 99.9% on the training set and 98.50% on the testing set, which suggests it handles high-dimensional data well. Random Forest also performed well, with training and testing accuracies of 98.5% and 98.20%, respectively. Logistic Regression and Decision Tree gave reliable predictions when tuned, while KNN was less effective. Given their high accuracy and robustness, SVM and Random Forest are recommended for clinical decision support systems.
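The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline. The 80/20 stratified split, the random seed, the standardization step, and all hyperparameters are assumptions, since the abstract does not specify them. Scikit-learn's `DecisionTreeClassifier` implements CART rather than C4.5, so the entropy criterion is used here only as an approximation. The function name `evaluate_models` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(test_size=0.2, seed=42):
    """Fit five classifiers and report train/test accuracy and a confusion matrix.

    The split ratio, seed, and default hyperparameters are illustrative
    assumptions, not values taken from the study.
    """
    # Breast Cancer Wisconsin Diagnostic dataset bundled with scikit-learn
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Scale-sensitive models are wrapped in a pipeline so the scaler is
    # fitted on the training split only.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # CART with entropy splits, standing in for C4.5 (assumption)
        "Decision Tree": DecisionTreeClassifier(
            criterion="entropy", random_state=seed
        ),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        test_pred = model.predict(X_test)
        results[name] = {
            "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
            "test_accuracy": accuracy_score(y_test, test_pred),
            "confusion_matrix": confusion_matrix(y_test, test_pred),
        }
    return results


if __name__ == "__main__":
    for name, r in evaluate_models().items():
        print(f"{name}: train={r['train_accuracy']:.4f} test={r['test_accuracy']:.4f}")
        print(r["confusion_matrix"])
```

Accuracies from this sketch will differ from the figures reported above, because the split, tuning, and pre-processing used in the study are not given here.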
Keywords
Accuracy rates, Performance analysis, Confusion matrix
Copyright License
Copyright (c) 2024 Md Al-Imran, Salma Akter, Md Abu Sufian Mozumder, Rowsan Jahan Bhuiyan, Md Al Rafi, Md Shahriar Mahmud Bhuiyan, Gourab Nicholas Rodrigues, Md Nazmul Hossain Mir, Md Amit Hasan, Ashim Chandra Das, Md. Emran Hossen
This work is licensed under a Creative Commons Attribution 4.0 International License.