Methods for Analysis and Classification of Errors in Automated Tests Using Modern LLM Models
Taras Buriak, Senior Software Development Engineer, Austin, Texas, USA
Abstract
This article presents an overview of methods for analyzing and classifying errors in automated tests using modern language models. The research is based on a systematization of international publications that examine the solutions RCACopilot, LogLLM, FlakyDoctor, and LogGPT. These approaches are shown to differ in their architectural designs and task formulations: classification of incident root causes, anomaly detection in logs, repair of flaky tests, and real-time log interpretation. The study identifies the data preparation and training strategies that determine the models' effectiveness. The reported metrics demonstrate high accuracy and practical applicability but also point to significant limitations, including dependence on monitoring infrastructure and computational resources, sensitivity to prompt parameters, and weak results in repairing NOD (non-order-dependent) flaky tests. The analysis shows that integrating the models into existing pipelines with filtering and validation minimizes these risks and increases the reliability of the resulting solutions. Practical implementation experience is noted, confirming improved stability of test runs and reduced regression time. The article will be useful for researchers and practitioners in software engineering, automated testing, and quality assurance.
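To make the pipeline-integration idea concrete, the sketch below (not taken from the article) shows one way an LLM-based failure classifier could be embedded in a test pipeline with log filtering and output validation. The model name, label set, and helper functions are illustrative assumptions rather than the methods of the surveyed systems.

```python
# Minimal sketch, assuming the OpenAI Python SDK; any chat-capable model could be substituted.
import json
from openai import OpenAI

ALLOWED_LABELS = {"product_bug", "environment_issue", "flaky_test", "test_code_defect"}

def filter_log(raw_log: str, max_lines: int = 40) -> str:
    # Filtering step: keep only the most informative lines (errors, stack frames)
    # to bound prompt size before the log reaches the model.
    lines = [l for l in raw_log.splitlines()
             if "ERROR" in l or "Exception" in l or l.strip().startswith("at ")]
    return "\n".join(lines[:max_lines]) or raw_log[:2000]

def classify_failure(client: OpenAI, raw_log: str) -> str:
    # Ask the model for a root-cause label, then validate the answer before acting on it.
    prompt = (
        "Classify the root cause of this failed automated test.\n"
        f"Allowed labels: {sorted(ALLOWED_LABELS)}\n"
        'Respond with JSON only: {"label": "...", "reason": "..."}\n\n'
        f"Log excerpt:\n{filter_log(raw_log)}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption, not prescribed by the article
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        label = json.loads(resp.choices[0].message.content)["label"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "needs_human_review"  # validation step: malformed output is never trusted
    return label if label in ALLOWED_LABELS else "needs_human_review"

# Usage (hypothetical): label = classify_failure(OpenAI(), open("failed_test.log").read())
```

The design choice mirrors the article's point about risk minimization: the model's output is constrained to a closed label set and anything outside it is routed to human review instead of being written back into the pipeline automatically.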
Keywords
language models, automated testing, log analysis, anomaly detection, test repair, software quality
References
Alhanahnah, M., Hasan, M. R., Xu, L., et al. (2025). An empirical evaluation of pre-trained large language models for repairing declarative formal specifications. Empirical Software Engineering, 30, 149. https://doi.org/10.1007/s10664-025-10687-1
Ardimento, P., Capuzzimati, M., Casalino, G., Schicchi, D., & Taibi, D. (2025). A novel LLM-based classifier for predicting bug-fixing time in bug tracking systems. Journal of Systems and Software, 230, 112569. https://doi.org/10.1016/j.jss.2025.112569
Boffa, M., Drago, I., Mellia, M., Vassio, L., Giordano, D., Valentim, R., & Ben Houidi, Z. (2024). LogPrécis: Unleashing language models for automated malicious log analysis. Computers & Security, 141, 103805. https://doi.org/10.1016/j.cose.2024.103805
Chen, Y. (2024, May 23). Flakiness repair in the era of large language models. In ICSE-Companion ’24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (pp. 441–443). ACM. https://doi.org/10.1145/3639478.3641227
Chen, Y., Xie, H., Ma, M., Kang, Y., Gao, X., Shi, L., Cao, Y., Gao, X., Fan, H., Wen, M., & others. (2024, April 22). Automatic root cause analysis via large language models for cloud incidents. In EuroSys ’24: Proceedings of the Nineteenth European Conference on Computer Systems (pp. 674–688). ACM. https://doi.org/10.1145/3627703.3629553
Cui, T., Ma, S., Chen, Z., Xiao, T., Tao, S., Liu, Y., Zhang, S., Lin, D., Liu, C., Cai, Y., Meng, W., Sun, Y., & Pei, D. (2024). LogEval: A comprehensive benchmark suite for large language models in log analysis. arXiv. https://doi.org/10.48550/arXiv.2407.01896
Dakhama, A., Even-Mendoza, K., Langdon, W., et al. (2025). Enhancing search-based testing with LLMs for finding bugs in system simulators. Automated Software Engineering, 32, 63. https://doi.org/10.1007/s10515-025-00531-7
Guan, W., Cao, J., Qian, S., Gao, J., & Ouyang, C. (2025). LogLLM: Log-based anomaly detection using large language models. arXiv. https://doi.org/10.48550/arXiv.2411.08561
Kang, S., Chen, B., Yoo, S., et al. (2025). Explainable automated debugging via large language model-driven scientific debugging. Empirical Software Engineering, 30, 45. https://doi.org/10.1007/s10664-024-10594-x
Qi, J., Huang, S., Luan, Z., Fung, C., Yang, H., & Qian, D. (2023). LogGPT: Exploring ChatGPT for log-based anomaly detection. arXiv. https://doi.org/10.48550/arXiv.2309.01189
Sun, Y., Keung, J. W., Yang, Z., Liu, S., & Liao, Y. (2025). SemiSMAC: A semi-supervised framework for log anomaly detection with automated hyperparameter tuning. Information and Software Technology, 187, 107869. https://doi.org/10.1016/j.infsof.2025.107869
Copyright License
Copyright (c) 2025 Taras Buriak

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

