Applied Sciences | Open Access | DOI: https://doi.org/10.37547/tajas/Volume07Issue10-11

Methods for Analysis and Classification of Errors in Automated Tests Using Modern LLM Models

Taras Buriak, Senior Software Development Engineer, Austin, Texas, USA

Abstract

This article presents an overview of methods for analyzing and classifying errors in automated tests using modern language models. The research is based on a systematization of international publications that examine the RCACopilot, LogLLM, FlakyDoctor, and LogGPT solutions. These approaches are shown to differ in their architectures and task formulations: classification of incident root causes, anomaly detection in logs, repair of flaky tests, and real-time log interpretation. The study identifies the data preparation and training strategies that determine the models' effectiveness. The reported metrics demonstrate high accuracy and practical applicability but also point to significant limitations, among them dependence on monitoring infrastructure and computational resources, sensitivity to prompt parameters, and weak results in repairing non-order-dependent (NOD) flaky tests. The analysis shows that integrating the models into existing pipelines with filtering and validation makes it possible to minimize risks and increase the reliability of the resulting solutions. Practical implementation experience is also reported, confirming more stable test runs and shorter regression times. The article will be useful for researchers and practitioners in software engineering, automated testing, and quality assurance.
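To make the pipeline integration mentioned above concrete, the sketch below shows one way an LLM verdict on a failed test could be filtered and validated before any automated action is taken. This is a minimal Python sketch under the assumptions of this overview, not an implementation from any of the surveyed systems; the llm client interface, the classify_failure prompt, and the ALLOWED_LABELS set are illustrative names introduced here.

import json

# Illustrative label set; the surveyed systems use their own taxonomies.
ALLOWED_LABELS = {"product_bug", "flaky_test", "environment_issue", "test_code_error"}

def classify_failure(llm, log_excerpt):
    """Ask the model for a single label and validate it before it is used."""
    prompt = (
        "Classify the root cause of this failed automated test.\n"
        f"Log excerpt:\n{log_excerpt}\n"
        f'Answer with JSON: {{"label": "<one of {sorted(ALLOWED_LABELS)}>"}}'
    )
    raw = llm.complete(prompt)  # hypothetical client call
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        return None  # malformed output is filtered out
    return label if label in ALLOWED_LABELS else None

def triage(llm, failures):
    """Act only on validated verdicts; everything else falls back to manual review."""
    for failure in failures:
        label = classify_failure(llm, failure["log"][:4000])  # truncate long logs
        if label == "flaky_test":
            failure["action"] = "quarantine_and_retry"
        elif label is None:
            failure["action"] = "manual_review"
        else:
            failure["action"] = "route_to_" + label + "_queue"
    return failures

In this sketch, the filtering step (rejecting malformed or out-of-vocabulary answers) is what prevents an unreliable model output from triggering an incorrect automated action, reflecting the risk-minimization point made in the abstract.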

Keywords

language models, automated testing, log analysis, anomaly detection, test repair, software quality

References

Alhanahnah, M., Hasan, M. R., Xu, L., et al. (2025). An empirical evaluation of pre-trained large language models for repairing declarative formal specifications. Empirical Software Engineering, 30, 149. https://doi.org/10.1007/s10664-025-10687-1

Ardimento, P., Capuzzimati, M., Casalino, G., Schicchi, D., & Taibi, D. (2025). A novel LLM-based classifier for predicting bug-fixing time in bug tracking systems. Journal of Systems and Software, 230, 112569. https://doi.org/10.1016/j.jss.2025.112569

Boffa, M., Drago, I., Mellia, M., Vassio, L., Giordano, D., Valentim, R., & Ben Houidi, Z. (2024). LogPrécis: Unleashing language models for automated malicious log analysis: Précis: A concise summary of essential points, statements, or facts. Computers & Security, 141, 103805. https://doi.org/10.1016/j.cose.2024.103805

Chen, Y. (2024, May 23). Flakiness repair in the era of large language models. In ICSE-Companion ’24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (pp. 441–443). ACM. https://doi.org/10.1145/3639478.3641227

Chen, Y., Xie, H., Ma, M., Kang, Y., Gao, X., Shi, L., Cao, Y., Gao, X., Fan, H., Wen, M., et al. (2024, April 22). Automatic root cause analysis via large language models for cloud incidents. In EuroSys ’24: Proceedings of the Nineteenth European Conference on Computer Systems (pp. 674–688). ACM. https://doi.org/10.1145/3627703.3629553

Cui, T., Ma, S., Chen, Z., Xiao, T., Tao, S., Liu, Y., Zhang, S., Lin, D., Liu, C., Cai, Y., Meng, W., Sun, Y., & Pei, D. (2024). LogEval: A comprehensive benchmark suite for large language models in log analysis. arXiv. https://doi.org/10.48550/arXiv.2407.01896

Dakhama, A., Even-Mendoza, K., Langdon, W., et al. (2025). Enhancing search-based testing with LLMs for finding bugs in system simulators. Automated Software Engineering, 32, 63. https://doi.org/10.1007/s10515-025-00531-7

Guan, W., Cao, J., Qian, S., Gao, J., & Ouyang, C. (2025). LogLLM: Log-based anomaly detection using large language models. arXiv. https://doi.org/10.48550/arXiv.2411.08561

Kang, S., Chen, B., Yoo, S., et al. (2025). Explainable automated debugging via large language model-driven scientific debugging. Empirical Software Engineering, 30, 45. https://doi.org/10.1007/s10664-024-10594-x

Qi, J., Huang, S., Luan, Z., Fung, C., Yang, H., & Qian, D. (2023). LogGPT: Exploring ChatGPT for log-based anomaly detection. arXiv. https://doi.org/10.48550/arXiv.2309.01189

Sun, Y., Keung, J. W., Yang, Z., Liu, S., & Liao, Y. (2025). SemiSMAC: A semi-supervised framework for log anomaly detection with automated hyperparameter tuning. Information and Software Technology, 187, 107869. https://doi.org/10.1016/j.infsof.2025.107869


How to Cite

Taras Buriak. (2025). Methods for Analysis and Classification of Errors in Automated Tests Using Modern LLM Models. The American Journal of Applied Sciences, 7(10), 96–103. https://doi.org/10.37547/tajas/Volume07Issue10-11