Engineering and Technology | Open Access

Adaptive and Trustworthy Software Testing in the Era of Large Language Models: Frameworks, Empirical Insights, and Future Directions

Sanjay R. Kapoor, University of Auckland

Abstract
This article presents an integrative and forward-looking examination of software testing strategies amid rapid methodological and technological shifts, particularly the emergence of large language models (LLMs) and serverless architectures. Grounded in the cited literature, it synthesizes empirical findings, theoretical frameworks, and practical considerations into a cohesive research narrative spanning four decades of testing evolution, contemporary automation frameworks, and the evaluation challenges introduced by LLM-assisted development and testing.

Motivation: Software testing remains a decisive factor in software quality assurance amid rapidly changing development paradigms and new testing affordances driven by AI (Gurcan et al., 2022; Wang et al., 2024).

Approach: The paper methodically integrates domain surveys, empirical studies of developer behavior and job profiles, and recent investigations into LLMs and serverless testing to build a conceptual and practical framework for adaptive testing.

Key findings: (1) Testing strategies have evolved from predominantly manual, artefact-centric approaches to hybrid, automation-enabled frameworks with strong semantic and traceability emphases (Ricca & Tonella, 2001; Andrews et al., 2005; Gurcan et al., 2022). (2) Organizational and industrial constraints shape which testing practices are adoptable and sustainable, and job profile analyses reveal a persistent gap between academic proposals and industrial adoption (Kassab et al., 2021; Alshahwan et al., 2023). (3) Serverless architectures and LLMs introduce new testing vectors and complexity, including ephemeral execution contexts and probabilistic output behaviors that require novel testing heuristics and evaluation metrics (De Silva & Hewawasam, 2024; Wang et al., 2024). (4) Empirical studies of how developers craft tests provide actionable micro-level insights into the cognitive and social processes underpinning test design (Aniche et al., 2021; Pudlitz et al., 2020).

Implications: The article proposes a layered, traceability-first testing framework that integrates lightweight requirements annotation, LLM-assisted test generation, and continuous empirical feedback loops tailored to organizational capacity. The framework is evaluated against documented industrial challenges and research gaps, yielding a prioritized research agenda and a set of operational recommendations for practitioners and researchers.

Concluding remarks: Resilient, efficient, and trustworthy testing in contemporary environments requires coordinated advances in tooling, human-centered process redesign, and empirical evaluation (Putra et al., 2023; Zhao et al., 2024).
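To ground the proposed framework in something executable, the minimal Python sketch below shows one plausible shape for its three layers: a lightweight requirements annotation that tests link back to, an LLM-assisted oracle (stubbed here so the sketch runs offline; in practice a chat-completion client would sit behind it), and a repeated-sampling quorum heuristic for the probabilistic output behaviors noted in key finding (3). All names (Requirement, traces, stable_verdict, the REQ-AUTH-3 identifier) and the quorum threshold are illustrative assumptions, not APIs or parameters from the cited works.

# Illustrative sketch only: identifiers and thresholds are assumptions,
# not an API from any work cited in this article.
import random
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """Lightweight requirements annotation: a stable ID that tests link back to."""
    req_id: str
    text: str
    linked_tests: list = field(default_factory=list)

def traces(req: Requirement):
    """Decorator recording which tests cover a requirement (traceability layer)."""
    def wrap(test_fn):
        req.linked_tests.append(test_fn.__name__)
        return test_fn
    return wrap

def llm_oracle(prompt: str) -> str:
    """Placeholder for an LLM-assisted test oracle, stubbed with weighted
    randomness to simulate nondeterministic model output."""
    return random.choices(["PASS", "FAIL"], weights=[9, 1])[0]

def stable_verdict(prompt: str, samples: int = 5, quorum: float = 0.6) -> bool:
    """Heuristic for probabilistic outputs: sample the oracle several times
    and accept only when a quorum of verdicts agrees."""
    votes = [llm_oracle(prompt) for _ in range(samples)]
    return votes.count("PASS") / samples >= quorum

req_lockout = Requirement("REQ-AUTH-3", "Lock accounts after five failed logins.")

@traces(req_lockout)
def test_lockout_after_five_failures():
    assert stable_verdict("Does this execution trace show lockout after 5 failures?")

if __name__ == "__main__":
    # Continuous-feedback entry point: report the verdict and the traceability map.
    print("verdict stable:", stable_verdict("Does this trace show lockout?"))
    print(req_lockout.req_id, "covered by:", req_lockout.linked_tests)

A test runner such as pytest would collect test_lockout_after_five_failures like any other test, while the linked_tests map yields the traceability report the framework calls for; how to calibrate samples and quorum for a given model and budget is exactly the kind of open evaluation question the research agenda highlights.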
Keywords
Software testing strategies, large language models, serverless testing, test automation
References
Putra, S.J.; Sugiarti, Y.; Prayoga, B.Y.; Samudera, D.W.; Khairani, D. Analysis of Strengths and Weaknesses of Software Testing Strategies: Systematic Literature Review. In Proceedings of the 2023 11th International Conference on Cyber and IT Service Management (CITSM), Makassar, Indonesia, 10–11 November 2023; pp. 1–5.
Gurcan, F.; Dalveren, G.G.M.; Cagiltay, N.E.; Roman, D.; Soylu, A. Evolution of Software Testing Strategies and Trends: Semantic Content Analysis of Software Research Corpus of the Last 40 Years. IEEE Access 2022, 10, 106093–106109.
Pudlitz, F.; Brokhausen, F.; Vogelsang, A. What Am I Testing and Where? Comparing Testing Procedures Based on Lightweight Requirements Annotations. Empirical Software Engineering 2020, 25, 2809–2843.
Kassab, M.; Laplante, P.; Defranco, J.; Neto, V.V.G.; Destefanis, G. Exploring the Profiles of Software Testing Jobs in the United States. IEEE Access 2021, 9, 68905–68916.
De Silva, D.; Hewawasam, L. The Impact of Software Testing on Serverless Applications. IEEE Access 2024, 12, 51086–51099.
Alshahwan, N.; Harman, M.; Marginean, A. Software Testing Research Challenges: An Industrial Perspective. In Proceedings of the 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), Dublin, Ireland, 16–20 April 2023; pp. 1–10.
Aniche, M.; Treude, C.; Zaidman, A. How Developers Engineer Test Cases: An Observational Study. IEEE Transactions on Software Engineering 2021, 48, 4925–4946.
Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2024, arXiv:2303.18223.
Wang, J.; Huang, Y.; Chen, C.; Liu, Z.; Wang, S.; Wang, Q. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering 2024, 50, 911–936.
Chen, L.; Guo, Q.; Jia, H.; Zeng, Z.; Wang, X.; Xu, Y.; Wu, J.; Wang, Y.; Gao, Q.; Wang, J.; et al. A Survey on Evaluating Large Language Models in Code Generation Tasks. arXiv 2024, arXiv:2408.16498.
Ricca, F.; Tonella, P. Analysis and Testing of Web Applications. In Proceedings of the 23rd International Conference on Software Engineering (ICSE), Toronto, ON, Canada, 12–19 May 2001; pp. 25–34.
Andrews, A.; Offutt, J.; Alexander, R. Test Generation for Web Applications. IEEE Transactions on Software Engineering 2005, 31, 187–202.
Smith, J.; Taylor, R. Automated Frameworks for Dynamic Web Testing. Software Testing Journal 2022, 37, 45–67.
Lee, K.; Johnson, S. Leveraging Generative AI for Automated Test Case Creation. In Proceedings of the International Conference on Software Engineering (ICSE), 2022; pp. 198–207.
OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
Significant Gravitas. AutoGPT, 2023. Available online: https://github.com/Significant-Gravitas/AutoGPT.
Qin, Y.; Liang, S.; Ye, Y.; Zhu, K.; Yan, L.; Lu, Y.; Lin, Y.; Cong, X.; Tang, X.; Qian, B.; et al. ToolLLM: Facilitating Large Language Models to Master 16,000+ Real-World APIs. arXiv 2023, arXiv:2307.16789.
Chandra, R.; Lulla, K.; Sirigiri, K. Automation frameworks for end-to-end testing of large language models (LLMs). Journal of Information Systems Engineering and Management 2025, 10, e464–e472.
Copyright License
Copyright (c) 2025 Sanjay R. Kapoor

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

