Implementing Site Reliability Engineering (SRE) in Legacy Retail Infrastructure
Hari Dasari, Expert Infrastructure Engineer, Leading Financial Tech Company, Aldie, Virginia
Abstract
Retail companies with legacy IT infrastructure struggle to maintain service reliability, scalability, and agility during digital transformation. Many of their mainframes, on-premises applications, batch processing jobs, and monolithic codebases were not designed for today's dynamic operating conditions. Site Reliability Engineering (SRE) practices developed at Google, including Service Level Objectives (SLOs), automation, and blameless postmortems, can bridge the gap between outdated systems and modern operational excellence. This article proposes a model for adopting SRE in legacy retail environments, built on gradual adoption, cultural change, and measurable improvements in service reliability. In a focused SRE rollout at a national retail chain, toil decreased and both mean time to detect (MTTD) and mean time to resolve (MTTR) improved. The model shows that incremental SRE adoption can modernize legacy systems and prepare them for future innovation without comprehensive re-architecture.
Keywords
Site Reliability Engineering (SRE), Legacy Systems, Retail IT Infrastructure, Observability
Copyright License
Copyright (c) 2025 Hari Dasari

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.