Articles | Open Access | DOI: https://doi.org/10.37547/tajet/Volume07Issue07-16

Implementing Site Reliability Engineering (SRE) in Legacy Retail Infrastructure

Hari Dasari, Expert Infrastructure Engineer, Leading Financial Tech Company, Aldie, Virginia

Abstract

During digital transformation, retail companies with legacy IT infrastructures struggle to maintain service dependability, scalability, and agility. Many mainframes, on-premises applications, batch processes, and monolithic codebases were not designed for today's dynamic operational contexts. Site Reliability Engineering (SRE) practices developed at Google, including Service Level Objectives (SLOs), automation, and blameless postmortems, can bridge the gap between outdated systems and modern operational excellence. This article proposes a model for adopting SRE in legacy retail environments, built on gradual adoption, cultural change, and measurable service reliability improvements. A focused SRE rollout helped a national retail chain reduce toil and improve mean time to detect (MTTD) and mean time to resolve (MTTR). The model shows that incremental SRE adoption can modernize legacy systems and prepare them for future innovation without comprehensive re-architecture.

Keywords

Site Reliability Engineering (SRE), Legacy Systems, Retail IT Infrastructure, Observability

How to Cite

Hari Dasari. (2025). Implementing Site Reliability Engineering (SRE) in Legacy Retail Infrastructure. The American Journal of Engineering and Technology, 7(07), 169–179. https://doi.org/10.37547/tajet/Volume07Issue07-16