Articles | Open Access | DOI: https://doi.org/10.37547/tajet/Volume07Issue05-14

Ensuring the Availability of Critical Cloud Services Through SRE Practices

Alexandr Hacicheant, Software Engineer and Head of Reliability Engineering at Mayflower.

Abstract

The availability of cloud services is a critical factor in the success of digital products. Downtime in essential systems—whether for fintech platforms or major online retailers—can lead to substantial financial losses and reputational damage. Meanwhile, modern cloud infrastructures continue to grow in complexity. Distributed architectures, automated scaling, and frequent software releases all increase the risk of system failures.

In this dynamic environment, companies are actively seeking strategies to minimize incidents and mitigate their impact. One of the most effective approaches is Site Reliability Engineering (SRE)—a discipline pioneered at Google that combines engineering best practices with operational processes to enhance the reliability and resilience of cloud services.

This article examines how SRE principles address the challenges of maintaining cloud service availability. Alexandr Hacicheant, Head of Reliability Engineering at Mayflower, analyzes the key issues in this field, the core methodologies of SRE, and real-world applications that help minimize downtime.

Keywords

Site Reliability Engineering, SRE, Cloud services, Infrastructure, SLI, SLO

How to Cite

Alexandr Hacicheant. (2025). Ensuring the Availability of Critical Cloud Services Through SRE Practices. The American Journal of Engineering and Technology, 7(05), 154–158. https://doi.org/10.37547/tajet/Volume07Issue05-14