Saturday, March 11, 2023

SLA, SLO, and SLI

We are doing a service maturity audit right now, and more often than not the team gets confused about these three acronyms, so I want to quickly review them here.

An SLA (service level agreement) is an agreement between a service provider and its customers about measurable metrics such as uptime, performance, QoS, and responsiveness, along with each party's responsibilities. It’s always better to under-promise and over-deliver on the agreement.

An SLO (service level objective) is a target within an SLA for one specific metric, such as uptime or response time. Commit to as few SLOs as possible and focus on the ones that matter most to customers.

An SLI (service level indicator) is the measured value that shows compliance with an SLO. To stay in compliance with your SLA, the SLI needs to meet or exceed the promises made in that document. Choose the metrics that actually matter to your core SLOs and put your energy into tracking them effectively.
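To make the SLI/SLO relationship concrete, here is a minimal sketch in Python. It assumes availability is measured as the fraction of successful requests, which is only one common way to define an SLI; the function names and the 99.9% target are made up for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the measured availability as a fraction between 0 and 1.

    Assumption: with no traffic in the window, treat the service as fully
    available rather than dividing by zero.
    """
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """The SLI is compliant when it meets or exceeds the SLO target."""
    return sli >= slo_target


if __name__ == "__main__":
    # Hypothetical numbers: 99,950 successful requests out of 100,000.
    sli = availability_sli(good_requests=99_950, total_requests=100_000)
    print(f"SLI: {sli:.4%}, meets 99.9% SLO: {meets_slo(sli, 0.999)}")
```

The point is the direction of the comparison: the SLO is the target, the SLI is the measurement, and the SLA is the document that says what happens if the measurement falls short.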

For service reliability and availability, building a disaster recovery plan and an error budget is very important. Leaving room for failures not only protects the business from SLA violations, but also gives the team space to try innovative new solutions that might fail. However, production is production, and every second matters when the service is down. We need to design for failure and plan for failure in both service development and operations.
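An error budget is just arithmetic on the SLO: whatever the target does not promise is the room left for failure. The sketch below assumes a time-based availability SLO over a 30-day window; the function names and the sample downtime figures are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Total downtime the SLO allows over the window, in minutes."""
    return (1.0 - slo_target) * window_days * MINUTES_PER_DAY


def budget_remaining_minutes(
    slo_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after the downtime already spent; negative means overspent."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    remaining = budget_remaining_minutes(0.999, downtime_minutes=30.0)
    print(f"Budget: {budget:.1f} min, remaining: {remaining:.1f} min")
```

A 99.9% target over 30 days leaves about 43.2 minutes of allowed downtime. If the remaining budget goes negative, that is the signal to pause risky changes and focus on reliability until the window recovers.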
