SLI
A Service Level Indicator (SLI) is a measurement of how well a service is performing on a given dimension, such as availability, latency, throughput, or correctness. SLIs are the raw observables; SLOs are targets set on top of them.
Common SLI types
- Availability: ratio of successful requests to total requests over a window.
- Latency: proportion of requests served faster than a threshold (for example, under 200 ms). A 95% target on this ratio is roughly equivalent to requiring p95 latency under 200 ms.
- Throughput: events processed per second; often used for batch and stream pipelines.
- Quality / correctness: proportion of responses that meet a freshness, completeness, or accuracy check.
- Durability: for storage systems, the proportion of stored data that remains intact and retrievable over a given period.
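As a rough illustration of the first two types, the sketch below computes an availability SLI and a latency SLI from a list of request records. The `Request` record, the function names, the "status below 500 counts as success" rule, and the 200 ms threshold are all assumptions made for this example, not a standard API or a universal definition of success.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record, invented for this example.
    status: int        # HTTP status code returned to the caller
    latency_ms: float  # time taken to serve the request, in milliseconds


def availability_sli(requests):
    """Ratio of successful requests to total requests.

    Assumption: any status below 500 counts as a success.
    """
    if not requests:
        return None  # no traffic in the window, so the ratio is undefined
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests, threshold_ms=200.0):
    """Proportion of requests served faster than threshold_ms."""
    if not requests:
        return None
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)


sample = [
    Request(200, 120.0),
    Request(200, 180.0),
    Request(500, 90.0),   # server error: bad for availability, fast for latency
    Request(200, 250.0),  # success, but slower than the threshold
]

print(availability_sli(sample))  # 0.75
print(latency_sli(sample))       # 0.75
```

Both indicators come out as a value between 0 and 1. Returning `None` for an empty window is one possible choice; some teams prefer to treat a window with no traffic as fully good.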
Designing a good SLI
- Measured at a point that reflects the user experience, typically the request boundary closest to the user.
- Defined as a ratio with a clear numerator and denominator, so the SLO becomes a single percentage.
- Resilient to harmless fluctuations: a single 5xx among 1 million requests should not, on its own, make the indicator signal a problem.
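The ratio form and its resilience to isolated failures can be sketched as follows. The helper names `ratio_sli` and `meets_slo`, and the 99.9% target, are hypothetical choices for this example.

```python
def ratio_sli(good_events, total_events):
    """SLI as good / total, a value between 0 and 1."""
    if total_events == 0:
        return None  # undefined when there were no events
    return good_events / total_events


def meets_slo(sli, target):
    """True when the measured ratio is at or above the target ratio."""
    return sli is not None and sli >= target


total = 1_000_000
good = total - 1  # a single 5xx among 1 million requests

sli = ratio_sli(good, total)
print(f"{sli:.6f}")           # 0.999999
print(meets_slo(sli, 0.999))  # True: one failure does not breach a 99.9% target
```

Because the indicator is a ratio over a large denominator, one failed request moves it by one part in a million, far less than the gap a 99.9% target allows.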