Metrics
Metrics are numerical measurements aggregated over time, used for dashboards, alerting, and capacity planning. Where logs answer "what happened to this particular request", metrics answer "how is the system behaving in aggregate".
Metric types
- Counter. Monotonically increasing value: requests served, errors, bytes sent.
- Gauge. A value that can go up or down: queue depth, memory in use, connections open.
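- Histogram. The distribution of observed values, most often latency. Reported as counts per bucket, from which percentiles can be estimated.
- Summary. Like a histogram, but with percentiles pre-computed at the source, so they cannot be meaningfully aggregated across instances afterwards.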
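To make the differences between the types concrete, here is a minimal sketch in plain Python. The class names, the bucket bounds, and the `quantile` helper are illustrative assumptions, not the API of any particular metrics library, and the quantile estimate simply returns the upper bound of the bucket that contains the target rank (real systems usually interpolate within the bucket).

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Counter:
    """Monotonically increasing value; it only ever goes up."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        self.value += amount


@dataclass
class Gauge:
    """A value that can move in either direction."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Counts observations per bucket; bounds are inclusive upper limits."""
    bounds: List[float]
    counts: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.bounds = sorted(self.bounds)
        # One extra slot for observations above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self.counts[bisect_left(self.bounds, value)] += 1

    def quantile(self, q: float) -> float:
        """Coarse estimate: upper bound of the bucket holding the q-quantile."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no observations")
        target = q * total
        running = 0
        for bound, count in zip(self.bounds, self.counts):
            running += count
            if running >= target:
                return bound
        # The quantile falls in the overflow bucket, so no finite bound applies.
        return float("inf")
```

Because the histogram stores only bucket counts, counts from several instances can be added together before estimating a percentile. A summary, which ships finished percentiles, offers no equivalent operation.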
Labels and cardinality
Each metric carries labels (also called dimensions or tags): route, status_code, region. Each unique combination of labels produces a separate time series. High-cardinality labels (per-user IDs, per-request IDs) explode storage and cost; the standard advice is to reserve high cardinality for traces and keep metric labels bounded.
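The effect of one unbounded label is easiest to see with arithmetic. The sketch below computes the worst-case number of time series for a single metric as the product of distinct values per label; the label names follow the text, but the value counts are made-up figures for illustration.

```python
from math import prod
from typing import Dict


def series_count(distinct_values_per_label: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: every label combination."""
    return prod(distinct_values_per_label.values())


# Hypothetical counts: 20 routes, 5 status codes, 4 regions.
bounded = {"route": 20, "status_code": 5, "region": 4}
# The same metric with a per-user label added (100,000 users assumed).
with_user_id = {**bounded, "user_id": 100_000}

print(series_count(bounded))       # 400
print(series_count(with_user_id))  # 40000000
```

In practice not every combination occurs, so the real count is lower, but the multiplicative growth is what drives storage and cost.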
The RED and USE methods
- RED (per service): Rate, Errors, Duration. The minimum dashboard for a request-driven service.
- USE (per resource): Utilisation, Saturation, Errors. The minimum for hardware and infrastructure.
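As a rough illustration of RED, the three signals can be derived from a window of request records. This is a sketch under stated assumptions: the `Request` record and `red_summary` function are hypothetical, any 5xx status is treated as an error, and duration is summarised as a nearest-rank 95th percentile.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    status_code: int
    duration_s: float


def red_summary(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Rate, error ratio, and p95 duration for one observation window."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status_code >= 500)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank percentile: smallest value with at least 95% at or below it.
    rank = math.ceil(0.95 * len(durations))
    return {
        "rate_per_s": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p95_s": durations[rank - 1],
    }
```

A production system would compute the same figures from counters and histograms rather than raw records, which is what keeps metrics cheap compared with logs.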
Common tools
- Open source: Prometheus, Mimir, VictoriaMetrics, OpenTelemetry Collector
- Commercial: Datadog, New Relic, Grafana Cloud, Honeycomb, Dynatrace
Further Reading
What is API Observability? Logs, Metrics, Traces Explained