Prometheus
Prometheus is an open-source time-series database and monitoring system that scrapes metrics from instrumented applications over HTTP, stores them locally, and exposes a powerful query language (PromQL) for dashboards and alerts. It is the de facto standard for cloud-native infrastructure monitoring.
How it works
Applications expose a /metrics endpoint in the Prometheus exposition format. A Prometheus server is configured with scrape targets. Every scrape interval (typically 15 to 60 seconds), it pulls metrics from each target, applies any configured relabelling, and stores the samples in its local TSDB. Service discovery (Kubernetes, Consul, EC2, file_sd) keeps the target list current.
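The exposition format is easiest to understand by producing it. Below is a minimal sketch that uses only the Python standard library rather than an official client library. It renders a couple of in-memory counters as text: one `# HELP` and one `# TYPE` line per metric family, then one sample line per label set. It also defines a handler that serves this text at /metrics. The `COUNTERS` and `HELP` tables, `render_exposition`, and `MetricsHandler` are hypothetical names for illustration. A real service would normally use a Prometheus client library instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-process metrics: (metric name, sorted label pairs) -> counter value.
COUNTERS = {
    ("http_requests_total", (("method", "GET"), ("route", "/api"))): 42.0,
    ("http_requests_total", (("method", "POST"), ("route", "/api"))): 7.0,
}
HELP = {"http_requests_total": "Total HTTP requests handled."}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(counters: dict, help_text: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted({metric for metric, _ in counters}):
        # One HELP and one TYPE line per metric family, then one line per label set.
        lines.append(f"# HELP {name} {help_text.get(name, '')}")
        lines.append(f"# TYPE {name} counter")
        for (metric, labels), value in sorted(counters.items()):
            if metric != name:
                continue
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            selector = f"{{{pairs}}}" if pairs else ""
            lines.append(f"{name}{selector} {value}")
    # The format is line-oriented and the last line must end with a newline.
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered counters on GET /metrics; any other path gets 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(COUNTERS, HELP).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, run `HTTPServer(("", 8000), MetricsHandler).serve_forever()` from `http.server` and add port 8000 as a scrape target. The port is arbitrary.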
PromQL
PromQL is the query language for selecting time series, applying functions, and computing aggregations. Common patterns:
- rate(http_requests_total[5m]): requests per second, averaged over the last 5 minutes.
- histogram_quantile(0.99, rate(latency_bucket[5m])): p99 latency, estimated from histogram buckets.
- sum by (route) (rate(http_requests_total[1m])): per-route request rate.
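The simplified Python functions below show roughly what these three patterns compute over raw samples. They are hypothetical helpers for illustration, not Prometheus code. `simple_rate` handles counter resets but skips the boundary extrapolation that the real rate() performs. `histogram_quantile` follows PromQL's approach of linear interpolation across cumulative `le` buckets, limited here to 0 < q <= 1. `sum_by` mimics `sum by (label)` by dropping every other label.

```python
import math


def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Simplified: divides the observed increase by the time between the first and
    last sample, without Prometheus's extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # need at least two samples to measure an increase
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop in value means the counter was reset (e.g. a restart), so the
        # new value is all increase since the reset.
        increase += value - prev if value >= prev else value
        prev = value
    return increase / (samples[-1][0] - samples[0][0])


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The last bucket must have upper bound math.inf; observations are assumed
    to be spread evenly inside each bucket (linear interpolation).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # treated as an invalid histogram here
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lands in +Inf: report highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, prev_count = buckets[b - 1] if b > 0 else (0.0, 0.0)
    end, count = buckets[b]
    return start + (end - start) * (rank - prev_count) / (count - prev_count)


def sum_by(label, series):
    """Sum (labels, value) pairs per value of one label, like sum by (label) (...)."""
    totals = {}
    for labels, value in series:
        key = labels.get(label, "")  # a missing label counts as the empty string
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

For example, `histogram_quantile(0.5, [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)])` returns 0.1, because exactly half of the 100 observations fall in the bucket bounded by 0.1.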
Ecosystem
- Alertmanager. Receives alerts from Prometheus, deduplicates and groups them, and routes notifications to receivers such as Slack, PagerDuty, or email.
- Grafana. The standard visualisation layer on top of Prometheus.
- Exporters. Adapters that expose metrics from third-party systems (node_exporter, blackbox_exporter, mysqld_exporter).
- Long-term storage. Thanos, Cortex, Mimir, and VictoriaMetrics extend Prometheus with durable long-term retention, horizontal scaling, and querying across multiple clusters.
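Alertmanager's deduplication and grouping can be pictured with a small conceptual sketch. This is not Alertmanager's implementation. The sketch treats two alerts with identical label sets as duplicates. It then buckets the remaining alerts by the labels listed in `group_by`, which is a real Alertmanager route setting. The function name and data shapes are assumptions. The sketch ignores routing trees, timing settings such as `group_wait`, and notification delivery.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by full label set, then group them by group_by labels.

    alerts: iterable of label dicts, e.g. {"alertname": "HighLatency", "instance": "a"}.
    Returns {group_key: [unique alerts]}, where group_key is a tuple of
    (label, value) pairs for the group_by labels.
    """
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # Hypothetical fingerprint: the alert's sorted label pairs.
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:
            continue  # identical alert already queued: notify once
        seen.add(fingerprint)
        key = tuple((name, alert.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

Grouping by `alertname` means that one notification can cover every instance firing the same alert. It does not need a separate message per instance.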