13 · 2025 · CMS Desk

Custom LLM Observability

Internal Platform · Built from Scratch

PRIVATE REPO · NDA · AI Infrastructure

Custom Prometheus + Grafana observability for every LLM workflow — cost per provider per day, P50/P95/P99 latency, error rates, fallback triggers, token usage per chain per model.
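To make the headline numbers concrete, here is a minimal standard-library sketch of how cost per provider per day, latency percentiles, and error rate can be derived from raw call records. It is an illustration only, not the project's code: the `LLMCall` record and all field and function names are hypothetical, and a Prometheus-based setup would normally estimate percentiles from histogram buckets rather than from raw samples.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LLMCall:
    """One LLM request. Field names are hypothetical, chosen for illustration."""
    day: date
    provider: str
    latency_ms: float
    cost_usd: float
    ok: bool


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_provider_per_day(calls):
    """Sum cost_usd for each (day, provider) pair."""
    totals = defaultdict(float)
    for call in calls:
        totals[(call.day, call.provider)] += call.cost_usd
    return dict(totals)


def error_rate(calls):
    """Fraction of calls that failed; 0.0 when there are no calls."""
    if not calls:
        return 0.0
    return sum(1 for call in calls if not call.ok) / len(calls)


def latency_summary(calls):
    """P50 / P95 / P99 latency in milliseconds across the given calls."""
    latencies = [call.latency_ms for call in calls]
    return {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
```

Nearest-rank is used here because it always returns an observed latency and is easy to check by hand; interpolating definitions give slightly different values on small samples.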

№ 01 Prometheus + Grafana
№ 02 40+ containers monitored
№ 03 P50 / P95 / P99
№ 04 Self-hosted
Visual proof
3 images · 0 videos

See Custom LLM Observability in action.

Production dashboard — requests + latency, cost-by-provider, tokens by model, workflow health, fallback triggers, live trace stream

Cost & reliability view — per-run cost, SLO met rate, cost trend by provider, latency vs cost, AI insights and recommendations

Trace Explorer — workflow span timeline, span tree, live events stream, trace metadata for any request across the stack

Stack

Built with

Prometheus · Grafana · FastAPI · PostgreSQL · Docker · Hetzner · Coolify