№ 13 · 2025 · CMS Desk

Custom LLM Observability

Internal Platform · Built from Scratch

PRIVATE REPO · NDA · AI Infrastructure

Custom Prometheus + Grafana observability for every LLM workflow: cost per provider per day, P50/P95/P99 latency, error rates, fallback triggers, and token usage per chain and model.
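Here is a minimal, stdlib-only Python sketch of how per-call LLM telemetry could be rolled up into the counters behind these panels. Python fits because FastAPI is in the stack. Everything in it is an illustrative assumption: `LLMCall`, `MetricsRegistry` and the `llm_*` metric names are hypothetical. A real service would more likely use a Prometheus client library than hand-render the text exposition format.

```python
# Minimal sketch with hypothetical names throughout: LLMCall, MetricsRegistry and
# the llm_* metric names are illustrative, not the project's actual code.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCall:
    provider: str            # e.g. "openai"
    model: str
    chain: str               # workflow/chain that issued the call
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    error: bool = False
    fell_back: bool = False  # True if a fallback path handled this request


def _escape(value: str) -> str:
    # Escape backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class MetricsRegistry:
    """Monotonic counters keyed by metric name plus sorted label pairs."""

    def __init__(self) -> None:
        self._counters = defaultdict(float)

    @staticmethod
    def _key(name, labels):
        return (name, tuple(sorted(labels.items())))

    def inc(self, name: str, value: float = 1.0, **labels: str) -> None:
        self._counters[self._key(name, labels)] += value

    def get(self, name: str, **labels: str) -> float:
        # .get() does not trigger the defaultdict factory, so reads never create keys.
        return self._counters.get(self._key(name, labels), 0.0)

    def record(self, call: LLMCall) -> None:
        self.inc("llm_requests_total", provider=call.provider, model=call.model, chain=call.chain)
        self.inc("llm_cost_usd_total", call.cost_usd, provider=call.provider)
        self.inc("llm_tokens_total", call.prompt_tokens, chain=call.chain, model=call.model, kind="prompt")
        self.inc("llm_tokens_total", call.completion_tokens, chain=call.chain, model=call.model, kind="completion")
        if call.error:
            self.inc("llm_errors_total", provider=call.provider, model=call.model)
        if call.fell_back:
            self.inc("llm_fallbacks_total", chain=call.chain)

    def exposition(self) -> str:
        """Render counters in Prometheus text format (no # HELP / # TYPE lines)."""
        lines = []
        for (name, labels), value in sorted(self._counters.items()):
            if labels:
                body = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
                lines.append(f"{name}{{{body}}} {value}")
            else:
                lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"
```

With cumulative counters like these, per-day cost per provider could come from a query along the lines of `sum by (provider) (increase(llm_cost_usd_total[1d]))`. An error rate could come from dividing `rate()` of the errors counter by `rate()` of the requests counter, both summed by provider and model. Storing cost as a counter rather than a precomputed daily figure leaves the time window to Grafana.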

№ 01 · Prometheus + Grafana
№ 02 · 40+ containers monitored
№ 03 · P50 / P95 / P99
№ 04 · Self-hosted
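P50/P95/P99 can be read as nearest-rank percentiles over a window of latency samples. The Python sketch below assumes raw samples are available, and its helper names are hypothetical. In a Prometheus setup, latency quantiles are more commonly estimated from histogram buckets with `histogram_quantile()`. That function interpolates, so its results can differ from these exact values.

```python
# Exact nearest-rank percentiles over a window of raw latency samples.
# Hypothetical helper names; not the project's actual implementation.
from typing import Dict, Sequence


def nearest_rank_percentile(samples: Sequence[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer in [1, 100]")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 gives the 1-based rank without float error.
    rank = -(-p * len(ordered) // 100)
    return ordered[rank - 1]


def latency_summary(samples: Sequence[float]) -> Dict[str, float]:
    """P50/P95/P99 for one window of samples (e.g. one model over 5 minutes)."""
    return {f"p{p}": nearest_rank_percentile(samples, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Five latencies in ms: with so few samples, P95 and P99 both land on the max.
    print(latency_summary([120, 95, 310, 101, 88]))
```

Running the module prints `{'p50': 101, 'p95': 310, 'p99': 310}`. With only five samples the tail percentiles collapse onto the slowest request, so P99 only carries information over windows with enough traffic.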

Visual proof


See Custom LLM Observability in action.

Production dashboard: requests and latency, cost by provider, tokens by model, workflow health, fallback triggers, and a live trace stream

Cost & reliability view: per-run cost, SLO met rate, cost trend by provider, latency vs. cost, and AI insights and recommendations

Stack

Built with

Prometheus · Grafana · FastAPI · PostgreSQL · Docker · Hetzner · Coolify
