A self-hosted observability stack built on Grafana and Prometheus, giving real-time visibility into uptime, resource consumption, and network health across every VM in the dnaie.com homelab infrastructure.
Why Build It
When you host client VMs alongside personal projects, guesswork is not an option. A disk filling up at 3 a.m. or a memory leak creeping through a container can cascade into downtime that erodes trust. This dashboard replaced manual SSH spot-checks with continuous, automated telemetry.
Stack Breakdown
Data Collection
Prometheus scrapes Node Exporter endpoints on each VM every fifteen seconds. Docker container metrics are exposed through cAdvisor, and Nginx reverse-proxy logs feed into Loki for structured querying.
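One way to confirm that every scrape target is reporting is to ask the Prometheus HTTP API for the built-in `up` metric. The sketch below is illustrative only: the server address `http://localhost:9090`, the sample instance names, and the helper functions `down_targets` and `query_up` are assumptions, not details of the actual deployment.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the Prometheus server; adjust for the real deployment.
PROMETHEUS_URL = "http://localhost:9090"


def down_targets(response: dict) -> list[str]:
    """Return the `instance` label of every target whose `up` sample is 0."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the sample value arrives as a string
        if float(value) == 0.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return sorted(down)


def query_up(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the instant query `up` against the /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Sample payload in the shape Prometheus returns for an instant vector query.
# The instance names are made up for illustration.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "vm-a:9100", "job": "node"}, "value": [1700000000, "1"]},
            {"metric": {"instance": "vm-b:9100", "job": "node"}, "value": [1700000000, "0"]},
        ],
    },
}
print(down_targets(sample))
```

Against a live server, `down_targets(query_up())` would list any VM whose exporter failed its most recent scrape.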
Visualization
Grafana renders five primary panels (CPU load, memory pressure, disk I/O, network throughput, and container health), each with configurable alert thresholds that trigger notifications to Discord.
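Grafana handles threshold evaluation and Discord delivery itself, so no custom code is needed for this. Purely to illustrate the logic, the sketch below checks readings against thresholds and builds a message body of the kind a Discord webhook accepts (a JSON object with a `content` field). The threshold values, the host name `vm-a`, and the helper names are hypothetical and do not come from the deployment.

```python
# Hypothetical thresholds in percent; the real values are not given in the text.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}


def breached(readings: dict[str, float],
             thresholds: dict[str, float] = THRESHOLDS) -> list[str]:
    """Return one message per metric whose reading exceeds its threshold."""
    return [
        f"{name} at {value:.1f}% (threshold {thresholds[name]:.1f}%)"
        for name, value in sorted(readings.items())
        if name in thresholds and value > thresholds[name]
    ]


def discord_payload(host: str, messages: list[str]) -> dict:
    """Build a webhook body; Discord reads the message text from `content`."""
    lines = "\n".join(f"- {m}" for m in messages)
    return {"content": f"Alert on {host}:\n{lines}"}


# A reading exactly at its threshold (disk here) does not count as a breach.
alerts = breached({"cpu": 42.0, "memory": 91.3, "disk": 80.0})
payload = discord_payload("vm-a", alerts)
print(payload["content"])
```

Using a strict comparison means a metric sitting exactly at its limit stays quiet, which is one reasonable choice; an inclusive comparison would be equally defensible.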
Results
Since deployment, the dashboard has caught two near-miss disk saturation events and one runaway Docker container before they impacted hosted services. Average incident detection time dropped from hours to under ninety seconds, and the historical data now informs capacity planning for new client VM allocations.
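As a rough illustration of how historical data can feed capacity planning, the sketch below fits a least-squares line to daily disk-usage samples and estimates how many days remain before the disk fills. The sample figures and the function name `days_until_full` are invented for the example, and a straight-line trend is an assumption that real usage may not follow.

```python
from typing import Optional


def days_until_full(samples: list[tuple[float, float]],
                    capacity: float = 100.0) -> Optional[float]:
    """Estimate days from the last sample until usage reaches `capacity`.

    `samples` holds (day, used_percent) pairs. Returns None when there are
    too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # every sample falls on the same day
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - samples[-1][0])


# Invented history: usage grows two percentage points per day from 60%.
history = [(0.0, 60.0), (1.0, 62.0), (2.0, 64.0), (3.0, 66.0)]
print(days_until_full(history))
```

With this invented history the estimate is 17 days: the fitted line reaches 100% on day 20, and the last sample was taken on day 3.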
You cannot fix what you cannot see. This dashboard made the invisible visible.

