:INFO Dashboards Are Outputs, Scrape Configs Are the Work
Most monitoring guides start with a Grafana screenshot and skip straight to dashboard import IDs. This one starts with the Prometheus scrape model, because that is where the data comes from and where things actually go wrong. Prometheus is a pull-based time-series database: it polls configured targets on a schedule, collects numeric metrics exposed at a /metrics HTTP endpoint, and stores them with timestamps. Grafana queries Prometheus and renders the results. The work is in knowing what to instrument and writing scrape configs that collect the right data.

:COUNTER.half 15 Second Scrape Interval

:PATH Set Up Your Exporters
Exporters are small programs that translate system or application state into Prometheus-format metrics at /metrics. Install three for a complete home lab picture. node_exporter exposes CPU, memory, disk, and network metrics for the host machine; run it a

:PATH Write a docker-compose.yml for the Full Stack
Create a monitoring directory: mkdir -p ~/docker/monitoring. The compose file defines four services: prometheus, grafana, node_exporter, and cadvisor. Prometheus mounts a named volume for its time-series data directory and a bind-mounted prometheus.yml co

:PATH Configure Prometheus Scrape Jobs
In prometheus.yml, the scrape_configs section lists your targets. Each job has a name and a list of static targets: the host:port of each exporter. Add jobs for node (targeting node_exporter:9100), cadvisor (targeting cadvisor:8080), and blackbox (target

:PATH Write Basic PromQL Queries
PromQL is Prometheus's query language. The simplest query is a metric name: up returns 1 for each scrape target currently responding and 0 for each one that is not. To see CPU usage, use rate(node_cpu_seconds_total{mode!="idle"}[5m]) to calculate the per-

:PATH Import a Grafana Dashboard and Set Up Alerting
In Grafana, go to Dashboards and choose Import.
Enter dashboard ID 1860 to import the popular Node Exporter Full dashboard, which provides a comprehensive view of your Pi's resources without writing any queries. Point the dashboard's data source at your P

:NOTE
Prometheus stores all metric data on disk in its data directory. By default it keeps 15 days of data and applies no size limit, so a Pi with many exporters can still accumulate gigabytes, and more if you extend retention to months. Set the --storage.tsdb.retention.time flag (such as 90d for 90 days) and --storage.tsdb.retention.size (such as 10GB) in Prometheus's command arguments to keep the time-series database from filling your drive.

:INFO What Comes Next
Metrics tell you what your applications are doing: CPU is high, memory is growing, a container went down at 3:17 AM. But when something goes wrong and you need to understand what happened leading up to it, you need logs. Centralizing logs from every container and system service into one queryable store is what makes post-incident investigation possible.

:SLATE 989

:LINK https://prometheus.io/docs/introduction/overview/ Prometheus documentation
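
The scrape jobs described in the Configure Prometheus Scrape Jobs step might look roughly like the prometheus.yml below. This is a minimal sketch, not the guide's actual file: it assumes the exporters are reachable at the compose service names node_exporter and cadvisor, and that the 15-second interval is set globally. The blackbox job is left out because its target is not spelled out here.

```yaml
# prometheus.yml: illustrative sketch only
global:
  scrape_interval: 15s   # assumed to be the 15-second interval mentioned above

scrape_configs:
  # Host metrics (CPU, memory, disk, network) from node_exporter
  - job_name: node
    static_configs:
      - targets: ["node_exporter:9100"]

  # Per-container metrics from cAdvisor
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
```

Once Prometheus has loaded a file like this, the up query should return one series per job, which is a quick way to confirm that both targets are being scraped.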