Monitoring with Grafana

The monitoring system is made up of three components: Collectd for collecting data, Graphite for storing it and Grafana for displaying it.

Collectd

Collectd is a daemon, installed on each monitored machine, that periodically collects system and application performance metrics. The Write Graphite plugin sends the collected data to the Graphite database.
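Write Graphite delivers values to Carbon using Graphite's plaintext protocol: one line per data point, in the form `<metric path> <value> <timestamp>`. The sketch below builds such lines and can push them over TCP, which may help when testing a Graphite server by hand. The host name `graphite.example.com`, port 2003 (Carbon's usual plaintext port) and the metric path are assumptions for illustration; the exact metric names collectd produces depend on the plugin configuration.

```python
import socket
import time


def carbon_line(path: str, value: float, timestamp: int) -> str:
    """Format one data point in Graphite's plaintext protocol."""
    # Metric paths are dot-separated and must not contain spaces,
    # because the space is the field separator.
    if " " in path:
        raise ValueError("metric path must not contain spaces")
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines, host="graphite.example.com", port=2003, timeout=5.0):
    """Send pre-formatted plaintext lines to a Carbon listener over TCP.

    The default host is a placeholder; 2003 is Carbon's usual plaintext
    port but may differ in your installation.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    now = int(time.time())
    # Illustrative path for the load plugin on a host named "server01".
    line = carbon_line("server01.load.load.shortterm", 0.42, now)
    print(line, end="")
    # send_to_carbon([line])  # uncomment when a Carbon server is reachable
```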

Some basic plugins from the full collectd plugin list:

Plugin          Description
--------------  -------------------------------------------------------------
Battery         Collects the battery’s charge, drawn current and voltage
CPU             Collects the amount of time the CPU spends in various states
DF              Collects file system usage information
Exec            Executes scripts or applications and reads back the values
                they output (less efficient than the Python, Java or Perl
                plugins)
Interface       Collects traffic, packets per second and errors of network
                interfaces
Java            Embeds a Java virtual machine (JVM) into collectd and exposes
                its application programming interface (API) to Java programs
Load            Collects the system load
Memory          Collects physical memory utilization
Ping            Measures network latency
PostgreSQL      Executes SQL statements on a PostgreSQL database and reads
                back the results
Processes       Collects the number of processes, grouped by their state
Python          Embeds a Python interpreter into collectd and exposes its API
                to Python scripts
Tail            Reads new log lines and counts the lines that match a regular
                expression
Thermal         Reads system temperature information
Uptime          Keeps track of the system uptime
Write Graphite  Stores values in Carbon, the storage layer of Graphite
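The Exec plugin mentioned in the table runs an external program and reads values from its standard output, which the program writes as `PUTVAL` commands in collectd's plain-text protocol. Below is a minimal sketch of such a script in Python. The plugin name `myapp`, the type `gauge-requests` and the helper `read_metric()` are hypothetical placeholders; the script relies on the `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` environment variables that the Exec plugin sets for the programs it starts.

```python
import os
import sys
import time


def putval_line(host: str, plugin: str, type_: str, interval: float, value: float) -> str:
    """Build one PUTVAL command for collectd's plain-text protocol.

    The identifier is host/plugin/type; "N" tells collectd to use the
    current time as the timestamp.
    """
    return f'PUTVAL "{host}/{plugin}/{type_}" interval={interval:g} N:{value}'


def read_metric() -> float:
    # Hypothetical placeholder: replace with the real measurement.
    return 42.0


def main() -> None:
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        print(putval_line(host, "myapp", "gauge-requests", interval, read_metric()))
        sys.stdout.flush()  # collectd reads line by line, so do not buffer
        time.sleep(interval)


# Only start the endless loop when launched by collectd's Exec plugin,
# which sets COLLECTD_HOSTNAME in the environment.
if __name__ == "__main__" and "COLLECTD_HOSTNAME" in os.environ:
    main()
```

Keeping the formatting in `putval_line` separate from the loop makes the output easy to check without running collectd.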

Graphite

Graphite does two things:

  1. Store numeric time-series data
  2. Render graphs of this data on demand
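Rendering on demand is exposed through Graphite's HTTP render API: a request to `/render` with a `target` (a metric path or function expression), a time range (`from`, `until`) and a `format` returns either a PNG graph or the data points themselves (for example as JSON). The sketch below only builds such a URL; the server address and metric path are placeholders.

```python
from urllib.parse import urlencode


def render_url(base: str, target: str, since: str = "-1h",
               until: str = "now", fmt: str = "json") -> str:
    """Build a Graphite render API URL for a single target."""
    query = urlencode({"target": target, "from": since, "until": until, "format": fmt})
    return f"{base.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # Placeholder server and metric path; open the URL in a browser or
    # fetch it with urllib.request.urlopen to get the data.
    print(render_url("http://graphite.example.com", "server01.load.load.shortterm"))
```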

Graphite has its own web interface, which is useful for browsing all machines and their collected metrics. Typically, only a subset of the collected metrics is displayed in Grafana.

Fixed storage size

Graphite stores each metric in a fixed-size database file (a Whisper database). Once the configured retention period has elapsed, Graphite starts overwriting the oldest data points.
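Because the storage size is fixed, each archive behaves like a ring buffer: a data point's slot is derived from its timestamp, so once the archive has wrapped around, new points land in the slots of the oldest ones. The class below is a simplified, in-memory illustration of that idea, not Graphite's actual on-disk format; the class and method names are hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size store of (timestamp, value) pairs at a fixed step.

    Simplified illustration of round-robin storage: the slot index is
    derived from the timestamp, so old points are overwritten in place
    once the archive wraps around.
    """

    def __init__(self, step: int, points: int):
        self.step = step              # seconds between data points
        self.points = points          # number of slots (the fixed size)
        self.slots = [None] * points  # each slot holds (timestamp, value)

    def write(self, timestamp: int, value: float) -> None:
        aligned = timestamp - timestamp % self.step   # align to the step
        index = (aligned // self.step) % self.points  # wrap around
        self.slots[index] = (aligned, value)          # overwrite the old point

    def read(self, start: int, end: int):
        """Return stored pairs with start <= timestamp < end, oldest first."""
        found = [s for s in self.slots if s is not None and start <= s[0] < end]
        return sorted(found)
```

With `step=10` and `points=360` (one hour of 10-second points), a point written at time `t + 3600` lands in the same slot as the point written at `t` and replaces it.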

Storage aggregation

The default collection interval of collectd agents is 10 seconds. To store data over long time intervals, older data points are aggregated into lower-precision retentions.
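Aggregating into a lower-precision retention means combining all points that fall into one coarser interval into a single value. Averaging is Graphite's default aggregation method, and Graphite's `xFilesFactor` setting (default 0.5) sets the fraction of input slots that must hold values for the aggregate to be non-null. The function below is a simplified sketch of that behaviour for 10-second points averaged into 1-minute buckets; its name and parameters are hypothetical.

```python
from collections import defaultdict


def downsample(points, step_out=60, step_in=10, x_files_factor=0.5):
    """Average (timestamp, value) points into buckets of step_out seconds.

    A bucket is kept only if at least x_files_factor of its expected
    step_out / step_in input slots held a value, mirroring the role of
    Graphite's xFilesFactor; otherwise it is dropped (treated as null).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if value is not None:
            buckets[ts - ts % step_out].append(value)

    expected = step_out // step_in
    result = []
    for bucket_ts in sorted(buckets):
        values = buckets[bucket_ts]
        if len(values) / expected >= x_files_factor:
            result.append((bucket_ts, sum(values) / len(values)))
    return result
```

For six 10-second points with values 1 to 6, the 1-minute average is 3.5; a minute with only two of its six points present is dropped under the default factor of 0.5.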

Aggregation rules example

Storage time   Interval between data points   Number of data points
------------   ----------------------------   ---------------------
1 hour         10 s                           360
6 hours        1 min                          360
24 hours       5 min                          288
30 days        15 min                         2880
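In Graphite, retentions like these are declared in `storage-schemas.conf` as a comma-separated list of `precision:duration` pairs, so the rules in the table correspond to `10s:1h,1m:6h,5m:24h,15m:30d` (in this syntax `m` means minutes). The sketch below parses such a string and derives the number of data points per archive (duration divided by precision), reproducing the last column of the table. It handles only the `s`, `m`, `h`, `d`, `w` and `y` suffixes and is not Graphite's own parser.

```python
# Seconds per unit suffix (y is taken as 365 days).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert a value such as '10s' or '30d' to seconds."""
    number, unit = spec[:-1], spec[-1]
    return int(number) * UNIT_SECONDS[unit]


def parse_retentions(retentions: str):
    """Return (precision_s, duration_s, points) for each archive."""
    archives = []
    for part in retentions.split(","):
        precision, duration = (to_seconds(p) for p in part.strip().split(":"))
        archives.append((precision, duration, duration // precision))
    return archives


if __name__ == "__main__":
    for precision, duration, points in parse_retentions("10s:1h,1m:6h,5m:24h,15m:30d"):
        print(f"{precision:>5} s per point, {duration:>8} s kept, {points} points")
```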

[Figure: Graphite web interface]

Grafana

Grafana allows you to query, visualize, alert on and understand your metrics.

Graph panel

The Graph panel is the main panel in Grafana and provides a rich set of graphing options.

[Figure: Graph panel]

Singlestat panel

The Singlestat panel reduces a series to a single number. It also supports thresholds that determine the color of the value.
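To make the reduction and coloring concrete: the panel applies a reduction function (for example the average, the maximum or the current, i.e. last, value) to the series and then compares the result with two thresholds, which split the value range into three colors. The function, reducer and color names below are hypothetical; in Grafana these choices are made in the panel editor, and treating a value equal to a threshold as belonging to the higher band is an assumption of this sketch.

```python
from statistics import mean

# Reduction functions applied to the non-null values of a series.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "current": lambda values: values[-1],  # last non-null value
}


def singlestat(values, reducer="avg", thresholds=(50, 80),
               colors=("green", "orange", "red")):
    """Reduce a series to one number and pick a color from two thresholds.

    Values at or above a threshold take the next color (an assumption).
    """
    present = [v for v in values if v is not None]
    if not present:
        return None, None
    stat = REDUCERS[reducer](present)
    low, high = thresholds
    if stat < low:
        color = colors[0]
    elif stat < high:
        color = colors[1]
    else:
        color = colors[2]
    return stat, color
```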

[Figure: Singlestat panel]

Status panel

The Status panel is very similar to the Singlestat panel, but it can hold multiple values from the same data source. Each value can be used to customize the panel in a different way:

  • Mark the severity of the component
  • Mark if the component is disabled
  • Show extra data in the panel about the component

[Figure: Status panel]

Alerting

Grafana can send email notifications when metrics meet predefined alert conditions (for example, exceed a threshold) and again when they return to normal.
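An alert rule is evaluated periodically, and notifications go out on state changes rather than on every evaluation: once when the condition starts to hold and once when it stops. The class below is a minimal sketch of that behaviour, assuming a simple "average above threshold" condition; the names are hypothetical, and Grafana's own alerting engine has further states and options, such as a pending period and no-data handling.

```python
from statistics import mean


class AlertRule:
    """Tracks OK/ALERTING state and reports notifications on transitions."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.state = "OK"

    def evaluate(self, window):
        """Evaluate the rule on a non-empty window of recent values.

        Returns a notification message when the state changes, else None.
        """
        avg = mean(window)
        new_state = "ALERTING" if avg > self.threshold else "OK"
        if new_state == self.state:
            return None  # no change, so no notification
        self.state = new_state
        if new_state == "ALERTING":
            return f"ALERT: average {avg:.1f} above {self.threshold}"
        return f"OK: average {avg:.1f} back to normal"
```

In a real setup the returned message would be handed to a notifier such as email; here it is simply returned so the transitions are easy to inspect.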

[Figure: Alerting]
