Grafana Lab with Graphite Datasource metrics – Exploring

This is Step 1 of the lab: a detailed, student-ready guide to exploring Graphite metrics in Grafana 13.x using Explore.

This assumes:

Graphite datasource is already added in Grafana
Graphite URL is working
Telegraf is sending Linux metrics to Graphite
Metrics prefix is telegraf
Hostname is linux-demo

Grafana Explore is designed for ad-hoc investigation before creating dashboards. Grafana describes Explore as the place to query, analyze, and aggregate data without first creating a dashboard. (Grafana Labs)


Lab 1: Explore Linux Metrics from Graphite in Grafana Explore

Objective

By the end of this lab, students will be able to:

1. Open Grafana Explore
2. Select the Graphite datasource
3. Browse available Graphite metrics
4. Query CPU, memory, disk, network, and load metrics
5. Use Graphite functions in Grafana Explore
6. Compare multiple metrics
7. Understand Graphite metric naming
8. Troubleshoot common Explore/query issues

Lab Architecture

Your current data flow is:

Linux Host
   ↓
Telegraf
   ↓
Graphite Carbon :2003
   ↓
Whisper Storage
   ↓
Graphite Web :8080
   ↓
Grafana Graphite Datasource
   ↓
Grafana Explore

In this setup, Telegraf collects Linux metrics every 10 seconds and sends them to Graphite. Grafana then reads those metrics from Graphite.


Important Concepts Before Starting

What is Grafana Explore?

Explore is Grafana’s investigation workspace. It is used to test queries, inspect metrics, compare values, and experiment before building dashboards or alerts. Grafana’s documentation says Explore lets you query and analyze data without creating a dashboard first. (Grafana Labs)

In simple words:

Dashboard = saved view for long-term monitoring
Explore = temporary workspace for investigation and learning

What is Graphite Query Format?

Graphite metrics are usually stored in a dot-separated path format:

telegraf.linux-demo.cpu.cpu-total.usage_active

Breakdown:

Part            Meaning
telegraf        Metric prefix configured in Telegraf
linux-demo      Hostname configured in Telegraf
cpu             Measurement type
cpu-total       CPU instance
usage_active    Actual metric field
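
As a quick illustration of the breakdown above, the following shell sketch splits a metric path on its dots and prints each node with its zero-based position. The function name split_metric_path is hypothetical and only for this lab; the zero-based numbering mirrors how Graphite functions such as aliasByNode() refer to path nodes.

```shell
# Hypothetical helper: print each node of a Graphite metric path
# together with its zero-based position.
split_metric_path() {
  printf '%s\n' "$1" | awk -F. '{ for (i = 1; i <= NF; i++) print i - 1, $i }'
}

split_metric_path "telegraf.linux-demo.cpu.cpu-total.usage_active"
# prints:
# 0 telegraf
# 1 linux-demo
# 2 cpu
# 3 cpu-total
# 4 usage_active
```

Shorter paths such as telegraf.linux-demo.mem.used_percent simply produce fewer nodes.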

Grafana includes a Graphite-specific query editor that helps users navigate metric paths, add functions, and build Graphite queries. (Grafana Labs)


Pre-Lab Validation

Before students open Grafana Explore, validate that Graphite has metrics.

Run this on the Graphite server:

docker exec graphite find /opt/graphite/storage/whisper/telegraf -type f | head -20

Expected output should look similar to:

/opt/graphite/storage/whisper/telegraf/linux-demo/cpu/cpu-total/usage_active.wsp
/opt/graphite/storage/whisper/telegraf/linux-demo/mem/used_percent.wsp
/opt/graphite/storage/whisper/telegraf/linux-demo/system/load1.wsp

If .wsp files are visible, Graphite is storing metrics.

Also test Graphite API:

curl "http://localhost:8080/render?target=telegraf.linux-demo.cpu.cpu-total.usage_active&from=-10min&format=json"

Expected result:

JSON output with datapoints

If this works, Grafana should also be able to query Graphite.
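
Reading raw JSON by eye is error-prone, so here is a minimal sketch that counts how many datapoints actually carry a value. It assumes the render API is called with format=csv instead of format=json, and that each row looks like "target,timestamp,value" with an empty value for a missing datapoint. The function name count_datapoints is hypothetical.

```shell
# Hypothetical helper: read Graphite CSV render output on stdin
# (rows assumed to look like "target,timestamp,value") and report how
# many datapoints carry a value and how many are empty (null).
count_datapoints() {
  awk -F, '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NF < 3 { next }                             # skip blank or malformed lines
    $NF == "" || $NF == "None" { nulls++; next }
    { values++ }
    END { printf "values=%d nulls=%d\n", values, nulls }
  '
}

# Example, run on the Graphite server:
#   curl -s "http://localhost:8080/render?target=telegraf.linux-demo.cpu.cpu-total.usage_active&from=-10min&format=csv" | count_datapoints
```

A result with values=0 suggests the metric path exists but no data arrived in the requested window. One or two nulls at the end are often just the newest interval that has not been written yet.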


Step 1.1: Open Grafana Explore

Login to Grafana.

From the left-side menu, click:

Explore

Depending on your Grafana 13.x UI, Explore may appear directly in the main menu or be reachable through the navigation/search menu.

You should see a query workspace with a datasource selector at the top.


Step 1.2: Select Graphite Datasource

At the top of Explore, select your Graphite datasource.

Example datasource name:

Graphite

or whatever name you used when adding it.

Expected result:

Grafana Explore opens the Graphite query editor.

If you see another datasource such as Prometheus, Loki, MySQL, or Elasticsearch, change it to Graphite.


Step 1.3: Set Time Range

At the top-right corner, set the time range to:

Last 15 minutes

Why?

Telegraf is sending fresh Linux metrics every 10 seconds. A short time range makes it easier to confirm live data.

Recommended time ranges for this lab:

Time Range         Use Case
Last 5 minutes     Live verification
Last 15 minutes    Best for initial lab
Last 1 hour        Better trend visibility
Last 6 hours       Longer infrastructure trend

For first-time testing, use:

Last 15 minutes

Step 1.4: Run First CPU Query

In the Graphite query field, enter:

telegraf.linux-demo.cpu.cpu-total.usage_active

Click:

Run query

or press:

Shift + Enter

Expected result:

A time-series graph showing CPU active usage percentage.

This metric shows active CPU usage.

Example meaning:

If the value is 5, CPU active usage is around 5%.
If the value is 80, CPU active usage is around 80%.

Step 1.5: Understand CPU Metrics

Try these CPU queries one by one.

Total CPU Active Usage

telegraf.linux-demo.cpu.cpu-total.usage_active

Meaning:

Overall active CPU usage across the server.

CPU Idle Percentage

telegraf.linux-demo.cpu.cpu-total.usage_idle

Meaning:

Percentage of time CPU is idle.

CPU User Usage

telegraf.linux-demo.cpu.cpu-total.usage_user

Meaning:

CPU used by user-space processes.

CPU System Usage

telegraf.linux-demo.cpu.cpu-total.usage_system

Meaning:

CPU used by kernel/system processes.

CPU I/O Wait

telegraf.linux-demo.cpu.cpu-total.usage_iowait

Meaning:

CPU waiting for disk I/O operations.

Step 1.6: Compare CPU Active and CPU Idle

In Explore, add two queries.

Query A:

telegraf.linux-demo.cpu.cpu-total.usage_active

Query B:

telegraf.linux-demo.cpu.cpu-total.usage_idle

Expected result:

Two lines appear in the graph.

Observation:

When usage_active increases, usage_idle normally decreases.

Teaching point:

CPU active and CPU idle are opposite indicators.
High active CPU means the server is busy.
High idle CPU means the server has spare capacity.
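
To make the "opposite indicators" point concrete: in Telegraf's cpu input, usage_active covers everything that is not idle, so the two series should add up to roughly 100 at each timestamp. The sketch below checks this. It assumes both targets are requested together from the render API with format=csv (rows like "target,timestamp,value"); the function name sum_by_timestamp is hypothetical.

```shell
# Hypothetical helper: read CSV render output for several series on
# stdin and print the per-timestamp sum across them.
sum_by_timestamp() {
  awk -F, '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NF < 3 || $NF == "" { next }                # ignore malformed rows and nulls
    {
      ts = $(NF - 1)
      if (!(ts in total)) order[++n] = ts       # remember first-seen order
      total[ts] += $NF
    }
    END { for (i = 1; i <= n; i++) printf "%s %.1f\n", order[i], total[order[i]] }
  '
}

# Example, run on the Graphite server:
#   curl -s "http://localhost:8080/render?target=telegraf.linux-demo.cpu.cpu-total.usage_active&target=telegraf.linux-demo.cpu.cpu-total.usage_idle&from=-5min&format=csv" | sum_by_timestamp
```

If one series has a gap at a timestamp where the other does not, that row will show only the single value, so a sum far from 100 on an isolated row is usually a missing datapoint rather than a real anomaly.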

Step 1.7: Generate CPU Load for Testing

On the Linux server, install stress:

apt update
apt install -y stress

Generate CPU load for 2 minutes:

stress --cpu 2 --timeout 120

Now return to Grafana Explore.

Run:

telegraf.linux-demo.cpu.cpu-total.usage_active

Set time range:

Last 5 minutes

Expected result:

CPU usage should increase during the stress test.

This confirms the full monitoring chain is working:

Linux CPU activity → Telegraf → Graphite → Grafana Explore

Step 1.8: Explore Memory Metrics

Now query memory usage.

Memory Used Percentage

telegraf.linux-demo.mem.used_percent

Expected result:

Graph shows percentage of memory used.

Memory Available

telegraf.linux-demo.mem.available

Expected result:

Graph shows available memory in bytes.

Memory Used

telegraf.linux-demo.mem.used

Expected result:

Graph shows used memory in bytes.

Memory Free

telegraf.linux-demo.mem.free

Expected result:

Graph shows free memory in bytes.

Teaching point:

used_percent is easier for dashboards and alerts.
used, available, and free are raw byte values.

For student learning, use this as the primary memory metric:

telegraf.linux-demo.mem.used_percent

Step 1.9: Explore System Load Metrics

Linux load average shows the average number of processes that are running or waiting to run. On Linux it also counts processes blocked in uninterruptible I/O wait.

Query:

telegraf.linux-demo.system.load1

Other useful queries:

telegraf.linux-demo.system.load5
telegraf.linux-demo.system.load15

Meaning:

Metric    Meaning
load1     1-minute load average
load5     5-minute load average
load15    15-minute load average

Teaching explanation:

Load average helps identify whether the system is under sustained pressure.
A short spike in load1 may be normal.
If load1, load5, and load15 are all high, the system may be consistently overloaded.
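
Whether a load value counts as "high" depends on how many CPU cores the host has: a load of 4 is heavy on 2 cores and light on 16. As a rule-of-thumb sketch, the helper below divides a load average by a core count. The function name load_per_core is hypothetical; on the monitored Linux host the inputs could come from /proc/loadavg and nproc.

```shell
# Hypothetical helper: divide a load average by the number of CPU cores.
# As a rule of thumb, results well above 1.00 suggest more runnable
# work than there are cores to run it.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (cores + 0 <= 0) { print "cores must be positive"; exit 1 }
    printf "%.2f\n", load / cores
  }'
}

# On the monitored Linux host, for example:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```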

Step 1.10: Compare Load Average Metrics

Add three queries in Explore.

Query A:

telegraf.linux-demo.system.load1

Query B:

telegraf.linux-demo.system.load5

Query C:

telegraf.linux-demo.system.load15

Expected result:

Three lines appear in the same graph.

Interpretation:

Pattern                         Meaning
load1 high, load5/load15 low    Recent spike
All three high                  Sustained load
load15 high, load1 low          Load was high earlier but is reducing

Step 1.11: Explore Disk Metrics

Use disk used percentage:

telegraf.linux-demo.disk.*.used_percent

Expected result:

One or more disk mount series appear.

The * wildcard matches available disk paths or devices.

You can also try:

telegraf.linux-demo.disk.*.free
telegraf.linux-demo.disk.*.used
telegraf.linux-demo.disk.*.total

Teaching point:

Disk percentage is better than raw bytes for quick operational monitoring.

Recommended dashboard metric later:

telegraf.linux-demo.disk.*.used_percent

Step 1.12: Explore Network Metrics

Use network received bytes:

telegraf.linux-demo.net.*.bytes_recv

Use network sent bytes:

telegraf.linux-demo.net.*.bytes_sent

Expected result:

Network interfaces appear as separate series.

Common interfaces may include:

eth0
lo
ens5
docker0

Teaching point:

lo is loopback traffic.
eth0, ens5, or similar is usually the actual network interface.
docker0 may appear if Docker networking exists.

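For useful server network monitoring, focus on real interfaces, not loopback. Keep in mind that bytes_recv and bytes_sent are cumulative counters, so the raw graphs only climb; wrapping a query in Graphite's perSecond() function shows the rate instead.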

Example:

telegraf.linux-demo.net.eth0.bytes_recv

or on AWS EC2, sometimes:

telegraf.linux-demo.net.ens5.bytes_recv
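
The arithmetic behind turning a byte counter into throughput is simple: subtract the previous sample from the current one and divide by the seconds between them. The sketch below shows that calculation on plain "epoch_seconds counter_value" lines. It is an illustration of the idea, not Graphite's implementation, and the function name counter_rate is hypothetical.

```shell
# Hypothetical helper: turn cumulative counter samples into a rate.
# Input lines on stdin: "<epoch_seconds> <counter_value>".
counter_rate() {
  awk '
    NR > 1 && ($1 + 0) > prev_ts {
      delta = $2 - prev_val
      # A negative delta means the counter reset (for example a reboot),
      # so that interval is skipped rather than reported as a negative rate.
      if (delta >= 0) printf "%s %.1f\n", $1, delta / ($1 - prev_ts)
    }
    { prev_ts = $1 + 0; prev_val = $2 + 0 }
  '
}

printf '%s\n' "1700000000 1000" "1700000010 6000" "1700000020 7000" | counter_rate
# prints:
# 1700000010 500.0
# 1700000020 100.0
```

Here the interface received 5000 bytes in the first 10 seconds (500 bytes per second) and 1000 bytes in the next 10 seconds (100 bytes per second).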

Step 1.13: Explore Process Metrics

Query:

telegraf.linux-demo.processes.total

Other process metrics:

telegraf.linux-demo.processes.running
telegraf.linux-demo.processes.sleeping
telegraf.linux-demo.processes.blocked
telegraf.linux-demo.processes.zombies

Teaching point:

Process metrics help detect process growth, stuck processes, and zombie processes.

Good operational metric:

telegraf.linux-demo.processes.zombies

Expected normal value:

0

Step 1.14: Use Wildcards in Graphite Queries

Graphite supports wildcard-style metric selection.

Example:

telegraf.linux-demo.cpu.cpu-total.*

This returns multiple CPU fields, such as:

usage_active
usage_idle
usage_user
usage_system
usage_iowait

Another example:

telegraf.linux-demo.system.*

This returns system metrics such as:

load1
load5
load15
n_users
uptime

Another example:

telegraf.linux-demo.mem.*

This returns memory metrics.

Teaching point:

Wildcards are useful for discovery, but too many series can make graphs noisy.
For dashboards and alerts, use specific metrics.

Step 1.15: Use Graphite Functions in Explore

Graphite functions transform, combine, and perform computations on time-series data, as described in Graphite's official documentation. (Graphite Documentation)

Grafana’s Graphite query editor supports adding functions directly in the query editor. (Grafana Labs)


Function 1: alias()

Use alias() to rename a metric in the graph legend.

Query:

alias(telegraf.linux-demo.cpu.cpu-total.usage_active, 'CPU Active %')

Expected result:

Graph legend shows CPU Active % instead of the long metric path.

Why useful:

Long Graphite metric names are hard to read.
alias() makes Explore and dashboards easier to understand.
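
When a query that contains functions is tested against the render API with curl, as in the pre-lab check, the spaces, quotes, and percent sign inside an alias() call need URL-encoding. A minimal sketch that lets curl do the encoding through its -G and --data-urlencode options follows. The function name graphite_render and the GRAPHITE_URL variable are hypothetical; the default address assumes the Graphite Web port used earlier in this lab.

```shell
# Hypothetical wrapper around the Graphite render API.
# GRAPHITE_URL is an assumed variable; the default matches this lab.
GRAPHITE_URL="${GRAPHITE_URL:-http://localhost:8080}"

graphite_render() {
  # $1 = Graphite target expression
  # $2 = relative start time (defaults to -15min)
  curl -sS -G "$GRAPHITE_URL/render" \
    --data-urlencode "target=$1" \
    --data-urlencode "from=${2:--15min}" \
    --data-urlencode "format=json"
}

# Example:
#   graphite_render "alias(telegraf.linux-demo.cpu.cpu-total.usage_active, 'CPU Active %')"
```

The JSON that comes back should list the series under the alias text rather than the full metric path, which is the same renaming Explore shows in the legend.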

Function 2: movingAverage()

Use movingAverage() to smooth noisy data.

Query:

movingAverage(telegraf.linux-demo.cpu.cpu-total.usage_active, 5)

Expected result:

CPU graph becomes smoother.

Teaching point:

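movingAverage is useful when raw metrics are too spiky.
The second argument is the window size: 5 means five datapoints, roughly 50 seconds at Telegraf's 10-second interval.
But for alerts, be careful because smoothing can hide short incidents.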
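
To see why smoothing can hide a short incident, here is a simplified trailing moving average written in awk. It is an illustration only, not Graphite's implementation (Graphite also looks at data before the start of the time range), and the function name moving_average is hypothetical.

```shell
# Hypothetical helper: trailing moving average over the last N values.
# Reads one value per line on stdin and prints an average once N values
# have been seen.
moving_average() {
  awk -v n="$1" '
    {
      idx = NR % n                       # slot in a ring buffer of n values
      if (NR > n) sum -= window[idx]     # drop the value leaving the window
      window[idx] = $1
      sum += $1
      if (NR >= n) printf "%.1f\n", sum / n
    }
  '
}

printf '%s\n' 5 5 95 5 5 | moving_average 5
# prints:
# 23.0
```

A single 95% CPU sample surrounded by 5% samples is reported as 23.0, which would not trip an alert threshold of, say, 80.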

Function 3: summarize()

Use summarize() to aggregate data into larger time buckets.

Query:

summarize(telegraf.linux-demo.cpu.cpu-total.usage_active, '1min', 'avg')

Expected result:

CPU usage is shown as 1-minute average values.

Teaching point:

summarize helps convert high-resolution data into cleaner trend data.
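
With Telegraf sending every 10 seconds, a '1min' bucket averages about six samples into one point. The sketch below shows that bucketing on plain "epoch_seconds value" lines. It floors each timestamp to a multiple of the bucket width, which is a simplification of what Graphite does, and the function name summarize_avg is hypothetical.

```shell
# Hypothetical helper: average values into fixed-width time buckets.
# Input lines on stdin: "<epoch_seconds> <value>", in time order.
# $1 = bucket width in seconds.
summarize_avg() {
  awk -v width="$1" '
    function emit() { if (count > 0) printf "%d %.1f\n", bucket, sum / count }
    {
      b = $1 - ($1 % width)              # start of the bucket for this sample
      if (count > 0 && b != bucket) { emit(); sum = 0; count = 0 }
      bucket = b; sum += $2; count++
    }
    END { emit() }
  '
}

printf '%s\n' "60 1" "70 3" "120 10" | summarize_avg 60
# prints:
# 60 2.0
# 120 10.0
```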

Function 4: highestCurrent()

Use highestCurrent() to keep only the N series with the highest most recent value among multiple series.

Example for disk:

highestCurrent(telegraf.linux-demo.disk.*.used_percent, 5)

Expected result:

Grafana shows the top 5 disk mount points by current used percentage.
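
The selection logic can be sketched in a few lines of shell: take the last non-empty value of each series and keep the N largest. This assumes render output in format=csv (rows like "target,timestamp,value") and series names without commas or spaces. It illustrates the idea rather than Graphite's own code, and the function name highest_current is hypothetical.

```shell
# Hypothetical helper: from CSV render output on stdin, keep the last
# non-empty value of each series and print the N series with the
# highest such value. $1 = N.
highest_current() {
  awk -F, '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NF >= 3 && $NF != "" { last[$1] = $NF }     # later rows overwrite earlier ones
    END { for (name in last) printf "%s %s\n", name, last[name] }
  ' | sort -k2,2nr | head -n "$1"
}

# Example, run on the Graphite server:
#   curl -s "http://localhost:8080/render?target=telegraf.linux-demo.disk.*.used_percent&from=-10min&format=csv" | highest_current 5
```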

Teaching point:

This is helpful when many disks or mount points exist.

Function 5: averageSeries()

Use averageSeries() to average multiple matching series.

Example:

averageSeries(telegraf.linux-demo.cpu.cpu*.usage_active)

Expected result:

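Average CPU active usage across all series matching cpu*. Note that this pattern also matches cpu-total; to average only the individual cores (cpu0, cpu1, and so on), narrow the pattern to cpu[0-9]*.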

Teaching point:

averageSeries is useful when there are multiple CPU cores or multiple similar series.

Step 1.16: Use Query Builder vs Raw Query

Grafana Graphite datasource usually provides two ways to build queries:

1. Visual/query builder style
2. Raw Graphite query text

For students, teach both.

Query Builder Approach

Use the metric browser to select:

telegraf
 → linux-demo
   → cpu
     → cpu-total
       → usage_active

Then run the query.

Raw Query Approach

Directly type:

telegraf.linux-demo.cpu.cpu-total.usage_active

Teaching recommendation:

Beginners should start with metric browsing.
Intermediate users should learn raw Graphite query syntax.

Step 1.17: Explore Multiple Metrics Together

Now create an investigation view.

Add these queries:

Query A:

alias(telegraf.linux-demo.cpu.cpu-total.usage_active, 'CPU Active %')

Query B:

alias(telegraf.linux-demo.mem.used_percent, 'Memory Used %')

Query C:

alias(telegraf.linux-demo.system.load1, 'Load 1m')

Expected result:

CPU, memory, and load appear together.

Important warning:

CPU and memory are percentages.
Load average is not a percentage.
So this combined graph is useful for quick investigation, but not always perfect for dashboard presentation.

Teaching point:

Explore is good for correlation.
Dashboards should be cleaner and better structured.

Step 1.18: Use Split View in Explore

Grafana Explore supports side-by-side investigation.

In Explore, use:

Split

or:

Split view

depending on the UI.

Left side:

telegraf.linux-demo.cpu.cpu-total.usage_active

Right side:

telegraf.linux-demo.mem.used_percent

Expected result:

CPU and memory can be investigated side by side.

Use case:

When CPU is high, check whether memory also increased.
When load is high, check whether disk or network also changed.

Step 1.19: Inspect Query Results

In Explore, after running a query, use the query result/inspect options if available.

Look for:

Data
Stats
Query
JSON

The exact UI may vary slightly depending on Grafana 13.x build and permissions.

Students should understand:

Graph view shows trend.
Data/table view shows actual datapoints.
Query inspector helps troubleshoot query and response.

This is especially useful when:

Graph is empty
Metric path is wrong
Datasource is not responding
Time range is incorrect

Step 1.20: Recommended Metrics for Students to Explore

Use this table as the main lab reference.

Area         Graphite Query                                    Meaning
CPU          telegraf.linux-demo.cpu.cpu-total.usage_active    Active CPU usage %
CPU          telegraf.linux-demo.cpu.cpu-total.usage_idle      Idle CPU %
CPU          telegraf.linux-demo.cpu.cpu-total.usage_iowait    CPU waiting on disk I/O
Memory       telegraf.linux-demo.mem.used_percent              Memory used %
Memory       telegraf.linux-demo.mem.available                 Available memory bytes
Load         telegraf.linux-demo.system.load1                  1-minute load average
Load         telegraf.linux-demo.system.load5                  5-minute load average
Load         telegraf.linux-demo.system.load15                 15-minute load average
Disk         telegraf.linux-demo.disk.*.used_percent           Disk usage %
Network      telegraf.linux-demo.net.*.bytes_recv              Network received bytes
Network      telegraf.linux-demo.net.*.bytes_sent              Network sent bytes
Processes    telegraf.linux-demo.processes.total               Total process count
Processes    telegraf.linux-demo.processes.zombies             Zombie processes

Step 1.21: Recommended Explore Exercises

Exercise 1: Find CPU Usage

Task:

Find current CPU active usage.

Query:

telegraf.linux-demo.cpu.cpu-total.usage_active

Expected learning:

Students understand CPU percentage metric.

Exercise 2: Compare CPU Active and Idle

Queries:

telegraf.linux-demo.cpu.cpu-total.usage_active
telegraf.linux-demo.cpu.cpu-total.usage_idle

Expected learning:

Students understand relationship between CPU busy and idle time.

Exercise 3: Find Memory Usage

Query:

telegraf.linux-demo.mem.used_percent

Expected learning:

Students understand memory utilization percentage.

Exercise 4: Find System Load Trend

Queries:

telegraf.linux-demo.system.load1
telegraf.linux-demo.system.load5
telegraf.linux-demo.system.load15

Expected learning:

Students understand short-term vs long-term load.

Exercise 5: Find Disk Usage

Query:

telegraf.linux-demo.disk.*.used_percent

Expected learning:

Students understand wildcard-based disk exploration.

Exercise 6: Find Network Traffic

Queries:

telegraf.linux-demo.net.*.bytes_recv
telegraf.linux-demo.net.*.bytes_sent

Expected learning:

Students understand network receive/send metrics.

Exercise 7: Use alias()

Query:

alias(telegraf.linux-demo.mem.used_percent, 'Memory Used %')

Expected learning:

Students understand how to make legends readable.

Exercise 8: Use movingAverage()

Query:

movingAverage(telegraf.linux-demo.cpu.cpu-total.usage_active, 5)

Expected learning:

Students understand smoothing.

Step 1.22: Student Troubleshooting Guide

Problem 1: No data in Explore

Check time range first.

Use:

Last 15 minutes

Then check Graphite files:

docker exec graphite find /opt/graphite/storage/whisper/telegraf -type f | head

If no files exist, metrics are not reaching Graphite. Check that Telegraf is running and that it can reach Carbon on port 2003.


Problem 2: Query path not found

Try wildcard discovery:

telegraf.*

Then:

telegraf.linux-demo.*

Then:

telegraf.linux-demo.cpu.*

This helps discover the actual metric path.
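
Another way to discover exact paths is to translate the Whisper file names from the pre-lab check into metric names, since each directory level corresponds to one dot-separated node. A minimal sketch, assuming the storage root used in this lab; the function name wsp_to_metric is hypothetical.

```shell
# Hypothetical helper: convert Whisper file paths on stdin into
# Graphite metric names. $1 = Whisper storage root directory.
wsp_to_metric() {
  awk -v root="${1%/}/" '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    index($0, root) == 1 && /\.wsp$/ {
      name = substr($0, length(root) + 1)       # drop the storage root
      sub(/\.wsp$/, "", name)                   # drop the file extension
      gsub(/\//, ".", name)                     # directories become dots
      print name
    }
  '
}

# Example, run on the Graphite server:
#   docker exec graphite find /opt/graphite/storage/whisper/telegraf -type f | wsp_to_metric /opt/graphite/storage/whisper
```

Each printed line can be pasted directly into the Explore query field.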


Problem 3: Wrong hostname

Your hostname is configured as:

linux-demo

If students changed it, the metric path will be different.

Check actual metric directories:

docker exec graphite find /opt/graphite/storage/whisper/telegraf -maxdepth 2 -type d

Example output:

/opt/graphite/storage/whisper/telegraf/linux-demo

If hostname is different, use that name in Grafana queries.


Problem 4: Grafana datasource test works, but Explore has no data

Check Graphite directly:

curl "http://YOUR_GRAPHITE_SERVER:8080/render?target=telegraf.linux-demo.mem.used_percent&from=-10min&format=json"

If Graphite returns datapoints, the problem is on the Grafana side: check the query path, the selected datasource, and the time range.

If Graphite returns empty datapoints, the metric may not exist or no data is available for the selected time.


Problem 5: Too many lines in graph

This happens with wildcard queries like:

telegraf.linux-demo.*

Fix by using a more specific query:

telegraf.linux-demo.cpu.cpu-total.usage_active

or use Graphite functions like:

highestCurrent(telegraf.linux-demo.disk.*.used_percent, 5)

Step 1.23: Lab Completion Checklist

Students should complete the following before moving to dashboard creation:

Task                                        Completed
Opened Grafana Explore                      [ ]
Selected Graphite datasource                [ ]
Queried CPU active usage                    [ ]
Queried memory used percentage              [ ]
Queried load average                        [ ]
Queried disk usage using wildcard           [ ]
Queried network receive/send metrics        [ ]
Used alias() function                       [ ]
Used movingAverage() function               [ ]
Used multiple queries together              [ ]
Understood metric path structure            [ ]
Verified data changes during stress test    [ ]

Step 1.24: Final Student Summary

At the end of this lab, students should understand this clearly:

Grafana Explore is used for ad-hoc metric investigation.
Graphite stores metrics in dot-separated paths.
Telegraf is collecting Linux metrics and sending them to Graphite.
Grafana reads Graphite metrics using the Graphite datasource.
CPU, memory, disk, network, load, and process metrics can be explored directly.
Graphite functions such as alias(), movingAverage(), summarize(), and highestCurrent() help make queries more useful.
Explore is the best place to test queries before creating dashboards and alerts.