{"id":2240,"date":"2026-04-24T08:30:45","date_gmt":"2026-04-24T08:30:45","guid":{"rendered":"https:\/\/sreschool.com\/blog\/?p=2240"},"modified":"2026-05-05T07:27:33","modified_gmt":"2026-05-05T07:27:33","slug":"grafana-lab-with-graphite-datasource-metrics-alerts","status":"publish","type":"post","link":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/","title":{"rendered":"Grafana Lab with Graphite Datasource metrics \u2013 Alerts"},"content":{"rendered":"\n<p>Below is Step 3 only: a deep, student-ready lab guide for creating Grafana 13.x alerts using Graphite + Telegraf Linux metrics.<\/p>\n\n\n\n<p>This guide assumes you already completed:<\/p>\n\n\n\n<p>Step 1: Explored Graphite metrics in Grafana Explore Step 2: Created Linux Monitoring Dashboard Datasource: Graphite Metrics prefix: telegraf Host: linux-demo Dashboard: Linux Monitoring &#8211; Graphite Telegraf<\/p>\n\n\n\n<p>Very Important Metric Validation<\/p>\n\n\n\n<p>Based on the actual metrics collected in your Graphite storage, this lab uses the following metric structure:<\/p>\n\n\n\n<p>telegraf.linux-demo..<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active telegraf.linux-demo.mem.used_percent telegraf.linux-demo.disk.used_percent telegraf.linux-demo.system.load1 telegraf.linux-demo.processes.zombies<\/p>\n\n\n\n<p>Do not use the older assumed paths below in this lab:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.cpu-total.usage_active telegraf.linux-demo.disk.*.used_percent telegraf.linux-demo.net.eth0.bytes_recv telegraf.linux-demo.net.ens5.bytes_recv<\/p>\n\n\n\n<p>Reason:<\/p>\n\n\n\n<p>Your current Telegraf Graphite output is storing metrics without CPU tag, disk mountpoint tag, and network interface tag in the Graphite path.<\/p>\n\n\n\n<p>Grafana alert rules are based on three major parts: a query that selects data, a condition\/threshold that decides when to fire, and evaluation settings that decide how often and how long the condition 
must be true. Grafana\u2019s alerting model uses query, reduce expression, threshold expression, evaluation interval, and pending period.<\/p>\n\n\n\n<p>Step 1<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-sre-school wp-block-embed-sre-school\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"EpheF2v9X2\"><a href=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-exploring\/\">Grafana Lab with Graphite Datasource metrics &#8211; Exploring<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Grafana Lab with Graphite Datasource metrics &#8211; Exploring&#8221; &#8212; SRE School\" src=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-exploring\/embed\/#?secret=eQmLRDDHkB#?secret=EpheF2v9X2\" data-secret=\"EpheF2v9X2\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Step 2<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-sre-school wp-block-embed-sre-school\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"UhP3MgXmoD\"><a href=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-dashboard\/\">Grafana Lab with Graphite Datasource metrics \u2013 Dashboard<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Grafana Lab with Graphite Datasource metrics \u2013 Dashboard&#8221; &#8212; SRE School\" src=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-dashboard\/embed\/#?secret=yuUkGk40aH#?secret=UhP3MgXmoD\" data-secret=\"UhP3MgXmoD\" width=\"600\" height=\"338\" 
frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Step 3: Create Alerts in Grafana 13.x for Graphite Linux Metrics<\/p>\n\n\n\n<p>Lab Objective<\/p>\n\n\n\n<p>By the end of this lab, students will create alerts for:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>High CPU Usage<\/li>\n\n\n\n<li>High Memory Usage<\/li>\n\n\n\n<li>High Disk Usage<\/li>\n\n\n\n<li>High Load Average<\/li>\n\n\n\n<li>Zombie Processes Detected<\/li>\n<\/ol>\n\n\n\n<p>Students will also learn:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>What an alert rule is<\/li>\n\n\n\n<li>What Reduce expression does<\/li>\n\n\n\n<li>What Threshold expression does<\/li>\n\n\n\n<li>How to configure evaluation interval<\/li>\n\n\n\n<li>How to configure contact points<\/li>\n\n\n\n<li>How to route alert notifications<\/li>\n\n\n\n<li>How to test alerts safely<\/li>\n\n\n\n<li>How to troubleshoot No Data and Error states<\/li>\n<\/ol>\n\n\n\n<p>3.1 Alerting Architecture<\/p>\n\n\n\n<p>Your current monitoring flow is:<\/p>\n\n\n\n<p>Linux Host \u2193 Telegraf \u2193 Graphite Carbon \u2193 Whisper Storage \u2193 Graphite Web \u2193 Grafana Datasource \u2193 Grafana Alert Rule \u2193 Contact Point \u2193 Email \/ Slack \/ Webhook \/ Teams<\/p>\n\n\n\n<p>In simple words:<\/p>\n\n\n\n<p>Telegraf collects metrics. Graphite stores metrics. Grafana queries Graphite. Grafana evaluates alert rules. 
Grafana sends notifications when rules fire.<\/p>\n\n\n\n<p>3.2 Important Alerting Concepts<\/p>\n\n\n\n<p>Alert Rule<\/p>\n\n\n\n<p>An alert rule is the main object in Grafana Alerting.<\/p>\n\n\n\n<p>It contains:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Query<\/li>\n\n\n\n<li>Expression<\/li>\n\n\n\n<li>Condition<\/li>\n\n\n\n<li>Evaluation interval<\/li>\n\n\n\n<li>Pending period<\/li>\n\n\n\n<li>Labels<\/li>\n\n\n\n<li>Notification settings<\/li>\n<\/ol>\n\n\n\n<p>Grafana supports alert rules that evaluate queries and expressions over time, and alert rules are the central component of the alerting system.<\/p>\n\n\n\n<p>Query<\/p>\n\n\n\n<p>The query gets data from Graphite.<\/p>\n\n\n\n<p>Correct example for this lab:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>This returns CPU active usage time-series data.<\/p>\n\n\n\n<p>Incorrect example for this lab:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.cpu-total.usage_active<\/p>\n\n\n\n<p>This does not work with your current collected metrics because cpu-total is not present in your Graphite metric path.<\/p>\n\n\n\n<p>Reduce Expression<\/p>\n\n\n\n<p>Grafana alerting cannot directly alert on a full time-series graph. 
It normally reduces a time series into a single value.<\/p>\n\n\n\n<p>Common reducer functions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Reducer<\/th><th>Meaning<\/th><\/tr><\/thead><tbody><tr><td>Last<\/td><td>Most recent value<\/td><\/tr><tr><td>Mean<\/td><td>Average value over selected range<\/td><\/tr><tr><td>Max<\/td><td>Maximum value over selected range<\/td><\/tr><tr><td>Min<\/td><td>Minimum value over selected range<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For most infrastructure alerts, use:<\/p>\n\n\n\n<p>Mean<\/p>\n\n\n\n<p>or:<\/p>\n\n\n\n<p>Last<\/p>\n\n\n\n<p>Recommended for this lab:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Metric<\/th><th>Reducer<\/th><\/tr><\/thead><tbody><tr><td>CPU<\/td><td>Mean<\/td><\/tr><tr><td>Memory<\/td><td>Last<\/td><\/tr><tr><td>Disk<\/td><td>Last<\/td><\/tr><tr><td>Load<\/td><td>Mean<\/td><\/tr><tr><td>Zombie Processes<\/td><td>Last<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Threshold Expression<\/p>\n\n\n\n<p>Threshold checks whether the reduced value crosses a limit.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>WHEN B IS ABOVE 80<\/p>\n\n\n\n<p>Meaning:<\/p>\n\n\n\n<p>If the reduced CPU value is above 80, alert should fire.<\/p>\n\n\n\n<p>Evaluation Group<\/p>\n\n\n\n<p>Evaluation group controls how often Grafana checks the alert.<\/p>\n\n\n\n<p>For this lab:<\/p>\n\n\n\n<p>Every 1 minute<\/p>\n\n\n\n<p>Pending Period<\/p>\n\n\n\n<p>Pending period controls how long the condition must remain true before firing.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>For 2 minutes<\/p>\n\n\n\n<p>Meaning:<\/p>\n\n\n\n<p>The condition must be true for 2 continuous minutes before Grafana sends alert.<\/p>\n\n\n\n<p>This prevents alerts from firing due to short temporary spikes.<\/p>\n\n\n\n<p>No Data and Error Handling<\/p>\n\n\n\n<p>Every alert rule should define what happens if Grafana gets no data or query errors.<\/p>\n\n\n\n<p>Recommended beginner setup:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>State<\/th><th>Recommended Setting<\/th><\/tr><\/thead><tbody><tr><td>No Data<\/td><td>No Data<\/td><\/tr><tr><td>Error<\/td><td>Error<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For production, teams often tune this based on monitoring maturity.<\/p>\n\n\n\n<p>3.3 Recommended Alert Rules for This Lab<\/p>\n\n\n\n<p>Use these thresholds for demo and learning.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Alert Name<\/th><th>Query<\/th><th>Warning\/Critical Threshold<\/th><\/tr><\/thead><tbody><tr><td>High CPU Usage<\/td><td>telegraf.linux-demo.cpu.usage_active<\/td><td>Above 80%<\/td><\/tr><tr><td>High Memory Usage<\/td><td>telegraf.linux-demo.mem.used_percent<\/td><td>Above 85%<\/td><\/tr><tr><td>High Disk Usage<\/td><td>telegraf.linux-demo.disk.used_percent<\/td><td>Above 85%<\/td><\/tr><tr><td>High Load Average<\/td><td>telegraf.linux-demo.system.load1<\/td><td>Above 2<\/td><\/tr><tr><td>Zombie Processes<\/td><td>telegraf.linux-demo.processes.zombies<\/td><td>Above 0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For student demos, you can temporarily lower thresholds to trigger alerts quickly.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>CPU threshold: 20% Memory threshold: 40% Disk threshold: slightly below current disk usage<\/p>\n\n\n\n<p>After testing, restore proper thresholds.<\/p>\n\n\n\n<p>3.4 Create a Contact Point<\/p>\n\n\n\n<p>Before creating alert rules, configure where Grafana should send notifications.<\/p>\n\n\n\n<p>Grafana contact points contain the configuration for sending alert notifications, such as email, Slack, webhook, Microsoft Teams, and other integrations.<\/p>\n\n\n\n<p>Steps<\/p>\n\n\n\n<p>Go to:<\/p>\n\n\n\n<p>Alerting \u2192 Contact points<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Add contact point<\/p>\n\n\n\n<p>Name:<\/p>\n\n\n\n<p>student-demo-email<\/p>\n\n\n\n<p>Integration type:<\/p>\n\n\n\n<p>Email<\/p>\n\n\n\n<p>Add email address:<\/p>\n\n\n\n<p><a href=\"mailto:student@example.com\">student@example.com<\/a><\/p>\n\n\n\n<p>For your own lab, use your real email.<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Test<\/p>\n\n\n\n<p>Then:<\/p>\n\n\n\n<p>Save contact point<\/p>\n\n\n\n<p>If Email Is Not Configured<\/p>\n\n\n\n<p>In many self-hosted Grafana setups, email will not work unless SMTP is configured.<\/p>\n\n\n\n<p>For learning, use a Webhook contact point instead.<\/p>\n\n\n\n<p>Example contact point:<\/p>\n\n\n\n<p>Name: student-demo-webhook Type: Webhook URL:&nbsp;<a href=\"https:\/\/webhook.site\/your-generated-url\">https:\/\/webhook.site\/your-generated-url<\/a><\/p>\n\n\n\n<p>You can open webhook.site, copy the unique URL, and paste it into Grafana.<\/p>\n\n\n\n<p>This is very useful for students because they can see alert payloads immediately without configuring SMTP.<\/p>\n\n\n\n<p>3.5 Create Notification Policy<\/p>\n\n\n\n<p>Notification policies decide how alerts are routed to contact points. 
Notification policies route alert instances to contact points using label matchers and can group multiple alerts together to reduce noise.<\/p>\n\n\n\n<p>For this beginner lab, keep it simple.<\/p>\n\n\n\n<p>Go to:<\/p>\n\n\n\n<p>Alerting \u2192 Notification policies<\/p>\n\n\n\n<p>Edit the default policy.<\/p>\n\n\n\n<p>Set default contact point:<\/p>\n\n\n\n<p>student-demo-email<\/p>\n\n\n\n<p>or:<\/p>\n\n\n\n<p>student-demo-webhook<\/p>\n\n\n\n<p>Save.<\/p>\n\n\n\n<p>This means all alerts without custom routing will go to that contact point.<\/p>\n\n\n\n<p>3.6 Create Folder for Alert Rules<\/p>\n\n\n\n<p>Go to:<\/p>\n\n\n\n<p>Dashboards \u2192 New folder<\/p>\n\n\n\n<p>Create folder:<\/p>\n\n\n\n<p>Linux Monitoring Alerts<\/p>\n\n\n\n<p>Why?<\/p>\n\n\n\n<p>Grafana-managed alert rules are usually stored under folders. This keeps alert rules organized.<\/p>\n\n\n\n<p>3.7 Alert Rule 1: High CPU Usage<\/p>\n\n\n\n<p>Purpose<\/p>\n\n\n\n<p>This alert fires when CPU usage remains high.<\/p>\n\n\n\n<p>Go to Alert Rules<\/p>\n\n\n\n<p>Navigate to:<\/p>\n\n\n\n<p>Alerting \u2192 Alert rules<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>New alert rule<\/p>\n\n\n\n<p>Step A: Rule Name<\/p>\n\n\n\n<p>Use:<\/p>\n\n\n\n<p>Linux &#8211; High CPU Usage<\/p>\n\n\n\n<p>Step B: Select Data Source Query<\/p>\n\n\n\n<p>In Query A, select datasource:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>Query:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>Optional display alias if using the query in Explore or dashboard:<\/p>\n\n\n\n<p>alias(telegraf.linux-demo.cpu.usage_active, &#039;CPU Active %&#039;)<\/p>\n\n\n\n<p>For alerting, the plain metric path is usually cleaner:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>Set relative time range:<\/p>\n\n\n\n<p>From: 5m To: now<\/p>\n\n\n\n<p>Step C: Add Reduce Expression<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Add expression<\/p>\n\n\n\n<p>Expression 
type:<\/p>\n\n\n\n<p>Reduce<\/p>\n\n\n\n<p>Input:<\/p>\n\n\n\n<p>A<\/p>\n\n\n\n<p>Function:<\/p>\n\n\n\n<p>Mean<\/p>\n\n\n\n<p>Name it:<\/p>\n\n\n\n<p>B<\/p>\n\n\n\n<p>Meaning:<\/p>\n\n\n\n<p>Grafana takes CPU usage data from the last 5 minutes and calculates the average.<\/p>\n\n\n\n<p>Step D: Add Threshold Expression<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Add expression<\/p>\n\n\n\n<p>Expression type:<\/p>\n\n\n\n<p>Threshold<\/p>\n\n\n\n<p>Input:<\/p>\n\n\n\n<p>B<\/p>\n\n\n\n<p>Condition:<\/p>\n\n\n\n<p>IS ABOVE 80<\/p>\n\n\n\n<p>Name it:<\/p>\n\n\n\n<p>C<\/p>\n\n\n\n<p>Set alert condition to:<\/p>\n\n\n\n<p>C<\/p>\n\n\n\n<p>Step E: Evaluation Behavior<\/p>\n\n\n\n<p>Set:<\/p>\n\n\n\n<p>Folder: Linux Monitoring Alerts Evaluation group: linux-alerts-1m Evaluate every: 1m Pending period: 2m<\/p>\n\n\n\n<p>Meaning:<\/p>\n\n\n\n<p>Grafana checks every 1 minute. If CPU average stays above 80% for 2 minutes, alert fires.<\/p>\n\n\n\n<p>Step F: Configure No Data and Error<\/p>\n\n\n\n<p>Use:<\/p>\n\n\n\n<p>No data state: No Data Error state: Error<\/p>\n\n\n\n<p>Step G: Labels<\/p>\n\n\n\n<p>Add labels:<\/p>\n\n\n\n<p>severity = warning team = training service = linux host = linux-demo metric = cpu<\/p>\n\n\n\n<p>Labels help route and organize alerts.<\/p>\n\n\n\n<p>Step H: Annotations<\/p>\n\n\n\n<p>Add summary:<\/p>\n\n\n\n<p>High CPU usage detected on linux-demo<\/p>\n\n\n\n<p>Add description:<\/p>\n\n\n\n<p>CPU active usage has been above 80% for more than 2 minutes. 
Check running processes, recent deployments, load average, and application workload.<\/p>\n\n\n\n<p>Add runbook URL if you have one:<\/p>\n\n\n\n<figure class=\"wp-block-embed\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/example.com\/runbooks\/high-cpu\n<\/div><\/figure>\n\n\n\n<p>Step I: Save Rule<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Save rule and exit<\/p>\n\n\n\n<p>Test CPU Alert<\/p>\n\n\n\n<p>For demo, temporarily change threshold from:<\/p>\n\n\n\n<p>80<\/p>\n\n\n\n<p>to:<\/p>\n\n\n\n<p>20<\/p>\n\n\n\n<p>Then run:<\/p>\n\n\n\n<p>stress --cpu 2 --timeout 180<\/p>\n\n\n\n<p>Watch:<\/p>\n\n\n\n<p>Alerting \u2192 Alert rules<\/p>\n\n\n\n<p>Expected states:<\/p>\n\n\n\n<p>Normal \u2192 Pending \u2192 Firing<\/p>\n\n\n\n<p>After testing, restore threshold to:<\/p>\n\n\n\n<p>80<\/p>\n\n\n\n<p>3.8 Alert Rule 2: High Memory Usage<\/p>\n\n\n\n<p>Rule Name<\/p>\n\n\n\n<p>Linux &#8211; High Memory Usage<\/p>\n\n\n\n<p>Query A<\/p>\n\n\n\n<p>Datasource:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>Query:<\/p>\n\n\n\n<p>telegraf.linux-demo.mem.used_percent<\/p>\n\n\n\n<p>Optional display alias:<\/p>\n\n\n\n<p>alias(telegraf.linux-demo.mem.used_percent, &#039;Memory Used %&#039;)<\/p>\n\n\n\n<p>Relative time range:<\/p>\n\n\n\n<p>From: 5m To: now<\/p>\n\n\n\n<p>Reduce Expression B<\/p>\n\n\n\n<p>Type: Reduce Input: A Function: Last<\/p>\n\n\n\n<p>Why Last?<\/p>\n\n\n\n<p>Memory usage is already a stable gauge. The most recent value is usually meaningful.<\/p>\n\n\n\n<p>Threshold Expression C<\/p>\n\n\n\n<p>Type: Threshold Input: B Condition: IS ABOVE 85<\/p>\n\n\n\n<p>Evaluation<\/p>\n\n\n\n<p>Evaluate every: 1m Pending period: 3m<\/p>\n\n\n\n<p>Labels<\/p>\n\n\n\n<p>severity = warning team = training service = linux host = linux-demo metric = memory<\/p>\n\n\n\n<p>Annotations<\/p>\n\n\n\n<p>Summary:<\/p>\n\n\n\n<p>High memory usage detected on linux-demo<\/p>\n\n\n\n<p>Description:<\/p>\n\n\n\n<p>Memory usage is above 85%. 
Check memory-heavy processes, application memory growth, cache usage, and swap activity.<\/p>\n\n\n\n<p>Save<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Save rule and exit<\/p>\n\n\n\n<p>Optional Memory Test<\/p>\n\n\n\n<p>To test memory safely, avoid exhausting the server.<\/p>\n\n\n\n<p>Install stress if needed:<\/p>\n\n\n\n<p>apt install -y stress<\/p>\n\n\n\n<p>Run a small memory test:<\/p>\n\n\n\n<p>stress --vm 1 --vm-bytes 256M --timeout 120<\/p>\n\n\n\n<p>For small EC2 instances, reduce memory:<\/p>\n\n\n\n<p>stress --vm 1 --vm-bytes 128M --timeout 120<\/p>\n\n\n\n<p>For demo triggering, temporarily reduce threshold to a value slightly below current memory usage.<\/p>\n\n\n\n<p>3.9 Alert Rule 3: High Disk Usage<\/p>\n\n\n\n<p>Important Note<\/p>\n\n\n\n<p>In this specific lab, the current Graphite metric path does not include disk mountpoint as a separate node.<\/p>\n\n\n\n<p>Correct collected metric:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>Incorrect query for this lab:<\/p>\n\n\n\n<p>highestCurrent(telegraf.linux-demo.disk.*.used_percent, 1)<\/p>\n\n\n\n<p>Why the older query is incorrect:<\/p>\n\n\n\n<p>Your Graphite storage does not contain paths like:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.root.used_percent telegraf.linux-demo.disk.var.used_percent telegraf.linux-demo.disk.tmp.used_percent<\/p>\n\n\n\n<p>It contains:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>So alerting should use the exact collected metric:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>Rule Name<\/p>\n\n\n\n<p>Linux &#8211; High Disk Usage<\/p>\n\n\n\n<p>Query A<\/p>\n\n\n\n<p>Datasource:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>Query:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>Optional display alias:<\/p>\n\n\n\n<p>alias(telegraf.linux-demo.disk.used_percent, &#039;Disk Used %&#039;)<\/p>\n\n\n\n<p>Relative time range:<\/p>\n\n\n\n<p>From: 5m To: 
now<\/p>\n\n\n\n<p>Reduce Expression B<\/p>\n\n\n\n<p>Type: Reduce Input: A Function: Last<\/p>\n\n\n\n<p>Why Last?<\/p>\n\n\n\n<p>Disk usage percentage is a gauge. The latest value is usually the most useful value for alerting.<\/p>\n\n\n\n<p>Threshold Expression C<\/p>\n\n\n\n<p>Type: Threshold Input: B Condition: IS ABOVE 85<\/p>\n\n\n\n<p>Evaluation<\/p>\n\n\n\n<p>Evaluate every: 1m Pending period: 5m<\/p>\n\n\n\n<p>Disk issues usually do not need second-by-second alerting. A 5-minute pending period avoids noisy alerts.<\/p>\n\n\n\n<p>Labels<\/p>\n\n\n\n<p>severity = warning team = training service = linux host = linux-demo metric = disk<\/p>\n\n\n\n<p>Annotations<\/p>\n\n\n\n<p>Summary:<\/p>\n\n\n\n<p>High disk usage detected on linux-demo<\/p>\n\n\n\n<p>Description:<\/p>\n\n\n\n<p>Disk usage is above 85%. Check large files, logs, Docker images, temporary files, and application data growth.<\/p>\n\n\n\n<p>Save<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Save rule and exit<\/p>\n\n\n\n<p>Disk Alert Test<\/p>\n\n\n\n<p>Create a temporary file:<\/p>\n\n\n\n<p>dd if=\/dev\/zero of=\/tmp\/disk-alert-test.img bs=100M count=5<\/p>\n\n\n\n<p>Check dashboard.<\/p>\n\n\n\n<p>Remove after test:<\/p>\n\n\n\n<p>rm -f \/tmp\/disk-alert-test.img<\/p>\n\n\n\n<p>For a large disk, this may not trigger the alert. 
For the demo, temporarily lower the threshold to just below the current disk usage.<\/p>\n\n\n\n<p>3.10 Alert Rule 4: High Load Average<\/p>\n\n\n\n<p>Rule Name<\/p>\n\n\n\n<p>Linux &#8211; High Load Average<\/p>\n\n\n\n<p>Query A<\/p>\n\n\n\n<p>Datasource:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>Query:<\/p>\n\n\n\n<p>telegraf.linux-demo.system.load1<\/p>\n\n\n\n<p>Optional display alias:<\/p>\n\n\n\n<p>alias(telegraf.linux-demo.system.load1, &#039;Load 1m&#039;)<\/p>\n\n\n\n<p>Relative time range:<\/p>\n\n\n\n<p>From: 5m To: now<\/p>\n\n\n\n<p>Reduce Expression B<\/p>\n\n\n\n<p>Type: Reduce Input: A Function: Mean<\/p>\n\n\n\n<p>Threshold Expression C<\/p>\n\n\n\n<p>For demo, use:<\/p>\n\n\n\n<p>IS ABOVE 2<\/p>\n\n\n\n<p>For production, threshold depends on CPU count.<\/p>\n\n\n\n<p>Simple teaching rule:<\/p>\n\n\n\n<p>If load average is consistently higher than CPU core count, investigate.<\/p>\n\n\n\n<p>For a 2-vCPU system:<\/p>\n\n\n\n<p>Warning: above 2 Critical: above 4<\/p>\n\n\n\n<p>Evaluation<\/p>\n\n\n\n<p>Evaluate every: 1m Pending period: 3m<\/p>\n\n\n\n<p>Labels<\/p>\n\n\n\n<p>severity = warning team = training service = linux host = linux-demo metric = load<\/p>\n\n\n\n<p>Annotations<\/p>\n\n\n\n<p>Summary:<\/p>\n\n\n\n<p>High load average detected on linux-demo<\/p>\n\n\n\n<p>Description:<\/p>\n\n\n\n<p>1-minute load average is above the expected threshold. 
Check CPU usage, blocked processes, disk I\/O wait, and application workload.<\/p>\n\n\n\n<p>Save<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Save rule and exit<\/p>\n\n\n\n<p>Test Load Alert<\/p>\n\n\n\n<p>Run:<\/p>\n\n\n\n<p>stress --cpu 2 --timeout 180<\/p>\n\n\n\n<p>Watch the alert state.<\/p>\n\n\n\n<p>If it does not trigger, temporarily reduce threshold:<\/p>\n\n\n\n<p>IS ABOVE 0.5<\/p>\n\n\n\n<p>After testing, restore:<\/p>\n\n\n\n<p>IS ABOVE 2<\/p>\n\n\n\n<p>3.11 Alert Rule 5: Zombie Processes Detected<\/p>\n\n\n\n<p>Rule Name<\/p>\n\n\n\n<p>Linux &#8211; Zombie Processes Detected<\/p>\n\n\n\n<p>Query A<\/p>\n\n\n\n<p>Datasource:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>Query:<\/p>\n\n\n\n<p>telegraf.linux-demo.processes.zombies<\/p>\n\n\n\n<p>Optional display alias:<\/p>\n\n\n\n<p>alias(telegraf.linux-demo.processes.zombies, &#039;Zombie Processes&#039;)<\/p>\n\n\n\n<p>Relative time range:<\/p>\n\n\n\n<p>From: 5m To: now<\/p>\n\n\n\n<p>Reduce Expression B<\/p>\n\n\n\n<p>Type: Reduce Input: A Function: Last<\/p>\n\n\n\n<p>Threshold Expression C<\/p>\n\n\n\n<p>Type: Threshold Input: B Condition: IS ABOVE 0<\/p>\n\n\n\n<p>Meaning:<\/p>\n\n\n\n<p>If zombie process count is greater than 0, alert fires.<\/p>\n\n\n\n<p>Evaluation<\/p>\n\n\n\n<p>Evaluate every: 1m Pending period: 1m<\/p>\n\n\n\n<p>Labels<\/p>\n\n\n\n<p>severity = warning team = training service = linux host = linux-demo metric = process<\/p>\n\n\n\n<p>Annotations<\/p>\n\n\n\n<p>Summary:<\/p>\n\n\n\n<p>Zombie process detected on linux-demo<\/p>\n\n\n\n<p>Description:<\/p>\n\n\n\n<p>One or more zombie processes exist. 
Check parent processes and application process management.<\/p>\n\n\n\n<p>Save<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Save rule and exit<\/p>\n\n\n\n<p>3.12 Link Alert Rules to Dashboard Panels<\/p>\n\n\n\n<p>Grafana allows alert rules to be linked to dashboard panels so users can see alert state directly on panels.<\/p>\n\n\n\n<p>For each alert rule:<\/p>\n\n\n\n<p>Go to:<\/p>\n\n\n\n<p>Alerting \u2192 Alert rules<\/p>\n\n\n\n<p>Open the alert rule.<\/p>\n\n\n\n<p>Find:<\/p>\n\n\n\n<p>Configure notification message<\/p>\n\n\n\n<p>or dashboard\/panel link section.<\/p>\n\n\n\n<p>Click:<\/p>\n\n\n\n<p>Link dashboard and panel<\/p>\n\n\n\n<p>Select dashboard:<\/p>\n\n\n\n<p>Linux Monitoring &#8211; Graphite Telegraf<\/p>\n\n\n\n<p>Select matching panel:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Alert Rule<\/th><th>Dashboard Panel<\/th><\/tr><\/thead><tbody><tr><td>Linux &#8211; High CPU Usage<\/td><td>CPU Usage %<\/td><\/tr><tr><td>Linux &#8211; High Memory Usage<\/td><td>Memory Used %<\/td><\/tr><tr><td>Linux &#8211; High Disk Usage<\/td><td>Disk Usage %<\/td><\/tr><tr><td>Linux &#8211; High Load Average<\/td><td>System Load Average<\/td><\/tr><tr><td>Linux &#8211; Zombie Processes Detected<\/td><td>Zombie Processes<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Save the rule.<\/p>\n\n\n\n<p>3.13 Verify Alert Rule State<\/p>\n\n\n\n<p>Go to:<\/p>\n\n\n\n<p>Alerting \u2192 Alert rules<\/p>\n\n\n\n<p>You should see rule states.<\/p>\n\n\n\n<p>Common states:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>State<\/th><th>Meaning<\/th><\/tr><\/thead><tbody><tr><td>Normal<\/td><td>Condition is false<\/td><\/tr><tr><td>Pending<\/td><td>Condition is true but pending period has not completed<\/td><\/tr><tr><td>Firing<\/td><td>Condition is true and pending period completed<\/td><\/tr><tr><td>No Data<\/td><td>Query returned no data<\/td><\/tr><tr><td>Error<\/td><td>Query or datasource failed<\/td><\/tr><tr><td>Paused<\/td><td>Rule evaluation is disabled<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Expected first state:<\/p>\n\n\n\n<p>Normal<\/p>\n\n\n\n<p>If you lowered thresholds for testing, you may see:<\/p>\n\n\n\n<p>Pending<\/p>\n\n\n\n<p>then:<\/p>\n\n\n\n<p>Firing<\/p>\n\n\n\n<p>3.14 Testing All Alerts Safely<\/p>\n\n\n\n<p>CPU Test<\/p>\n\n\n\n<p>stress --cpu 2 --timeout 180<\/p>\n\n\n\n<p>Expected:<\/p>\n\n\n\n<p>CPU alert may go Pending\/Firing if threshold is low enough.<\/p>\n\n\n\n<p>Memory Test<\/p>\n\n\n\n<p>stress --vm 1 --vm-bytes 256M --timeout 120<\/p>\n\n\n\n<p>For small instances:<\/p>\n\n\n\n<p>stress --vm 1 --vm-bytes 128M --timeout 120<\/p>\n\n\n\n<p>Expected:<\/p>\n\n\n\n<p>Memory panel increases. Memory alert may fire if threshold is low enough.<\/p>\n\n\n\n<p>Disk Test<\/p>\n\n\n\n<p>dd if=\/dev\/zero of=\/tmp\/disk-alert-test.img bs=100M count=5<\/p>\n\n\n\n<p>Remove test file:<\/p>\n\n\n\n<p>rm -f \/tmp\/disk-alert-test.img<\/p>\n\n\n\n<p>Expected:<\/p>\n\n\n\n<p>Disk usage may increase depending on disk size.<\/p>\n\n\n\n<p>Load Test<\/p>\n\n\n\n<p>stress --cpu 2 --timeout 180<\/p>\n\n\n\n<p>Expected:<\/p>\n\n\n\n<p>Load average increases gradually.<\/p>\n\n\n\n<p>Zombie Test<\/p>\n\n\n\n<p>Zombie process creation is not recommended for beginner labs unless you provide controlled code.<\/p>\n\n\n\n<p>For training, test this alert by temporarily changing threshold:<\/p>\n\n\n\n<p>IS ABOVE -1<\/p>\n\n\n\n<p>Because normal zombie count is usually 0, this makes the alert condition true.<\/p>\n\n\n\n<p>After testing, restore:<\/p>\n\n\n\n<p>IS ABOVE 0<\/p>\n\n\n\n<p>3.15 Recommended Alert Rule Summary Table<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Alert Rule<\/th><th>Query<\/th><th>Reducer<\/th><th>Threshold<\/th><th>Evaluate<\/th><th>Pending<\/th><\/tr><\/thead><tbody><tr><td>High CPU Usage<\/td><td>telegraf.linux-demo.cpu.usage_active<\/td><td>Mean<\/td><td>&gt; 80<\/td><td>1m<\/td><td>2m<\/td><\/tr><tr><td>High Memory Usage<\/td><td>telegraf.linux-demo.mem.used_percent<\/td><td>Last<\/td><td>&gt; 85<\/td><td>1m<\/td><td>3m<\/td><\/tr><tr><td>High Disk Usage<\/td><td>telegraf.linux-demo.disk.used_percent<\/td><td>Last<\/td><td>&gt; 85<\/td><td>1m<\/td><td>5m<\/td><\/tr><tr><td>High Load Average<\/td><td>telegraf.linux-demo.system.load1<\/td><td>Mean<\/td><td>&gt; 2<\/td><td>1m<\/td><td>3m<\/td><\/tr><tr><td>Zombie Processes<\/td><td>telegraf.linux-demo.processes.zombies<\/td><td>Last<\/td><td>&gt; 0<\/td><td>1m<\/td><td>1m<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>3.16 Recommended Labels and Annotations<\/p>\n\n\n\n<p>Use consistent labels for all rules.<\/p>\n\n\n\n<p>Labels<\/p>\n\n\n\n<p>team = training service = linux host = linux-demo datasource = graphite<\/p>\n\n\n\n<p>Per alert:<\/p>\n\n\n\n<p>metric = cpu metric = memory metric = disk metric = load metric = process<\/p>\n\n\n\n<p>Severity:<\/p>\n\n\n\n<p>severity = warning<\/p>\n\n\n\n<p>For production, you can 
add:<\/p>\n\n\n\n<p>environment = demo<\/p>\n\n\n\n<p>or:<\/p>\n\n\n\n<p>environment = production<\/p>\n\n\n\n<p>Annotation Template<\/p>\n\n\n\n<p>Use this structure:<\/p>\n\n\n\n<p>summary = Short human-readable alert title description = What happened, why it matters, and first troubleshooting steps runbook_url = Link to internal runbook<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>summary = High CPU usage detected on linux-demo description = CPU active usage has been above 80% for more than 2 minutes. Check top processes, load average, recent deployments, and application traffic.<\/p>\n\n\n\n<p>3.17 Common Student Mistakes<\/p>\n\n\n\n<p>Mistake 1: Using old assumed CPU metric path<\/p>\n\n\n\n<p>Incorrect:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.cpu-total.usage_active<\/p>\n\n\n\n<p>Correct:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>Reason:<\/p>\n\n\n\n<p>Your current Graphite storage does not include cpu-total in the metric path.<\/p>\n\n\n\n<p>Mistake 2: Using old assumed disk wildcard path<\/p>\n\n\n\n<p>Incorrect:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.*.used_percent<\/p>\n\n\n\n<p>Incorrect for this lab:<\/p>\n\n\n\n<p>highestCurrent(telegraf.linux-demo.disk.*.used_percent, 1)<\/p>\n\n\n\n<p>Correct:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>Reason:<\/p>\n\n\n\n<p>Your current Graphite storage does not include mountpoint as a separate path node.<\/p>\n\n\n\n<p>Mistake 3: Forgetting Reduce expression<\/p>\n\n\n\n<p>Grafana needs a single value for threshold comparison.<\/p>\n\n\n\n<p>Correct flow:<\/p>\n\n\n\n<p>A = Graphite query B = Reduce A C = Threshold B Condition = C<\/p>\n\n\n\n<p>Mistake 4: Wrong datasource<\/p>\n\n\n\n<p>Make sure Query A uses:<\/p>\n\n\n\n<p>Graphite<\/p>\n\n\n\n<p>not Prometheus, Loki, or TestData.<\/p>\n\n\n\n<p>Mistake 5: Wrong metric path<\/p>\n\n\n\n<p>Check in Explore first:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>If no data, discover 
with:<\/p>\n\n\n\n<p>telegraf.*<\/p>\n\n\n\n<p>Or run this command on Graphite server:<\/p>\n\n\n\n<p>docker exec graphite find \/opt\/graphite\/storage\/whisper\/telegraf -type f | sed &#039;s#\/opt\/graphite\/storage\/whisper\/##; s#\/#.#g; s#\\.wsp$##&#039; | sort<\/p>\n\n\n\n<p>Mistake 6: Time range too short<\/p>\n\n\n\n<p>If query range is too short, Grafana may not get enough datapoints.<\/p>\n\n\n\n<p>Use:<\/p>\n\n\n\n<p>From: 5m To: now<\/p>\n\n\n\n<p>Mistake 7: Expecting notification without contact point<\/p>\n\n\n\n<p>Alert rule can fire but no message will be received if contact point\/notification policy is not configured.<\/p>\n\n\n\n<p>Check:<\/p>\n\n\n\n<p>Alerting \u2192 Contact points Alerting \u2192 Notification policies<\/p>\n\n\n\n<p>3.18 Troubleshooting Guide<\/p>\n\n\n\n<p>Problem: Alert rule shows No Data<\/p>\n\n\n\n<p>Check Graphite query in Explore:<\/p>\n\n\n\n<p>telegraf.linux-demo.cpu.usage_active<\/p>\n\n\n\n<p>Check Graphite directly:<\/p>\n\n\n\n<p>curl &quot;<a href=\"http:\/\/localhost:8080\/render?target=telegraf.linux-demo.cpu.usage_active&amp;from=-10min&amp;format=json\">http:\/\/localhost:8080\/render?target=telegraf.linux-demo.cpu.usage_active&amp;from=-10min&amp;format=json<\/a>&quot;<\/p>\n\n\n\n<p>Check Telegraf logs:<\/p>\n\n\n\n<p>docker logs -f telegraf<\/p>\n\n\n\n<p>Check Graphite files:<\/p>\n\n\n\n<p>docker exec -it graphite find \/opt\/graphite\/storage\/whisper\/telegraf -type f | head<\/p>\n\n\n\n<p>Problem: Alert stays Normal during test<\/p>\n\n\n\n<p>Check these:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Is threshold too high?<\/li>\n\n\n\n<li>Is pending period too long?<\/li>\n\n\n\n<li>Is query returning expected value?<\/li>\n\n\n\n<li>Is time range set to 5m?<\/li>\n\n\n\n<li>Did you save the rule?<\/li>\n<\/ol>\n\n\n\n<p>For demo, temporarily lower threshold.<\/p>\n\n\n\n<p>Problem: Alert fires but no notification arrives<\/p>\n\n\n\n<p>Check:<\/p>\n\n\n\n<ol 
class=\"wp-block-list\">\n<li>Contact point test works<\/li>\n\n\n\n<li>Notification policy routes to contact point<\/li>\n\n\n\n<li>Alert has not been muted<\/li>\n\n\n\n<li>Alert is not grouped\/delayed<\/li>\n\n\n\n<li>SMTP\/webhook\/Slack configuration is valid<\/li>\n<\/ol>\n\n\n\n<p>Problem: Disk alert does not show mount name<\/p>\n\n\n\n<p>In this lab, disk mountpoint is not available in the Graphite metric path because of the current Telegraf Graphite template.<\/p>\n\n\n\n<p>Current available disk alert query:<\/p>\n\n\n\n<p>telegraf.linux-demo.disk.used_percent<\/p>\n\n\n\n<p>If you want mount-level disk metrics later, change the Telegraf Graphite output template to include mountpoint tags and regenerate metrics. But for this lab, use the actual collected metric path above.<\/p>\n\n\n\n<p>3.19 Lab Completion Checklist<\/p>\n\n\n\n<p>Students should complete:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Task<\/th><th>Completed<\/th><\/tr><\/thead><tbody><tr><td>Created contact point<\/td><td>\u2610<\/td><\/tr><tr><td>Tested contact point<\/td><td>\u2610<\/td><\/tr><tr><td>Configured notification policy<\/td><td>\u2610<\/td><\/tr><tr><td>Created alert folder<\/td><td>\u2610<\/td><\/tr><tr><td>Created CPU alert<\/td><td>\u2610<\/td><\/tr><tr><td>Created memory alert<\/td><td>\u2610<\/td><\/tr><tr><td>Created disk alert<\/td><td>\u2610<\/td><\/tr><tr><td>Created load alert<\/td><td>\u2610<\/td><\/tr><tr><td>Created zombie process alert<\/td><td>\u2610<\/td><\/tr><tr><td>Added labels<\/td><td>\u2610<\/td><\/tr><tr><td>Added annotations<\/td><td>\u2610<\/td><\/tr><tr><td>Linked alerts to dashboard panels<\/td><td>\u2610<\/td><\/tr><tr><td>Tested at least one alert<\/td><td>\u2610<\/td><\/tr><tr><td>Restored production-like thresholds after testing<\/td><td>\u2610<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>3.20 Final Student Summary<\/p>\n\n\n\n<p>At the end of this lab, students should understand:<\/p>\n\n\n\n<p>Grafana can create alert rules from Graphite metrics. A good alert rule uses Query \u2192 Reduce \u2192 Threshold. Evaluation interval controls how often the rule runs. Pending period prevents noisy alerts from short spikes. Contact points define where notifications are sent. Notification policies define how alerts are routed. Labels help organize and route alerts. Annotations make alerts understandable for humans. Alert rules should be tested safely before production use. Metric paths must match the real Graphite storage structure. 
In this lab, the correct metric pattern is telegraf.linux-demo&#8230;<\/p>\n\n\n\n<p>The final alerting setup is:<\/p>\n\n\n\n<p>Graphite metric \u2193 Grafana alert query \u2193 Reduce expression \u2193 Threshold expression \u2193 Evaluation group \u2193 Alert state \u2193 Notification policy \u2193 Contact point<\/p>\n\n\n\n<p>This completes the full student workflow:<\/p>\n\n\n\n<p>Step 1: Explore metrics Step 2: Build dashboard Step 3: Create alerts<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Below is Step 3 only: a deep, student-ready lab guide for creating Grafana 13.x alerts using Graphite + Telegraf Linux [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2240","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE School<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE School\" \/>\n<meta property=\"og:description\" content=\"Below is Step 3 only: a deep, student-ready lab guide for creating Grafana 13.x alerts using Graphite + Telegraf Linux [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/\" \/>\n<meta property=\"og:site_name\" 
content=\"SRE School\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-24T08:30:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-05T07:27:33+00:00\" \/>\n<meta name=\"author\" content=\"Rajesh Kumar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rajesh Kumar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/\",\"url\":\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/\",\"name\":\"Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE School\",\"isPartOf\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#website\"},\"datePublished\":\"2026-04-24T08:30:45+00:00\",\"dateModified\":\"2026-05-05T07:27:33+00:00\",\"author\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\"},\"breadcrumb\":{\"@id\":\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/sreschool.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Grafana Lab with Graphite Datasource metrics \u2013 
Alerts\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/sreschool.com\/blog\/#website\",\"url\":\"https:\/\/sreschool.com\/blog\/\",\"name\":\"SRESchool\",\"description\":\"Master SRE. Build Resilient Systems. Lead the Future of Reliability\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/sreschool.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\",\"name\":\"Rajesh Kumar\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"caption\":\"Rajesh Kumar\"},\"sameAs\":[\"http:\/\/sreschool.com\/blog\"],\"url\":\"https:\/\/sreschool.com\/blog\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE School","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/","og_locale":"en_US","og_type":"article","og_title":"Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE School","og_description":"Below is Step 3 only: a deep, student-ready lab guide for creating Grafana 13.x alerts using Graphite + Telegraf Linux [&hellip;]","og_url":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/","og_site_name":"SRE School","article_published_time":"2026-04-24T08:30:45+00:00","article_modified_time":"2026-05-05T07:27:33+00:00","author":"Rajesh Kumar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rajesh Kumar","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/","url":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/","name":"Grafana Lab with Graphite Datasource metrics \u2013 Alerts - SRE 
School","isPartOf":{"@id":"https:\/\/sreschool.com\/blog\/#website"},"datePublished":"2026-04-24T08:30:45+00:00","dateModified":"2026-05-05T07:27:33+00:00","author":{"@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201"},"breadcrumb":{"@id":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sreschool.com\/blog\/grafana-lab-with-graphite-datasource-metrics-alerts\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sreschool.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Grafana Lab with Graphite Datasource metrics \u2013 Alerts"}]},{"@type":"WebSite","@id":"https:\/\/sreschool.com\/blog\/#website","url":"https:\/\/sreschool.com\/blog\/","name":"SRESchool","description":"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sreschool.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Person","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201","name":"Rajesh Kumar","image":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","caption":"Rajesh Kumar"},"sameAs":["http:\/\/sreschool.com\/blog"],"url":"https:\/\/sreschool.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2240","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/comments?post=2240"}],"version-history":[{"count":3,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2240\/revisions"}],"predecessor-version":[{"id":2265,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2240\/revisions\/2265"}],"wp:attachment":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/media?parent=2240"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/categories?post=2240"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/tags?post=2240"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}