Terraform Datadog Monitors Module
Overview
This Terraform module creates basic host metric monitors for CPU and disk usage in Datadog, together with an accompanying read-only timeboard that visualizes them.
Features
- CPU Monitoring: Track EC2 instance CPU utilization
- Disk Monitoring: Monitor disk usage across hosts
- Automated Alerting: No-data notifications included
- Visualization: Read-only timeboard with alert thresholds
- Configurable Thresholds: Customizable warning and critical levels
Resources Created
- datadog_monitor (disk_usage): Metric alert for disk usage
- datadog_monitor (cpu_usage): Query alert for CPU usage
- datadog_timeboard (host_metrics): Read-only visualization dashboard
Requirements
| Name | Version |
|---|---|
| terraform | >= 0.12 |
| datadog | >= 3.2.0 |
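The constraints in the table can be pinned in the calling configuration. The following is a minimal sketch, assuming the provider is installed from the public `DataDog/datadog` registry namespace. The `source` attribute requires Terraform 0.13 or later; on 0.12, the plain string form `datadog = ">= 3.2.0"` can be used instead.

```hcl
terraform {
  required_version = ">= 0.12"

  required_providers {
    datadog = {
      # Assumption: the provider comes from the public Terraform registry.
      # The `source` attribute is only understood by Terraform 0.13+.
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}
```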
Usage
module "datadog_monitors" {
  source = "./terraform-datadog-monitors"

  datadog_api_key = var.datadog_api_key
  datadog_app_key = var.datadog_app_key
  api_url         = "https://api.datadoghq.eu"

  disk_usage = {
    query     = "max:system.disk.in_use"
    threshold = "85"
  }

  cpu_usage = {
    query     = "avg:aws.ec2.cpuutilization"
    threshold = "85"
  }
}
Inputs
| Name | Description | Type | Required | Default |
|---|---|---|---|---|
| datadog_api_key | Datadog API key | string | yes | - |
| datadog_app_key | Datadog APP key | string | yes | - |
| api_url | API endpoint | string | no | "https://api.datadoghq.eu" |
| http_client_retry_enabled | Enable request retries (429, 5xx) | bool | no | true |
| http_client_retry_timeout | HTTP retry timeout | string | no | "" |
| validate | Validate API/APP keys on init | bool | no | true |
| disk_usage | Query and threshold for disk monitor | map | no | See default |
| cpu_usage | Query and threshold for CPU monitor | map | no | See default |
| datadog_alert_footer | Alert message footer | string | no | PagerDuty + Slack template |
| trigger_by | Grouping for alerts | string | no | "{host,env}" |
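Both keys are required and secret, so one option is to declare them as root-module variables and pass them through to the module. The sketch below assumes the calling configuration owns these declarations; they are illustrative and not part of the module itself.

```hcl
# Root-module declarations (hypothetical; adjust to your own configuration).
variable "datadog_api_key" {
  type        = string
  description = "Datadog API key, passed through to the module"
  # sensitive = true # uncomment on Terraform 0.14+ to redact the value in plan output
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog APP key, passed through to the module"
  # sensitive = true # uncomment on Terraform 0.14+
}
```

Terraform reads variable values from environment variables named `TF_VAR_<name>`, so exporting `TF_VAR_datadog_api_key` and `TF_VAR_datadog_app_key` keeps the keys out of version control.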
Monitor Configuration
Disk Usage Monitor
- Query: avg(last_5m):max:system.disk.in_use{*} by {host,env} * 100 > 85
- Type: Metric alert
- Threshold: 85% (configurable)
- Evaluation: Last 5 minutes average
- Grouping: By host and env
- No Data: Notifies after 10 minutes
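The disk query multiplies by 100 because Datadog reports `system.disk.in_use` as a fraction between 0 and 1, while the threshold is expressed as a percentage. As a rough sketch of how such a query string could be assembled from the module inputs (the variable defaults are taken from the Usage example, the local name is hypothetical, and the module's actual implementation may differ):

```hcl
# Inputs mirroring the values shown in the Usage example.
variable "disk_usage" {
  type = map(string)
  default = {
    query     = "max:system.disk.in_use"
    threshold = "85"
  }
}

variable "trigger_by" {
  type    = string
  default = "{host,env}"
}

locals {
  # Hypothetical local. With the defaults above it renders:
  # avg(last_5m):max:system.disk.in_use{*} by {host,env} * 100 > 85
  disk_query = "avg(last_5m):${var.disk_usage["query"]}{*} by ${var.trigger_by} * 100 > ${var.disk_usage["threshold"]}"
}
```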
CPU Usage Monitor
- Query: avg(last_5m):avg:aws.ec2.cpuutilization{*} by {host,env} > 85
- Type: Query alert
- Threshold: 85% (configurable)
- Evaluation: Last 5 minutes average
- Grouping: By host and env
- No Data: Notifies after 10 minutes
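For orientation, the CPU settings listed above map onto a `datadog_monitor` resource roughly as follows. This is a sketch under stated assumptions, not the module's source: the monitor name and message text are invented for illustration, and the `monitor_thresholds` block reflects the 3.x provider schema.

```hcl
resource "datadog_monitor" "cpu_usage" {
  name    = "High CPU usage on {{host.name}}" # hypothetical name
  type    = "query alert"
  query   = "avg(last_5m):avg:aws.ec2.cpuutilization{*} by {host,env} > 85"
  message = "CPU usage is above 85% on {{host.name}}." # hypothetical message

  monitor_thresholds {
    # Must match the threshold used in the query.
    critical = "85"
  }

  # No-data alerting: notify when no data arrives for 10 minutes.
  notify_no_data    = true
  no_data_timeframe = 10
}
```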
Timeboard
The module creates a read-only timeboard with:
- CPU usage graph with alert threshold marker
- Disk usage graph with alert threshold marker
- Alert overlay showing when thresholds are breached
Alert Message Template
The default alert footer includes notification handles for:
- PagerDuty: @pagerduty-service_name
- Slack: @slack-channel_name
Customize via the datadog_alert_footer variable.
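A footer override might look like the sketch below. The PagerDuty service and Slack channel handles are hypothetical placeholders and must correspond to integrations that exist in your Datadog account.

```hcl
module "datadog_monitors" {
  source = "./terraform-datadog-monitors"

  datadog_api_key = var.datadog_api_key
  datadog_app_key = var.datadog_app_key

  # Hypothetical handles; replace with your own service and channel names.
  datadog_alert_footer = <<-EOT
    Notify: @pagerduty-platform_oncall @slack-platform-alerts
  EOT
}
```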
Outputs
Currently, this module does not export any outputs.
Customization
Custom Thresholds
disk_usage = {
  query     = "max:system.disk.in_use"
  threshold = "90" # Raise to 90%
}

cpu_usage = {
  query     = "avg:aws.ec2.cpuutilization"
  threshold = "75" # Lower to 75%
}
Custom Grouping
trigger_by = "{host,env,service}"
Notes
- Monitors include no-data alerting by default
- Timeboard is read-only to prevent accidental modifications
- Uses 5-minute evaluation windows
- Supports HTTP client retries for reliability
- Can be reused across multiple environments via variable configuration
License
Internal use only - Sanoma/WeBuildYourCloud
Authors
Created and maintained by the Platform Engineering team.