Terraform Datadog Monitors Module
Overview
This Terraform module creates basic host-metric monitors for CPU and disk usage in Datadog, together with a timeboard that visualizes both metrics.
Features
- CPU Monitoring: Track EC2 instance CPU utilization
- Disk Monitoring: Monitor disk usage across hosts
- Automated Alerting: No-data notifications included
- Visualization: Read-only timeboard with alert thresholds
- Configurable Thresholds: Customizable warning and critical levels
Resources Created
- datadog_monitor(disk_usage): Metric alert for disk usage
- datadog_monitor(cpu_usage): Query alert for CPU usage
- datadog_timeboard(host_metrics): Read-only visualization dashboard
Requirements
| Name | Version |
|---|---|
| terraform | >= 0.12 |
| datadog | >= 3.2.0 |
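The version constraints above can be pinned in the calling configuration. This is a minimal sketch that assumes Terraform 0.13 or later, where `required_providers` accepts the `source` attribute; on 0.12 the object form with `source` is not fully supported, so treat this as an illustration rather than the module's own settings file.

```hcl
terraform {
  # Assumption: Terraform 0.13+ (needed for the "source" attribute below).
  required_version = ">= 0.13"

  required_providers {
    datadog = {
      # Official Datadog provider in the Terraform Registry.
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}
```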
Usage
module "datadog_monitors" {
  source          = "./terraform-datadog-monitors"
  datadog_api_key = var.datadog_api_key
  datadog_app_key = var.datadog_app_key
  api_url         = "https://api.datadoghq.eu"
  disk_usage = {
    query     = "max:system.disk.in_use"
    threshold = "85"
  }
  cpu_usage = {
    query     = "avg:aws.ec2.cpuutilization"
    threshold = "85"
  }
}
Inputs
| Name | Description | Type | Required | Default |
|---|---|---|---|---|
| datadog_api_key | Datadog API key | string | yes | - |
| datadog_app_key | Datadog APP key | string | yes | - |
| api_url | API endpoint | string | no | "https://api.datadoghq.eu" |
| http_client_retry_enabled | Enable request retries (429, 5xx) | bool | no | true |
| http_client_retry_timeout | HTTP retry timeout | string | no | "" |
| validate | Validate API/APP keys on init | bool | no | true |
| disk_usage | Query and threshold for disk monitor | map | no | See default |
| cpu_usage | Query and threshold for CPU monitor | map | no | See default |
| datadog_alert_footer | Alert message footer | string | no | PagerDuty + Slack template |
| trigger_by | Grouping for alerts | string | no | "{host,env}" |
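The connection inputs in the table correspond to arguments of the Datadog provider. A minimal sketch of how the module might pass them through, assuming it configures the provider itself (its inputs suggest so, but the source is not shown here). Because the provider expects a number for the retry timeout, the sketch maps the empty-string default to null so the provider falls back to its own default.

```hcl
provider "datadog" {
  api_key  = var.datadog_api_key
  app_key  = var.datadog_app_key
  api_url  = var.api_url
  validate = var.validate

  # Retries on HTTP 429 and 5xx responses.
  http_client_retry_enabled = var.http_client_retry_enabled

  # Assumption: "" means "use the provider default", so pass null instead of
  # an empty string that could not be converted to a number.
  http_client_retry_timeout = var.http_client_retry_timeout == "" ? null : var.http_client_retry_timeout
}
```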
Monitor Configuration
Disk Usage Monitor
- Query: avg(last_5m):max:system.disk.in_use{*} by {host,env} * 100 > 85
- Type: Metric alert
- Threshold: 85% (configurable)
- Evaluation: Last 5 minutes average
- Grouping: By host and env
- No Data: Notifies after 10 minutes
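The disk monitor settings above can be expressed as a single `datadog_monitor` resource. This is a minimal sketch, assuming the `disk_usage` map carries `query` and `threshold` keys (as in the Usage example) and that `trigger_by` is interpolated into the `by` clause; the monitor name and message text are hypothetical. The `* 100` term converts `system.disk.in_use`, which Datadog reports as a fraction between 0 and 1, into a percentage.

```hcl
resource "datadog_monitor" "disk_usage" {
  name = "High disk usage" # hypothetical display name
  type = "metric alert"

  # Renders to: avg(last_5m):max:system.disk.in_use{*} by {host,env} * 100 > 85
  query = "avg(last_5m):${var.disk_usage.query}{*} by ${var.trigger_by} * 100 > ${var.disk_usage.threshold}"

  message = "Disk usage on {{host.name}} is above ${var.disk_usage.threshold}%.\n${var.datadog_alert_footer}"

  # The critical threshold must match the value in the query.
  monitor_thresholds {
    critical = var.disk_usage.threshold
  }

  # Alert if no data arrives for 10 minutes.
  notify_no_data    = true
  no_data_timeframe = 10
}
```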
CPU Usage Monitor
- Query: avg(last_5m):avg:aws.ec2.cpuutilization{*} by {host,env} > 85
- Type: Query alert
- Threshold: 85% (configurable)
- Evaluation: Last 5 minutes average
- Grouping: By host and env
- No Data: Notifies after 10 minutes
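A matching sketch for the CPU monitor, under the same assumptions as a hypothetical resource layout. Unlike the disk query, no scaling is applied, because `aws.ec2.cpuutilization` is already reported as a percentage.

```hcl
resource "datadog_monitor" "cpu_usage" {
  name = "High CPU usage" # hypothetical display name
  type = "query alert"

  # Renders to: avg(last_5m):avg:aws.ec2.cpuutilization{*} by {host,env} > 85
  query = "avg(last_5m):${var.cpu_usage.query}{*} by ${var.trigger_by} > ${var.cpu_usage.threshold}"

  message = "CPU usage on {{host.name}} is above ${var.cpu_usage.threshold}%.\n${var.datadog_alert_footer}"

  monitor_thresholds {
    critical = var.cpu_usage.threshold
  }

  notify_no_data    = true
  no_data_timeframe = 10
}
```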
Timeboard
The module creates a read-only timeboard with:
- CPU usage graph with alert threshold marker
- Disk usage graph with alert threshold marker
- Alert overlay showing when thresholds are breached
Alert Message Template
Default alert footer includes integration with:
- PagerDuty: @pagerduty-service_name
- Slack: @slack-channel_name
Customize via the datadog_alert_footer variable.
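A minimal sketch of overriding the footer from the calling configuration. The `{{#is_alert}} ... {{/is_alert}}` conditional is standard Datadog notification syntax; here it is used, as an illustrative choice, to page PagerDuty only while the monitor is alerting, while Slack receives every notification. `service_name` and `channel_name` are the placeholders from the template above, not real handles.

```hcl
module "datadog_monitors" {
  source          = "./terraform-datadog-monitors"
  datadog_api_key = var.datadog_api_key
  datadog_app_key = var.datadog_app_key

  # Hypothetical footer; replace the handles with your own integrations.
  datadog_alert_footer = <<-EOT
    {{#is_alert}}@pagerduty-service_name{{/is_alert}}
    @slack-channel_name
  EOT
}
```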
Outputs
Currently, this module does not export any outputs.
Customization
Custom Thresholds
disk_usage = {
  query     = "max:system.disk.in_use"
  threshold = "90" # Raise to 90%
}
cpu_usage = {
  query     = "avg:aws.ec2.cpuutilization"
  threshold = "75" # Lower to 75%
}
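These override maps replace map-typed module variables. A sketch of how such variables might be declared inside the module, assuming `map(string)` and defaults that mirror the documented 85% thresholds; the module's real defaults are listed only as "See default" above, so the values here are illustrative.

```hcl
variable "disk_usage" {
  description = "Query and threshold for the disk monitor"
  type        = map(string)
  default = {
    query     = "max:system.disk.in_use" # assumed default
    threshold = "85"
  }
}

variable "cpu_usage" {
  description = "Query and threshold for the CPU monitor"
  type        = map(string)
  default = {
    query     = "avg:aws.ec2.cpuutilization" # assumed default
    threshold = "85"
  }
}

variable "trigger_by" {
  description = "Grouping for alerts"
  type        = string
  default     = "{host,env}"
}
```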
Custom Grouping
trigger_by = "{host,env,service}"
Notes
- Monitors include no-data alerting by default
- Timeboard is read-only to prevent accidental modifications
- Uses 5-minute evaluation windows
- Supports HTTP client retries for reliability
- Can be reused across multiple environments via variable configuration
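A minimal sketch of reusing the module for two environments. Because the documented queries are scoped to `{*}`, two instances in the same Datadog organization would watch the same hosts; the sketch therefore assumes one Datadog organization per environment, with hypothetical per-environment key variables.

```hcl
module "datadog_monitors_staging" {
  source          = "./terraform-datadog-monitors"
  datadog_api_key = var.staging_datadog_api_key # hypothetical variable
  datadog_app_key = var.staging_datadog_app_key # hypothetical variable

  # Tolerate higher CPU in staging.
  cpu_usage = {
    query     = "avg:aws.ec2.cpuutilization"
    threshold = "90"
  }
}

module "datadog_monitors_production" {
  source          = "./terraform-datadog-monitors"
  datadog_api_key = var.production_datadog_api_key # hypothetical variable
  datadog_app_key = var.production_datadog_app_key # hypothetical variable

  # Alert earlier and group per service in production.
  trigger_by = "{host,env,service}"
  cpu_usage = {
    query     = "avg:aws.ec2.cpuutilization"
    threshold = "75"
  }
}
```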
License
Internal use only - Sanoma/WeBuildYourCloud
Authors
Created and maintained by the Platform Engineering team.