This Terraform module creates a Datadog monitoring setup for a Kubernetes/Docker application: a dashboard showing CPU and memory utilization, a pod health monitor, and a synthetic HTTP API test.

Features

- Kubernetes Resource Monitoring: Visualizes pod and node resource utilization
- CPU & Memory Tracking: Top 10 containers by CPU and memory usage
- Health Monitoring: Pod health metric alerts
- Synthetic Testing: HTTP API endpoint monitoring
- Read-Only Dashboard: Prevents accidental modifications in the UI

Resources Created

- `datadog_dashboard`: Application monitoring dashboard with multiple widgets
- `datadog_monitor`: Kubernetes Pod Health metric alert
- `datadog_synthetics_test`: API HTTP health check

Dashboard Widgets

- Kubernetes Pods Hostmap: CPU utilization by Docker image
- CPU Utilization Timeseries: Top 10 containers ranked by CPU usage
- Kubernetes Nodes Hostmap: CPU utilization by host
- Memory Utilization Timeseries: Top 10 containers ranked by memory usage
- Alert Graph: Visualization from the pod health monitor
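
As a rough illustration of how one of these widgets might be declared, the sketch below defines a read-only dashboard containing only the CPU timeseries widget. The resource label `app`, the title, and the `docker.cpu.usage` metric name are assumptions and are not taken from this module's source; `var.app_name` and `var.image_name` refer to the inputs documented in this README.

```hcl
# Sketch only: a read-only dashboard with a single timeseries widget.
# Resource label, title, and metric name are illustrative assumptions.
resource "datadog_dashboard" "app" {
  title        = "${var.app_name} application dashboard"
  layout_type  = "ordered"
  is_read_only = true # blocks edits from the Datadog UI

  widget {
    timeseries_definition {
      title = "CPU Utilization"

      request {
        display_type = "line"
        # top(..., 10, 'mean', 'desc') keeps the 10 series with the highest mean
        q = "top(avg:docker.cpu.usage{image_name:${var.image_name}} by {docker_image,container_id}, 10, 'mean', 'desc')"
      }
    }
  }
}
```

Newer Datadog provider releases deprecate `is_read_only` in favor of role-based restrictions, so it is worth checking the provider version you pin.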

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.12 |
| datadog | >= 3.2.0 |
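
These constraints could be pinned in the calling configuration roughly as follows. This is a sketch, not the module's own `versions.tf`: the `source` form of `required_providers` assumes Terraform 0.13 or later, and the two credential variables are hypothetical names.

```hcl
# Sketch only: version constraints matching the requirements listed above.
# The required_providers "source" syntax assumes Terraform 0.13 or later.
terraform {
  required_version = ">= 0.12"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}

# Hypothetical variables holding the Datadog credentials.
variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
```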

Usage

```hcl
module "app_dashboard" {
  source = "./terraform-datadog-app-dashboard"

  app_namespace = "production"
  cfa_name      = "my-cfa"
  app_name      = "my-application"
  team_name     = "platform-team"
  image_name    = "my-app"
  region        = "eu-west-1"
  stage         = "prd"
  url           = "https://api.example.com/health"
}
```

Inputs

| Name | Description | Type | Required |
|------|-------------|------|----------|
| `app_namespace` | Namespace that the application runs in | `string` | yes |
| `cfa_name` | Name of the CFA | `string` | yes |
| `app_name` | Name of the application | `string` | yes |
| `team_name` | Name of the responsible team | `string` | yes |
| `image_name` | Name of the Docker Image | `string` | yes |
| `region` | AWS region where resources are located | `string` | yes |
| `stage` | Stage to monitor (dev, tst, stg, prd) | `string` | yes |
| `url` | URL for Datadog Synthetics to monitor | `string` | yes |
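
For orientation, two of these inputs might be declared inside the module along the following lines. This is a minimal sketch based only on the names, descriptions, and types listed here; the module's actual `variables.tf` may differ.

```hcl
# Sketch only: how two of the inputs might be declared inside the module.
variable "image_name" {
  description = "Name of the Docker image"
  type        = string
}

variable "stage" {
  description = "Stage to monitor (dev, tst, stg, prd)"
  type        = string
}
```

Because no `default` is set, Terraform treats each such input as required.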

Outputs

Currently, this module does not export any outputs.

Monitor Configuration

Pod Health Monitor

Query: avg(last_5m):sum:docker.containers.running{image_name:{image_name}} by {docker_image}.rollup(avg, 60) <= 1
Type: Metric alert
Thresholds:
- OK: 3 running containers
- Warning: 2 or fewer running containers
- Critical: 1 or fewer running containers (matches the `<= 1` in the query)
No Data: Alerts after 10 minutes without data
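
The settings above could be expressed as a `datadog_monitor` resource roughly like this. It is a sketch under assumptions: the resource label `pod_health`, the message text, and the subset of tags shown are illustrative, and `var.*` refers to the inputs documented in this README.

```hcl
# Sketch only: a metric alert mirroring the query and thresholds listed above.
# Resource label, message, and tag selection are illustrative assumptions.
resource "datadog_monitor" "pod_health" {
  name    = "Kubernetes Pod Health"
  type    = "metric alert"
  message = "Running container count for ${var.image_name} is low."

  query = "avg(last_5m):sum:docker.containers.running{image_name:${var.image_name}} by {docker_image}.rollup(avg, 60) <= 1"

  monitor_thresholds {
    ok       = 3
    warning  = 2
    critical = 1 # must match the value compared against in the query
  }

  notify_no_data    = true
  no_data_timeframe = 10 # minutes

  tags = ["app:${var.app_name}", "env:${var.stage}", "type:kubernetes"]
}
```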

Synthetic API Test

Type: HTTP API test
Method: GET
Interval: Every 15 minutes (900 seconds)
Assertion: HTTP status code is 200
Locations: AWS EU and US regions
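
A `datadog_synthetics_test` resource with these settings might look like the sketch below. The resource label, name, message, and the two location IDs are assumptions, since this README does not list the exact locations. The `request_definition` and `options_list` block names follow the Datadog provider 3.x schema and may differ in other provider versions.

```hcl
# Sketch only: an HTTP API test matching the settings listed above.
# Resource label, name, message, and the two location IDs are assumptions.
resource "datadog_synthetics_test" "api_health" {
  name    = "${var.app_name} API health check"
  type    = "api"
  subtype = "http"
  status  = "live"
  message = "Health check for ${var.url} is failing."

  request_definition {
    method = "GET"
    url    = var.url
  }

  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  # One EU and one US managed location, chosen for illustration.
  locations = ["aws:eu-west-1", "aws:us-east-2"]

  options_list {
    tick_every = 900 # seconds, i.e. every 15 minutes
  }

  tags = ["app:${var.app_name}", "env:${var.stage}", "type:synthetics"]
}
```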

Tagging Strategy

All resources are tagged with:

- `cfa:{cfa_name}`
- `team:{team_name}`
- `app:{app_name}`
- `env:{stage}`
- `type:kubernetes` or `type:synthetics`
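
One way to keep these tags consistent across resources is to build the shared part once in a `locals` block and append the `type` tag per resource. The sketch below assumes this approach; the local names are hypothetical and the module may construct its tags differently.

```hcl
# Sketch only: build the shared tags once and append the type tag per resource.
# The local names are hypothetical.
locals {
  common_tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
  ]

  kubernetes_tags = concat(local.common_tags, ["type:kubernetes"])
  synthetics_tags = concat(local.common_tags, ["type:synthetics"])
}
```

A resource would then set `tags = local.kubernetes_tags` or `tags = local.synthetics_tags` as appropriate.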

Notes

- The dashboard is set to read-only mode to prevent accidental modifications in the Datadog UI.
- Metrics are filtered by the `image_name` variable.
- Synthetic tests run from multiple AWS regions for geographic coverage.
- The pod health monitor evaluates a 5-minute window using a 60-second average rollup.

License

Internal use only - Sanoma/WeBuildYourCloud

Authors

Created and maintained by the Platform Engineering team.