Terraform Datadog App Dashboard Module

Overview

This Terraform module creates a Datadog dashboard for monitoring a Kubernetes/Docker application's CPU and memory utilization. It also creates a pod health monitor and a synthetic API test for the application's HTTP endpoint.

Features

  • Kubernetes Resource Monitoring: Visualizes pod and node resource utilization
  • CPU & Memory Tracking: Top 10 containers by CPU and memory usage
  • Health Monitoring: Pod health metric alerts
  • Synthetic Testing: HTTP API endpoint monitoring
  • Read-Only Dashboard: Prevents accidental modifications in the UI

Resources Created

  • datadog_dashboard: Application monitoring dashboard with multiple widgets
  • datadog_monitor: Kubernetes Pod Health metric alert
  • datadog_synthetics_test: API HTTP health check

Dashboard Widgets

  1. Kubernetes Pods Hostmap: CPU utilization by Docker image
  2. CPU Utilization Timeseries: Top 10 containers ranked by CPU usage
  3. Kubernetes Nodes Hostmap: CPU utilization by host
  4. Memory Utilization Timeseries: Top 10 containers ranked by memory usage
  5. Alert Graph: Visualization from the pod health monitor
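The widget list above could map onto a datadog_dashboard resource roughly as follows. This is a minimal sketch, not the module's actual code, and it rests on these assumptions:
- The resource names app and pod_health are hypothetical.
- The metric names docker.cpu.usage and docker.mem.rss are assumed.
- The grouping tags are assumed.

Only widgets 1, 2 and 5 are written out. The other two follow the same pattern.

```hcl
# Sketch only: resource names, metrics and group-by tags are assumptions.
resource "datadog_dashboard" "app" {
  title       = "${var.app_name} (${var.stage})"
  layout_type = "ordered"

  # Read-only flag; newer provider releases deprecate it in favor of
  # restricted_roles.
  is_read_only = true

  # 1. Kubernetes Pods hostmap: containers grouped by Docker image,
  #    colored by CPU usage.
  widget {
    hostmap_definition {
      title     = "Kubernetes Pods"
      node_type = "container"
      group     = ["docker_image"]
      scope     = ["image_name:${var.image_name}"]

      request {
        fill {
          q = "avg:docker.cpu.usage{image_name:${var.image_name}} by {host}"
        }
      }
    }
  }

  # 2. CPU timeseries: the top() function keeps the 10 containers with
  #    the highest mean CPU usage.
  widget {
    timeseries_definition {
      title = "CPU utilization (top 10 containers)"

      request {
        q            = "top(avg:docker.cpu.usage{image_name:${var.image_name}} by {container_name}, 10, 'mean', 'desc')"
        display_type = "line"
      }
    }
  }

  # Widgets 3 (nodes hostmap, node_type = "host") and 4 (memory
  # timeseries, e.g. on docker.mem.rss) follow the same pattern.

  # 5. Alert graph for the pod health monitor, assuming the monitor
  #    resource is named datadog_monitor.pod_health.
  widget {
    alert_graph_definition {
      title    = "Pod health"
      alert_id = datadog_monitor.pod_health.id
      viz_type = "timeseries"
    }
  }
}
```

The "ordered" layout lets Datadog place the widgets automatically, so the sketch needs no explicit widget coordinates.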

Requirements

Name       Version
terraform  >= 0.12
datadog    >= 3.2.0
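A minimal sketch of how the provider requirement might be pinned. It assumes Terraform 0.13 or later, because the source argument inside required_providers is not available in Terraform 0.12. On 0.12, only a version string can be given per provider.

```hcl
terraform {
  # The source address syntax below needs Terraform 0.13+.
  required_version = ">= 0.13"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}
```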

Usage

module "app_dashboard" {
  source = "./terraform-datadog-app-dashboard"

  app_namespace = "production"
  cfa_name      = "my-cfa"
  app_name      = "my-application"
  team_name     = "platform-team"
  image_name    = "my-app"
  region        = "eu-west-1"
  stage         = "prd"
  url           = "https://api.example.com/health"
}
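The module call above assumes the root module already configures the Datadog provider. Below is a minimal sketch of that configuration. It assumes credentials are supplied through environment variables rather than hard-coded.

```hcl
# Root-module provider configuration (sketch; adjust to your setup).
provider "datadog" {
  # With no api_key/app_key arguments, the provider reads credentials
  # from the DD_API_KEY and DD_APP_KEY environment variables.

  # Only needed if your organization uses the EU Datadog site:
  # api_url = "https://api.datadoghq.eu/"
}
```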

Inputs

Name           Description                              Type    Required
app_namespace  Namespace that the application runs in   string  yes
cfa_name       Name of the CFA                          string  yes
app_name       Name of the application                  string  yes
team_name      Name of the responsible team             string  yes
image_name     Name of the Docker image                 string  yes
region         AWS region where resources are located   string  yes
stage          Stage to monitor (dev, tst, stg, prd)    string  yes
url            URL for Datadog Synthetics to monitor    string  yes
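The inputs above could be declared in a variables.tf along these lines. This is a sketch, not the module's actual file. The validation block on stage is an illustrative addition and requires Terraform 0.13 or later.

```hcl
variable "app_namespace" {
  description = "Namespace that the application runs in"
  type        = string
}

variable "cfa_name" {
  description = "Name of the CFA"
  type        = string
}

variable "app_name" {
  description = "Name of the application"
  type        = string
}

variable "team_name" {
  description = "Name of the responsible team"
  type        = string
}

variable "image_name" {
  description = "Name of the Docker image"
  type        = string
}

variable "region" {
  description = "AWS region where resources are located"
  type        = string
}

variable "stage" {
  description = "Stage to monitor (dev, tst, stg, prd)"
  type        = string

  # Illustrative only: custom validation needs Terraform 0.13+.
  validation {
    condition     = contains(["dev", "tst", "stg", "prd"], var.stage)
    error_message = "The stage value must be one of: dev, tst, stg, prd."
  }
}

variable "url" {
  description = "URL for Datadog Synthetics to monitor"
  type        = string
}
```

None of the variables has a default, which matches the table's "Required: yes" column.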

Outputs

Currently, this module does not export any outputs.

Monitor Configuration

Pod Health Monitor

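  • Query: avg(last_5m):sum:docker.containers.running{image_name:{image_name}} by {docker_image}.rollup(avg, 60) <= 1 (the inner {image_name} is the value of the image_name input; "by {docker_image}" is Datadog group-by syntax)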
  • Type: Metric alert
  • Thresholds:
    • OK: 3 containers
    • Warning: 2 containers
    • Critical: 1 container
  • No Data: Alert if no data is received for 10 minutes
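The monitor configuration above could be expressed as a datadog_monitor resource roughly as follows. This is a minimal sketch with these assumptions:
- The resource name pod_health, the monitor name and the message text are hypothetical.
- Only the critical and warning levels are set. The sketch does not show how the module expresses the OK level of 3 containers.

```hcl
# Sketch only: names and message are hypothetical.
resource "datadog_monitor" "pod_health" {
  name    = "[${var.stage}] ${var.app_name} pod health"
  type    = "metric alert"
  message = "Fewer than the expected number of ${var.image_name} containers are running in ${var.app_namespace}."

  # Running containers per Docker image, rolled up to 1-minute averages
  # and evaluated over the last 5 minutes. {docker_image} is literal
  # Datadog syntax; ${var.image_name} is Terraform interpolation.
  query = "avg(last_5m):sum:docker.containers.running{image_name:${var.image_name}} by {docker_image}.rollup(avg, 60) <= 1"

  monitor_thresholds {
    critical = 1 # must match the threshold in the query
    warning  = 2 # with <=, warning must be above critical
  }

  # Alert when the metric reports no data for 10 minutes.
  notify_no_data    = true
  no_data_timeframe = 10

  tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
    "type:kubernetes",
  ]
}
```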

Synthetic API Test

  • Type: HTTP API test
  • Method: GET
  • Interval: Every 15 minutes (900 seconds)
  • Assertion: HTTP status code is 200
  • Locations: AWS EU and US regions
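The synthetic test above could be written as a datadog_synthetics_test resource like the one below. This is a minimal sketch with these assumptions:
- The resource name and test name are hypothetical.
- The README only says "AWS EU and US regions", so the exact locations listed (aws:eu-west-1, aws:us-west-2) are assumptions.

```hcl
# Sketch only: names and locations are assumptions.
resource "datadog_synthetics_test" "api_health" {
  name      = "${var.app_name} API health (${var.stage})"
  type      = "api"
  subtype   = "http"
  status    = "live"
  locations = ["aws:eu-west-1", "aws:us-west-2"]

  request_definition {
    method = "GET"
    url    = var.url
  }

  # Pass only when the endpoint returns HTTP 200.
  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  options_list {
    tick_every = 900 # run every 15 minutes
  }

  tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
    "type:synthetics",
  ]
}
```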

Tagging Strategy

All resources are tagged with:

  • cfa:{cfa_name}
  • team:{team_name}
  • app:{app_name}
  • env:{stage}
  • type:kubernetes or type:synthetics, depending on the resource
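One way to keep this tag set consistent is to build it once in a locals block. A sketch, assuming hypothetical local names (common_tags, kubernetes_tags, synthetics_tags) that the module may not use:

```hcl
locals {
  # Tags shared by every resource.
  common_tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
  ]

  # Per-resource-type variants.
  kubernetes_tags = concat(local.common_tags, ["type:kubernetes"])
  synthetics_tags = concat(local.common_tags, ["type:synthetics"])
}
```

A resource would then set, for example, tags = local.synthetics_tags. This keeps a change to the shared tags to a single place.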

Notes

  • The dashboard is set to read-only mode to prevent accidental modifications
  • Metrics are filtered by the image_name variable
  • Synthetic tests run from multiple AWS regions for geographic coverage
  • Pod health monitor uses a 1-minute rollup average

License

Internal use only - Sanoma/WeBuildYourCloud

Authors

Created and maintained by the Platform Engineering team.
