From 5558b3d1b166a03bd0626712765f6cfbe7d8b027 Mon Sep 17 00:00:00 2001
From: Patrick de Ruiter
Date: Sat, 1 Nov 2025 10:41:39 +0100
Subject: [PATCH] Initial commit with README and module files

---
 .gitignore          |   0
 .terraform.lock.hcl |   0
 README.md           | 114 +++++++++++++++++++++++++++++++++++++++++---
 data.tf             |   0
 example/main.tf     |   0
 main.tf             |   0
 monitors.tf         |   0
 outputs.tf          |   0
 provider.tf         |   0
 synthetics.tf       |   0
 terraform.tfvars    |   0
 variables.tf        |   0
 12 files changed, 107 insertions(+), 7 deletions(-)
 mode change 100644 => 100755 .gitignore
 mode change 100644 => 100755 .terraform.lock.hcl
 mode change 100644 => 100755 README.md
 mode change 100644 => 100755 data.tf
 mode change 100644 => 100755 example/main.tf
 mode change 100644 => 100755 main.tf
 mode change 100644 => 100755 monitors.tf
 mode change 100644 => 100755 outputs.tf
 mode change 100644 => 100755 provider.tf
 mode change 100644 => 100755 synthetics.tf
 mode change 100644 => 100755 terraform.tfvars
 mode change 100644 => 100755 variables.tf

diff --git a/.gitignore b/.gitignore
old mode 100644
new mode 100755
diff --git a/.terraform.lock.hcl b/.terraform.lock.hcl
old mode 100644
new mode 100755
diff --git a/README.md b/README.md
old mode 100644
new mode 100755
index d9fa811..8e7b6a8
--- a/README.md
+++ b/README.md
@@ -1,12 +1,112 @@
-# Terraform Datadog Application Dashboard
+# Terraform Datadog App Dashboard Module
 
-This Terraform module configures a dashboard for your Kubernetes application.
+## Overview
 
-How-to set it up to work with your application is described below
+This Terraform module creates a comprehensive Kubernetes/Docker application monitoring dashboard in Datadog with CPU and memory utilization metrics, health monitoring, and synthetic API testing.
 
-# Setup
+## Features
 
-First you need to copy the contents of the example directory to a directory
+- **Kubernetes Resource Monitoring**: Visualizes pod and node resource utilization
+- **CPU & Memory Tracking**: Top 10 containers by CPU and memory usage
+- **Health Monitoring**: Pod health metric alerts
+- **Synthetic Testing**: HTTP API endpoint monitoring
+- **Read-Only Dashboard**: Prevents accidental modifications in the UI
 
-module "your_application_name"
-  source = git://
+## Resources Created
+
+- `datadog_dashboard`: Application monitoring dashboard with multiple widgets
+- `datadog_monitor`: Kubernetes Pod Health metric alert
+- `datadog_synthetics_test`: API HTTP health check
+
+## Dashboard Widgets
+
+1. **Kubernetes Pods Hostmap**: CPU utilization by Docker image
+2. **CPU Utilization Timeseries**: Top 10 containers ranked by CPU usage
+3. **Kubernetes Nodes Hostmap**: CPU utilization by host
+4. **Memory Utilization Timeseries**: Top 10 containers ranked by memory usage
+5. **Alert Graph**: Visualization from the pod health monitor
+
+## Requirements
+
+| Name | Version |
+|------|---------|
+| terraform | >= 0.12 |
+| datadog | >= 3.2.0 |
+
+## Usage
+
+```hcl
+module "app_dashboard" {
+  source = "./terraform-datadog-app-dashboard"
+
+  app_namespace = "production"
+  cfa_name      = "my-cfa"
+  app_name      = "my-application"
+  team_name     = "platform-team"
+  image_name    = "my-app"
+  region        = "eu-west-1"
+  stage         = "prd"
+  url           = "https://api.example.com/health"
+}
+```
+
+## Inputs
+
+| Name | Description | Type | Required |
+|------|-------------|------|----------|
+| `app_namespace` | Namespace that the application runs in | `string` | yes |
+| `cfa_name` | Name of the CFA | `string` | yes |
+| `app_name` | Name of the application | `string` | yes |
+| `team_name` | Name of the responsible team | `string` | yes |
+| `image_name` | Name of the Docker Image | `string` | yes |
+| `region` | AWS region where resources are located | `string` | yes |
+| `stage` | Stage to monitor (dev, tst, stg, prd) | `string` | yes |
+| `url` | URL for Datadog Synthetics to monitor | `string` | yes |
+
+## Outputs
+
+Currently, this module does not export any outputs.
+
+## Monitor Configuration
+
+### Pod Health Monitor
+
+- **Query**: `avg(last_5m):sum:docker.containers.running{image_name:{image_name}} by {docker_image}.rollup(avg, 60) <= 1`
+- **Type**: Metric alert
+- **Thresholds**:
+  - OK: 3 containers
+  - Warning: 2 containers
+  - Critical: 1 container
+- **No Data**: Alert after 10 minutes
+
+### Synthetic API Test
+
+- **Type**: HTTP API test
+- **Method**: GET
+- **Interval**: Every 15 minutes (900 seconds)
+- **Assertion**: HTTP status code is 200
+- **Locations**: AWS EU and US regions
+
+## Tagging Strategy
+
+All resources are tagged with:
+- `cfa:{cfa_name}`
+- `team:{team_name}`
+- `app:{app_name}`
+- `env:{stage}`
+- `type:kubernetes` or `type:synthetics`
+
+## Notes
+
+- The dashboard is set to read-only mode to prevent accidental modifications
+- Metrics are filtered by the `image_name` variable
+- Synthetic tests run from multiple AWS regions for geographic coverage
+- Pod health monitor uses a 1-minute rollup average
+
+## License
+
+Internal use only - Sanoma/WeBuildYourCloud
+
+## Authors
+
+Created and maintained by the Platform Engineering team.
diff --git a/data.tf b/data.tf
old mode 100644
new mode 100755
diff --git a/example/main.tf b/example/main.tf
old mode 100644
new mode 100755
diff --git a/main.tf b/main.tf
old mode 100644
new mode 100755
diff --git a/monitors.tf b/monitors.tf
old mode 100644
new mode 100755
diff --git a/outputs.tf b/outputs.tf
old mode 100644
new mode 100755
diff --git a/provider.tf b/provider.tf
old mode 100644
new mode 100755
diff --git a/synthetics.tf b/synthetics.tf
old mode 100644
new mode 100755
diff --git a/terraform.tfvars b/terraform.tfvars
old mode 100644
new mode 100755
diff --git a/variables.tf b/variables.tf
old mode 100644
new mode 100755
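
The patch only changes file modes on `monitors.tf` and `synthetics.tf`, so their contents are not visible here. As a rough illustration of what the README's "Monitor Configuration" section describes, the following is a minimal sketch of how the pod health monitor and the synthetic API test might be declared with the Datadog provider. It is not taken from the module: the resource names (`pod_health`, `api_health`), the `name` and `message` strings, the `base_tags` local, and the two probe locations are all assumptions. The query, thresholds, no-data window, interval, assertion, and tag scheme follow the README text.

```hcl
# Hypothetical sketch; the module's real monitors.tf and synthetics.tf may differ.
# Variables mirror the README's "Inputs" table (only the ones used here).
variable "cfa_name" { type = string }
variable "team_name" { type = string }
variable "app_name" { type = string }
variable "stage" { type = string }
variable "image_name" { type = string }
variable "url" { type = string }

locals {
  # Tag scheme from the README's "Tagging Strategy" section.
  base_tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
  ]
}

resource "datadog_monitor" "pod_health" {
  name    = "${var.app_name} pod health (${var.stage})"                       # assumed naming
  type    = "metric alert"
  message = "Running containers for ${var.image_name} dropped to 1 or fewer." # assumed text

  # Query as given in the README, with image_name interpolated.
  query = "avg(last_5m):sum:docker.containers.running{image_name:${var.image_name}} by {docker_image}.rollup(avg, 60) <= 1"

  monitor_thresholds {
    warning  = 2
    critical = 1 # must match the value compared against in the query
    # The README also lists an OK value of 3; it is left out of this sketch.
  }

  notify_no_data    = true
  no_data_timeframe = 10 # minutes

  tags = concat(local.base_tags, ["type:kubernetes"])
}

resource "datadog_synthetics_test" "api_health" {
  name    = "${var.app_name} API health (${var.stage})"  # assumed naming
  type    = "api"
  subtype = "http"
  status  = "live"
  message = "Health check for ${var.url} is failing."    # assumed text

  # Assumed locations; the README only says "AWS EU and US regions".
  locations = ["aws:eu-west-1", "aws:us-east-1"]

  request_definition {
    method = "GET"
    url    = var.url
  }

  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  options_list {
    tick_every = 900 # seconds, i.e. every 15 minutes
  }

  tags = concat(local.base_tags, ["type:synthetics"])
}
```

Keeping the shared tags in one local and appending the `type:` tag per resource is only one way to get the tagging the README lists; the module may build its tag lists differently.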