# Terraform Datadog App Dashboard Module

## Overview

This Terraform module creates a Datadog dashboard for monitoring a Kubernetes/Docker application. The dashboard covers CPU and memory utilization, and the module also provisions a pod health monitor and a synthetic HTTP API test.

## Features

- **Kubernetes Resource Monitoring**: Visualizes pod and node resource utilization
- **CPU & Memory Tracking**: Top 10 containers by CPU and memory usage
- **Health Monitoring**: Pod health metric alerts
- **Synthetic Testing**: HTTP API endpoint monitoring
- **Read-Only Dashboard**: Prevents accidental modifications in the UI

## Resources Created

- `datadog_dashboard`: Application monitoring dashboard with multiple widgets
- `datadog_monitor`: Kubernetes Pod Health metric alert
- `datadog_synthetics_test`: API HTTP health check

## Dashboard Widgets

1. **Kubernetes Pods Hostmap**: CPU utilization by Docker image
2. **CPU Utilization Timeseries**: Top 10 containers ranked by CPU usage
3. **Kubernetes Nodes Hostmap**: CPU utilization by host
4. **Memory Utilization Timeseries**: Top 10 containers ranked by memory usage
5. **Alert Graph**: Visualization from the pod health monitor

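
As a rough illustration of how one of these widgets might be declared, the sketch below shows a read-only dashboard with a single "top 10 by CPU" timeseries widget. The resource label `app`, the title format, the `docker.cpu.usage` metric, and the `container_name` grouping are assumptions made for this example; the module's actual queries may differ.

```hcl
# Sketch only: resource label, titles, and the metric name are assumptions.
resource "datadog_dashboard" "app" {
  title        = "${var.app_name} (${var.stage})"
  description  = "Kubernetes/Docker monitoring for ${var.app_name}"
  layout_type  = "ordered"
  is_read_only = true # matches the read-only behavior described in this README

  widget {
    timeseries_definition {
      title = "CPU utilization: top 10 containers"

      request {
        # top(<query>, <limit>, <aggregation>, <direction>) keeps the
        # ten highest-ranked series, here filtered by the image name.
        q            = "top(avg:docker.cpu.usage{image_name:${var.image_name}} by {container_name}, 10, 'mean', 'desc')"
        display_type = "line"
      }
    }
  }
}
```

A memory widget would presumably follow the same shape with a memory metric substituted in the query.
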
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.12 |
| datadog | >= 3.2.0 |

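
A calling configuration could pin these versions along the following lines. This is a minimal sketch: the `source` attribute inside `required_providers` is an assumption and is aimed at Terraform 0.13 or later, so check it against the Terraform version actually in use.

```hcl
terraform {
  required_version = ">= 0.12"

  required_providers {
    datadog = {
      # Assumption: registry address of the Datadog provider; the
      # `source` attribute is intended for Terraform 0.13 or later.
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}
```
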
## Usage

```hcl
module "app_dashboard" {
  source = "./terraform-datadog-app-dashboard"

  app_namespace = "production"
  cfa_name      = "my-cfa"
  app_name      = "my-application"
  team_name     = "platform-team"
  image_name    = "my-app"
  region        = "eu-west-1"
  stage         = "prd"
  url           = "https://api.example.com/health"
}
```

## Inputs

| Name | Description | Type | Required |
|------|-------------|------|----------|
| `app_namespace` | Namespace that the application runs in | `string` | yes |
| `cfa_name` | Name of the CFA | `string` | yes |
| `app_name` | Name of the application | `string` | yes |
| `team_name` | Name of the responsible team | `string` | yes |
| `image_name` | Name of the Docker image | `string` | yes |
| `region` | AWS region where resources are located | `string` | yes |
| `stage` | Stage to monitor (`dev`, `tst`, `stg`, `prd`) | `string` | yes |
| `url` | URL for Datadog Synthetics to monitor | `string` | yes |

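
For orientation, two of these inputs might be declared roughly as follows. This is a sketch, not the module's actual `variables.tf`: the `validation` block is a hypothetical guard that the table above does not confirm, and it assumes Terraform 0.13 or later.

```hcl
variable "stage" {
  description = "Stage to monitor (dev, tst, stg, prd)"
  type        = string

  # Hypothetical guard, not confirmed by the module source.
  # Variable validation assumes Terraform 0.13 or later.
  validation {
    condition     = contains(["dev", "tst", "stg", "prd"], var.stage)
    error_message = "The stage must be one of: dev, tst, stg, prd."
  }
}

variable "url" {
  description = "URL for Datadog Synthetics to monitor"
  type        = string
}
```

Because neither variable sets a `default`, both are required, which matches the "Required" column above.
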
## Outputs

This module currently does not export any outputs.

## Monitor Configuration

### Pod Health Monitor

- **Query**: `avg(last_5m):sum:docker.containers.running{image_name:{image_name}} by {docker_image}.rollup(avg, 60) <= 1`
- **Type**: Metric alert
- **Thresholds**:
  - OK: 3 containers
  - Warning: 2 containers
  - Critical: 1 container
- **No Data**: Alert after 10 minutes

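
The settings above could translate into a `datadog_monitor` resource along these lines. Treat it as a sketch under assumptions: the resource label `pod_health`, the `name` and `message` strings, and the mapping of "OK: 3" onto the `ok` threshold are illustrative and not taken from the module source.

```hcl
# Sketch only: label, name, and message are assumptions.
resource "datadog_monitor" "pod_health" {
  name    = "${var.app_name} pod health (${var.stage})"
  type    = "metric alert"
  message = "Running containers for ${var.image_name} have dropped to the critical threshold."

  # Same query as documented above, with the image name interpolated.
  query = "avg(last_5m):sum:docker.containers.running{image_name:${var.image_name}} by {docker_image}.rollup(avg, 60) <= 1"

  monitor_thresholds {
    ok       = 3 # assumption: "OK: 3 containers" maps to the ok threshold
    warning  = 2
    critical = 1 # must agree with the "<= 1" comparison in the query
  }

  # Alert when no data arrives for 10 minutes.
  notify_no_data    = true
  no_data_timeframe = 10

  tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
    "type:kubernetes",
  ]
}
```
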
### Synthetic API Test

- **Type**: HTTP API test
- **Method**: GET
- **Interval**: Every 15 minutes (900 seconds)
- **Assertion**: HTTP status code is 200
- **Locations**: AWS EU and US regions

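
A `datadog_synthetics_test` matching this description might look like the sketch below. The resource label, `name`, `message`, and the two location identifiers are assumptions chosen for illustration; this README only states "AWS EU and US regions", so the exact locations used by the module are not known here. Block names follow the 3.x provider schema as best understood and should be checked against the pinned provider version.

```hcl
# Sketch only: label, name, message, and locations are assumptions.
resource "datadog_synthetics_test" "api_health" {
  name    = "${var.app_name} API health (${var.stage})"
  type    = "api"
  subtype = "http"
  status  = "live"
  message = "Health check failed for ${var.url}"

  # Illustrative picks for "AWS EU and US regions".
  locations = ["aws:eu-central-1", "aws:us-east-2"]

  request_definition {
    method = "GET"
    url    = var.url
  }

  # Pass only when the endpoint answers with HTTP 200.
  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  options_list {
    tick_every = 900 # seconds, i.e. every 15 minutes
  }

  tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
    "type:synthetics",
  ]
}
```
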
## Tagging Strategy

All resources are tagged with:
- `cfa:{cfa_name}`
- `team:{team_name}`
- `app:{app_name}`
- `env:{stage}`
- `type:kubernetes` or `type:synthetics`

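
One way to build these tag lists without repeating them per resource is a `locals` block. The local names below (`common_tags`, `kubernetes_tags`, `synthetics_tags`) are hypothetical; the module may well inline the lists instead.

```hcl
locals {
  # Tags shared by every resource in the module.
  common_tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
  ]

  # Append the resource-specific type tag to the shared list.
  kubernetes_tags = concat(local.common_tags, ["type:kubernetes"])
  synthetics_tags = concat(local.common_tags, ["type:synthetics"])
}
```

A resource would then set `tags = local.kubernetes_tags` or `tags = local.synthetics_tags` as appropriate.
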
## Notes

- The dashboard is set to read-only mode to prevent accidental modifications
- Metrics are filtered by the `image_name` variable
- Synthetic tests run from multiple AWS regions for geographic coverage
- The pod health monitor uses a 1-minute rollup average (`.rollup(avg, 60)`)

## License

Internal use only - Sanoma/WeBuildYourCloud

## Authors

Created and maintained by the Platform Engineering team.