Initial commit with README and module files

parent 4ec2ec4da0
commit 5558b3d1b1
.gitignore (vendored): Normal file → Executable file, 0 lines changed
.terraform.lock.hcl (generated): Normal file → Executable file, 0 lines changed
README.md: Normal file → Executable file, 114 lines changed
@@ -1,12 +1,112 @@

# Terraform Datadog App Dashboard Module

## Overview

This Terraform module creates a Datadog dashboard that monitors the CPU and memory utilization of a Kubernetes/Docker application. It also creates a pod health monitor and a synthetic API test.

## Features

- **Kubernetes Resource Monitoring**: Visualizes pod and node resource utilization
- **CPU & Memory Tracking**: Top 10 containers by CPU and memory usage
- **Health Monitoring**: Pod health metric alerts
- **Synthetic Testing**: HTTP API endpoint monitoring
- **Read-Only Dashboard**: Prevents accidental modifications in the UI

## Resources Created

- `datadog_dashboard`: Application monitoring dashboard with multiple widgets
- `datadog_monitor`: Kubernetes Pod Health metric alert
- `datadog_synthetics_test`: API HTTP health check

## Dashboard Widgets

1. **Kubernetes Pods Hostmap**: CPU utilization by Docker image
2. **CPU Utilization Timeseries**: Top 10 containers ranked by CPU usage
3. **Kubernetes Nodes Hostmap**: CPU utilization by host
4. **Memory Utilization Timeseries**: Top 10 containers ranked by memory usage
5. **Alert Graph**: Visualization from the pod health monitor

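The following rough sketch shows how two of these widgets and the read-only flag could be declared with the Datadog Terraform provider. Several details are assumptions and do not come from this module's source: the resource name `app`, the dashboard title, the metric `docker.cpu.usage`, the `container_name` grouping, and the monitor resource name `datadog_monitor.pod_health`. Older 3.x provider releases use `is_read_only` for read-only mode. Newer releases may prefer `restricted_roles` instead.

```hcl
# Hypothetical sketch: resource names, title and metric queries are assumptions.
resource "datadog_dashboard" "app" {
  title        = "${var.app_name} (${var.stage})"
  layout_type  = "ordered"
  is_read_only = true # read-only mode, as described under Features

  # Widget 2: top 10 containers by CPU, filtered on the image_name input
  widget {
    timeseries_definition {
      title = "CPU utilization (top 10 containers)"
      request {
        q            = "top(avg:docker.cpu.usage{image_name:${var.image_name}} by {container_name}, 10, 'mean', 'desc')"
        display_type = "line"
      }
    }
  }

  # Widget 5: alert graph backed by the pod health monitor (hypothetical name)
  widget {
    alert_graph_definition {
      title    = "Pod health"
      alert_id = datadog_monitor.pod_health.id
      viz_type = "timeseries"
    }
  }
}
```

The memory widget would follow the same `top(...)` pattern with a memory metric in place of `docker.cpu.usage`.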
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.12 |
| datadog | >= 3.2.0 |

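The sketch below shows one way to pin these constraints, together with a possible shape for `provider.tf`. The `DataDog/datadog` source address and the empty provider block are assumptions about how the module is wired up. The object form of `required_providers` with a `source` attribute is only accepted by Terraform 0.13 and later. Terraform 0.12 takes the plain string form instead, for example `datadog = ">= 3.2.0"`.

```hcl
# Sketch only: this object form of required_providers needs Terraform >= 0.13.
terraform {
  required_version = ">= 0.13"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = ">= 3.2.0"
    }
  }
}

# With no arguments set, the provider reads its credentials from the
# DD_API_KEY and DD_APP_KEY environment variables, which keeps them out of code.
provider "datadog" {}
```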
## Usage

```hcl
module "app_dashboard" {
  source = "./terraform-datadog-app-dashboard"

  app_namespace = "production"
  cfa_name      = "my-cfa"
  app_name      = "my-application"
  team_name     = "platform-team"
  image_name    = "my-app"
  region        = "eu-west-1"
  stage         = "prd"
  url           = "https://api.example.com/health"
}
```

## Inputs

| Name | Description | Type | Required |
|------|-------------|------|----------|
| `app_namespace` | Namespace that the application runs in | `string` | yes |
| `cfa_name` | Name of the CFA | `string` | yes |
| `app_name` | Name of the application | `string` | yes |
| `team_name` | Name of the responsible team | `string` | yes |
| `image_name` | Name of the Docker image | `string` | yes |
| `region` | AWS region where resources are located | `string` | yes |
| `stage` | Stage to monitor (dev, tst, stg, prd) | `string` | yes |
| `url` | URL for Datadog Synthetics to monitor | `string` | yes |

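As an illustration, two of these inputs could be declared as follows, with `stage` restricted to the values listed in the table. This is a hypothetical excerpt. The table does not say whether the module's `variables.tf` actually validates `stage`. Custom `validation` blocks also require Terraform 0.13 or later.

```hcl
# Hypothetical variables.tf excerpt; descriptions follow the Inputs table.
variable "stage" {
  description = "Stage to monitor (dev, tst, stg, prd)"
  type        = string

  # Reject values outside the documented stages at plan time (Terraform >= 0.13).
  validation {
    condition     = contains(["dev", "tst", "stg", "prd"], var.stage)
    error_message = "The stage value must be one of: dev, tst, stg, prd."
  }
}

variable "url" {
  description = "URL for Datadog Synthetics to monitor"
  type        = string
}
```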
## Outputs

Currently, this module does not export any outputs.

## Monitor Configuration

### Pod Health Monitor

- **Query**: `avg(last_5m):sum:docker.containers.running{image_name:{image_name}} by {docker_image}.rollup(avg, 60) <= 1`
- **Type**: Metric alert
- **Thresholds**:
  - OK: 3 containers
  - Warning: 2 containers
  - Critical: 1 container
- **No Data**: Alert if no data is received for 10 minutes

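The following sketch shows how the settings above might map onto a `datadog_monitor` resource. The resource name `pod_health`, the monitor name and the message text are hypothetical. The README also lists an OK level of 3, which this sketch leaves out, so check `monitors.tf` for how the module sets it. Inside the query string, `${var.image_name}` is Terraform interpolation. `{docker_image}` is literal Datadog group-by syntax.

```hcl
# Hypothetical sketch of the pod health monitor; names and message are assumptions.
resource "datadog_monitor" "pod_health" {
  name    = "${var.app_name} pod health (${var.stage})"
  type    = "metric alert"
  message = "Running containers for ${var.image_name} dropped below the expected count."

  # 5-minute average of running containers, rolled up to 60-second buckets.
  query = "avg(last_5m):sum:docker.containers.running{image_name:${var.image_name}} by {docker_image}.rollup(avg, 60) <= 1"

  monitor_thresholds {
    critical = 1 # must match the threshold in the query
    warning  = 2
    # OK level (3) from the README intentionally omitted; see monitors.tf.
  }

  # Alert when no data has been received for 10 minutes.
  notify_no_data    = true
  no_data_timeframe = 10

  tags = ["cfa:${var.cfa_name}", "team:${var.team_name}", "app:${var.app_name}", "env:${var.stage}", "type:kubernetes"]
}
```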
### Synthetic API Test

- **Type**: HTTP API test
- **Method**: GET
- **Interval**: Every 15 minutes (900 seconds)
- **Assertion**: HTTP status code is 200
- **Locations**: AWS EU and US regions

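A minimal sketch of a `datadog_synthetics_test` resource matching this configuration follows. The resource name, the test name and the two location IDs (`aws:eu-west-1`, `aws:us-east-2`) are assumptions. The README only says "AWS EU and US regions".

```hcl
# Hypothetical sketch of the API health check; name and locations are assumptions.
resource "datadog_synthetics_test" "api_health" {
  name    = "${var.app_name} health check (${var.stage})"
  type    = "api"
  subtype = "http"
  status  = "live"

  request_definition {
    method = "GET"
    url    = var.url
  }

  # Pass only when the endpoint returns HTTP 200.
  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  # Example managed locations in one EU and one US AWS region (assumed).
  locations = ["aws:eu-west-1", "aws:us-east-2"]

  options_list {
    tick_every = 900 # run every 15 minutes
  }

  tags = ["cfa:${var.cfa_name}", "team:${var.team_name}", "app:${var.app_name}", "env:${var.stage}", "type:synthetics"]
}
```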
## Tagging Strategy

All resources are tagged with:

- `cfa:{cfa_name}`
- `team:{team_name}`
- `app:{app_name}`
- `env:{stage}`
- `type:kubernetes` or `type:synthetics`, depending on the resource

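One way to build this tag set once and reuse it is a `locals` block. The local names below are illustrative. Which resources get `type:kubernetes` and which get `type:synthetics` is also an assumption.

```hcl
# Hypothetical locals block; local names are illustrative only.
locals {
  # Tags shared by every resource, built from the module inputs.
  common_tags = [
    "cfa:${var.cfa_name}",
    "team:${var.team_name}",
    "app:${var.app_name}",
    "env:${var.stage}",
  ]

  # Append the resource-specific type tag with concat().
  kubernetes_tags = concat(local.common_tags, ["type:kubernetes"])
  synthetics_tags = concat(local.common_tags, ["type:synthetics"])
}
```

A resource would then set, for example, `tags = local.synthetics_tags`. This keeps the shared tags defined in one place.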
## Notes

- The dashboard is set to read-only mode to prevent accidental modifications
- Metrics are filtered by the value of the `image_name` input
- Synthetic tests run from multiple AWS regions for geographic coverage
- The pod health monitor uses a 1-minute (60-second) rollup average

## License

Internal use only - Sanoma/WeBuildYourCloud

## Authors

Created and maintained by the Platform Engineering team.

example/main.tf: Normal file → Executable file, 0 lines changed
monitors.tf: Normal file → Executable file, 0 lines changed
outputs.tf: Normal file → Executable file, 0 lines changed
provider.tf: Normal file → Executable file, 0 lines changed
synthetics.tf: Normal file → Executable file, 0 lines changed
terraform.tfvars: Normal file → Executable file, 0 lines changed
variables.tf: Normal file → Executable file, 0 lines changed