# Terraform AWS Datadog2 Integration & Monitoring Module

## Overview

The `terraform-aws-datadog2` module integrates AWS with Datadog for monitoring and alerting. It sets up the AWS-Datadog integration and creates pre-configured Datadog monitors that track critical infrastructure metrics.

## Features

- Automated AWS-Datadog integration setup
- Pre-configured infrastructure monitors for:
  - CPU utilization
  - Memory utilization
  - System load
  - Disk space
  - Disk inodes
  - Disk usage forecasting (7-day prediction)
- CloudPosse label/tagging context for consistent naming
- Support for both EU and US Datadog endpoints

## Resources Created

### AWS Resources (via CloudPosse Module)

- **IAM Role** - Role that Datadog assumes to monitor AWS resources
- **External ID** - Security mechanism for cross-account role assumption
- Associated IAM policies for AWS monitoring permissions

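To illustrate how the IAM role and external ID fit together, here is a minimal sketch of a cross-account trust policy. It is not taken from the CloudPosse module's source. The variable name `datadog_external_id` and the data source name are hypothetical, and the account ID shown is the one Datadog documents for its commercial sites, so confirm it on the Datadog AWS integration page before relying on it.

```hcl
# Hypothetical input: the external ID generated for the Datadog integration.
variable "datadog_external_id" {
  type        = string
  description = "External ID that Datadog must present when assuming the role"
}

# Trust policy sketch: only Datadog's AWS account may assume the role,
# and only when it supplies the expected external ID.
data "aws_iam_policy_document" "datadog_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      # Assumption: Datadog's AWS account ID for commercial sites.
      identifiers = ["arn:aws:iam::464622532012:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.datadog_external_id]
    }
  }
}
```

The external ID condition guards against the "confused deputy" problem: knowing the role ARN is not enough to assume the role.
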
### Datadog Monitors

1. **CPU Utilization Monitor**
   - Type: Metric alert
   - Warning: 50%
   - Critical: 60%

2. **Memory Utilization Monitor**
   - Type: Query alert
   - Evaluation: 5 minutes
   - Warning: 10% usable memory remaining
   - Critical: 5% usable memory remaining

3. **System Load Monitor**
   - Type: Query alert
   - Tracks: 5-minute normalized system load
   - Evaluation: 30 minutes
   - Warning: 2.0
   - Critical: 2.5

4. **Disk Space Monitor**
   - Type: Query alert
   - Evaluation: 5 minutes
   - Warning: 80% used
   - Critical: 90% used

5. **Disk Inodes Monitor**
   - Type: Query alert
   - Evaluation: 5 minutes
   - Warning: 90% used
   - Critical: 95% used

6. **Disk Usage Forecast Monitor**
   - Type: Query alert with forecasting
   - Prediction: Next 7 days
   - Forecast model: Linear
   - Warning: 72% predicted usage
   - Critical: 80% predicted usage

## Usage

```hcl
module "datadog_monitoring" {
  source = "path/to/terraform-aws-datadog2"

  # Required variables
  region      = "eu-west-1"
  api_key     = var.datadog_api_key # Store securely!
  app_key     = var.datadog_app_key # Store securely!
  aws_profile = "your-aws-profile"
  prefix_slug = "mycompany"
  team        = "platform"

  # Optional variables
  datadog_site = "https://api.datadoghq.eu/" # Default

  # CloudPosse label context (optional)
  namespace   = "myorg"
  environment = "prod"
  stage       = "production"
  name        = "monitoring"

  tags = {
    Project   = "Infrastructure"
    ManagedBy = "Terraform"
  }
}
```

## Variables

### Required Variables

| Variable | Type | Description |
|----------|------|-------------|
| `region` | string | AWS region where monitored resources reside |
| `api_key` | string | Datadog API key for sending logs, metrics, and traces |
| `app_key` | string | Datadog application key for managing resources through the Datadog API |
| `aws_profile` | string | AWS profile name for authentication |
| `prefix_slug` | string | Prefix slug for naming |
| `team` | string | Team identifier |

### Optional Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `datadog_site` | string | `https://api.datadoghq.eu/` | Datadog site endpoint |

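Because the default points at the EU endpoint, US-hosted Datadog accounts need to override `datadog_site`. The sketch below assumes the US1 API URL `https://api.datadoghq.com/` and keeps the trailing slash to mirror the default. Other Datadog sites use different hostnames, so check which site your account lives on.

```hcl
variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}

module "datadog_monitoring" {
  source = "path/to/terraform-aws-datadog2"

  # Required variables (placeholder values)
  region      = "us-east-1"
  api_key     = var.datadog_api_key
  app_key     = var.datadog_app_key
  aws_profile = "your-aws-profile"
  prefix_slug = "mycompany"
  team        = "platform"

  # Assumption: US1 endpoint, overriding the EU default.
  datadog_site = "https://api.datadoghq.com/"
}
```
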
### CloudPosse Label Context Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `enabled` | bool | `null` | Enable/disable resource creation |
| `namespace` | string | `null` | Organization name or abbreviation |
| `environment` | string | `null` | Environment identifier |
| `stage` | string | `null` | Stage identifier |
| `name` | string | `null` | Solution name |
| `delimiter` | string | `null` | Delimiter between name components |
| `attributes` | list(string) | `[]` | Additional attributes for naming |
| `tags` | map(string) | `{}` | Additional tags |
| `label_order` | list(string) | `null` | Custom ordering of name components |

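For readers unfamiliar with the CloudPosse label pattern, the sketch below shows how these variables feed `cloudposse/label/null` on its own. How this module wires the label internally is not shown in this README, so treat the block as an illustration of the pattern rather than a copy of the module's code. The `attributes` value and the output name `label_id` are hypothetical.

```hcl
module "label" {
  source  = "cloudposse/label/null"
  version = "0.24.1"

  namespace   = "myorg"
  environment = "prod"
  stage       = "production"
  name        = "monitoring"
  delimiter   = "-"
  attributes  = ["datadog"] # hypothetical extra attribute

  tags = {
    ManagedBy = "Terraform"
  }
}

# The label module exposes a generated ID and a merged tag map,
# which can be used to name and tag other resources consistently.
output "label_id" {
  value = module.label.id
}
```
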
## Outputs

| Output | Description |
|--------|-------------|
| `aws_account_id` | AWS Account ID of the IAM Role for Datadog |
| `aws_role_name` | Name of the AWS IAM Role for Datadog |
| `datadog_external_id` | External ID for secure role assumption |

**Note:** These outputs supply the values you enter in Datadog's AWS integration settings, so they are required to complete the integration.

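Module outputs are not printed by `terraform output` unless the root configuration re-exports them. A minimal sketch, assuming the module is instantiated as `datadog_monitoring` as in the Usage section (the root output names are hypothetical):

```hcl
output "datadog_aws_account_id" {
  description = "AWS account ID to enter in the Datadog AWS integration"
  value       = module.datadog_monitoring.aws_account_id
}

output "datadog_aws_role_name" {
  description = "IAM role name to enter in the Datadog AWS integration"
  value       = module.datadog_monitoring.aws_role_name
}

output "datadog_external_id" {
  description = "External ID to enter in the Datadog AWS integration"
  value       = module.datadog_monitoring.datadog_external_id
}
```

If the module marks any of these values as sensitive, the matching root output would also need `sensitive = true`.
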
## Dependencies

### Terraform Requirements

- Terraform >= 0.13.0

### Provider Requirements

- `hashicorp/aws` - AWS infrastructure management
- `datadog/datadog` - Datadog monitoring resources
- `hashicorp/local` >= 1.3 - Local file operations

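The requirements above could be expressed in a root configuration roughly as follows. This is a sketch, not the module's own `versions.tf`. The README states no version constraints for the AWS and Datadog providers, so they are left unpinned here. The Datadog provider is written with its registry address `DataDog/datadog`.

```hcl
terraform {
  required_version = ">= 0.13.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    datadog = {
      source = "DataDog/datadog"
    }
    local = {
      source  = "hashicorp/local"
      version = ">= 1.3"
    }
  }
}
```
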
### External Modules

1. **cloudposse/datadog-integration/aws** (v0.11.0)
   - Creates AWS IAM role and permissions for Datadog
   - Handles cross-account role assumption

2. **cloudposse/label/null** (v0.24.1)
   - Provides consistent tagging and naming conventions

### Prerequisites

- Valid AWS account with IAM role creation permissions
- Active Datadog account with monitor creation access
- Network connectivity to AWS and Datadog APIs
- Proper AWS profile configured

## Post-Deployment Setup

After applying this module, complete the integration in Datadog:

1. Navigate to the AWS integration settings in the Datadog console
2. Add the AWS account using the `aws_account_id` output
3. Enter the `aws_role_name` output as the IAM role name
4. Provide the `datadog_external_id` output as the external ID
5. Complete the AWS integration in the Datadog console

## Monitor Alert Notifications

To receive alerts, configure notification channels in Datadog, then reference them in each monitor's `message` (for example with `@`-mentions).

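As a sketch of what such a message might look like, the local value below combines Datadog's conditional template variables with `@`-mentions. The Slack channel and email address are hypothetical placeholders, and the name `alert_message` is not defined by this module.

```hcl
locals {
  # Hypothetical handles: replace with channels configured in your Datadog account.
  alert_message = <<-EOT
    {{#is_alert}}CPU utilization is critical on {{host.name}}.{{/is_alert}}
    {{#is_warning}}CPU utilization is elevated on {{host.name}}.{{/is_warning}}
    @slack-platform-alerts @oncall@example.com
  EOT
}
```

A monitor in `monitors.tf` could then set `message = local.alert_message`.
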
## Customization

### Adjusting Monitor Thresholds

To adjust alert thresholds, modify the monitor resources in `monitors.tf`:

```hcl
# Example: Adjust CPU warning to 60% and critical to 80%
resource "datadog_monitor" "cpumonitor" {
  # ... other settings ...
  thresholds = {
    warning  = 60
    critical = 80
  }
}
```

### Adding Additional Monitors

Add new monitor resources to `monitors.tf` following the existing patterns.

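A new monitor might look like the sketch below. The resource name, metric choice, and threshold values are illustrative assumptions, not part of this module. It follows the `thresholds` map syntax used elsewhere in this README. Datadog provider 3.x and later express the same settings as a `monitor_thresholds` block, so adapt the block to the provider version you have pinned.

```hcl
# Hypothetical additional monitor: alerts on sustained CPU I/O wait.
resource "datadog_monitor" "iowait" {
  name    = "High CPU I/O wait on {{host.name}}"
  type    = "metric alert"
  message = "I/O wait is above threshold on {{host.name}}."

  # The comparison value in the query must match thresholds.critical.
  query = "avg(last_10m):avg:system.cpu.iowait{*} by {host} > 30"

  thresholds = {
    warning  = 20
    critical = 30
  }
}
```
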
## Security Considerations

- Store API keys and app keys securely (use Terraform Cloud, AWS Secrets Manager, or HashiCorp Vault)
- Never commit sensitive credentials to version control
- Use IAM role-based access instead of IAM user credentials where possible
- Review and adjust monitor thresholds based on your workload requirements

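One way to keep the keys out of version control is to declare them as input variables in the root configuration and supply the values from outside the repository. The variable names below match the Usage example. The `sensitive` argument is an addition that requires Terraform 0.14 or later, so drop it if you are still on 0.13.

```hcl
variable "datadog_api_key" {
  type        = string
  description = "Datadog API key"
  sensitive   = true # requires Terraform >= 0.14
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog application key"
  sensitive   = true # requires Terraform >= 0.14
}
```

The values can then be provided through the `TF_VAR_datadog_api_key` and `TF_VAR_datadog_app_key` environment variables, or injected by a secrets manager, rather than written to a committed `.tfvars` file.
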
## License

See project license file.

## Authors

Maintained by the WebBuildYourCloud team.