Terraform Datadog Old Monitors Module
Overview
This repository is a library of pre-configured Datadog monitor modules, based on Claranet's Datadog monitors repository. It covers infrastructure components including middleware, databases, cloud services, and container platforms.
Features
- Enterprise Monitoring Templates: Production-ready monitor configurations
- Multi-Platform Support: AWS, Azure, GCP cloud providers
- Component Coverage: Middleware, databases, containers, networking
- Flexible Configuration: Extensive customization options
- Best Practices: Based on industry standards and real-world deployments
Structure
This module contains multiple sub-modules organized by component type:
terraform-datadog-old-monitors/
├── middleware/ # Nginx, Kong, Apache, PHP-FPM
├── database/ # PostgreSQL, MySQL, Redis, MongoDB, etc.
├── system/ # Generic system and unreachable monitors
├── network/ # HTTP, DNS, TLS monitoring
├── cloud/ # AWS, Azure, GCP specific monitors
│ ├── aws/ # ECS, RDS, Lambda, ALB, etc.
│ ├── azure/ # App Services, Functions, SQL, etc.
│ └── gcp/ # Compute, Cloud SQL, Pub/Sub, etc.
├── caas/ # Docker, Kubernetes monitoring
└── common/ # Shared alerting and filtering modules
Requirements
| Name | Version |
|---|---|
| terraform | >= 0.12 |
| datadog | >= 2.0 |
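The constraints in the table could be pinned in a root module along these lines. This is a minimal sketch, not taken from the module itself: the `source` address syntax assumes Terraform 0.13 or later (hence a tighter `required_version` than the table lists), and the variable names `datadog_api_key` and `datadog_app_key` are hypothetical.

```hcl
terraform {
  # Assumption: 0.13+ is needed for the provider "source" syntax below.
  required_version = ">= 0.13"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = ">= 2.0"
    }
  }
}

# Hypothetical variable names; supply the values through your usual
# secrets mechanism rather than committing them.
variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
```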
Usage
Basic Monitor Configuration
module "nginx_monitor" {
  source           = "./terraform-datadog-old-monitors/middleware/nginx"
  environment      = "production"
  message          = "Nginx issue detected @slack-channel"
  evaluation_delay = 15
  new_host_delay   = 300
  # Enable/disable specific monitors
  nginx_connect_enabled = "true"
  nginx_dropped_enabled = "true"
  # Customize thresholds
  nginx_dropped_connections_critical = 5
  nginx_dropped_connections_warning  = 3
}
AWS RDS Monitoring
module "rds_monitor" {
  source      = "./terraform-datadog-old-monitors/cloud/aws/rds/common"
  environment = "production"
  message     = "RDS alert @pagerduty"
  # CPU monitoring
  cpu_enabled  = "true"
  cpu_critical = 90
  cpu_warning  = 75
  # Disk monitoring
  disk_space_enabled  = "true"
  disk_space_critical = 90
  disk_space_warning  = 80
}
Kubernetes Monitoring
module "k8s_pod_monitor" {
  source                    = "./terraform-datadog-old-monitors/caas/kubernetes/pod"
  environment               = "production"
  message                   = "Kubernetes pod issue @slack-ops"
  pod_crash_enabled         = "true"
  pod_not_running_enabled   = "true"
  container_restart_enabled = "true"
}
Common Variables
Most sub-modules share these common variables:
| Name | Description | Type | Default |
|---|---|---|---|
| environment | Architecture environment | string | Required |
| message | Alert message with notification channels | string | Required |
| evaluation_delay | Metric evaluation delay (seconds) | number | 15 |
| new_host_delay | Delay before monitoring new resources (seconds) | number | 300 |
| prefix_slug | Prefix for monitor names | string | "" |
| notify_no_data | Alert on no data | bool | true |
| filter_tags_use_defaults | Use default filter convention | bool | true |
| filter_tags_custom | Custom filter tags | string | "" |
Available Monitor Types
Middleware Monitors
- Nginx: Connection, dropped connections, workers
- Apache: Server status, connections
- Kong: API gateway health and performance
- PHP-FPM: Pool status, slow requests
Database Monitors
- PostgreSQL: Connections, replication lag, locks
- MySQL: Connections, slow queries, replication
- Redis: Memory, connections, evictions
- MongoDB: Connections, replication lag, operations
- Elasticsearch: Cluster health, JVM heap
- SQL Server: Connections, locks, performance
Cloud Services
AWS
- RDS (Aurora PostgreSQL, Aurora MySQL, common)
- EC2 / ECS (Fargate, EC2 cluster)
- Lambda
- ALB / ELB / NLB
- ElastiCache (Redis, Memcached)
- SQS
- API Gateway
- Elasticsearch
Azure
- App Services
- Functions
- SQL Database / Elastic Pool
- PostgreSQL
- Storage
- Key Vault
- Event Hub
- Service Bus
GCP
- Compute Engine
- Cloud SQL (MySQL, common)
- Pub/Sub (topics, subscriptions)
- Load Balancer
- Memorystore Redis
Container Platforms
- Docker: Container status, resource usage
- Kubernetes:
- Pod monitors (crash, restart, not running)
- Node monitors (resource usage, status)
- Cluster monitors (API server, scheduler)
- Workload monitors (deployments, statefulsets)
- Velero/Ark backup monitors
Network Monitors
- HTTP: Webcheck, SSL certificate expiry
- DNS: Query response time, availability
- TLS: Certificate expiration
Monitor Configuration Pattern
Each monitor module follows this pattern:
module "service_monitor" {
  source = "./path/to/monitor"
  # Environment and messaging
  environment = var.environment
  message     = var.alert_message
  # Timing configuration
  evaluation_delay = 15
  new_host_delay   = 300
  # Enable/disable monitors
  monitor_name_enabled = "true"
  # Thresholds
  monitor_name_critical = 90
  monitor_name_warning  = 75
  # Filtering
  filter_tags_custom = "env:production,team:platform"
}
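As a concrete instance of this pattern, the sketch below applies it to the Nginx sub-module and overrides several of the common variables listed earlier. It assumes the Nginx sub-module accepts all of those variables (the table says only that most sub-modules share them); the module name and the `prefix_slug` value are made up for illustration.

```hcl
module "nginx_monitor_staging" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  # Required variables
  environment = "staging"
  message     = "Nginx issue detected @slack-channel"

  # Optional overrides of the documented defaults
  evaluation_delay = 30    # default 15
  new_host_delay   = 600   # default 300
  notify_no_data   = false # default true

  # Hypothetical prefix, prepended to monitor names
  prefix_slug = "[staging]"

  # Replace the default filter convention with explicit tags
  filter_tags_use_defaults = false
  filter_tags_custom       = "service:api,tier:backend"
}
```

Variables left out of the block keep the defaults from the table.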
Alerting Integration
The common/alerting-message module provides templates for:
- PagerDuty integration
- Slack notifications
- Email alerts
- Webhook notifications
Example:
module "alerting" {
  source          = "./terraform-datadog-old-monitors/common/alerting-message"
  message_alert   = "@pagerduty-critical"
  message_warning = "@slack-warnings"
  message_nodata  = "@slack-monitoring"
}
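The rendered message can then be passed to a monitor module instead of a hand-written string. The sketch below assumes the `alerting` module above exposes its result as an output named `alerting-message`; that name is an assumption, so confirm it in the module's outputs.tf before relying on it.

```hcl
module "rds_monitor" {
  source = "./terraform-datadog-old-monitors/cloud/aws/rds/common"

  environment = "production"

  # Assumption: the alerting-message module exports an output called
  # "alerting-message" containing the combined notification template.
  message = module.alerting.alerting-message
}
```

Routing all notification targets through one module keeps the PagerDuty and Slack handles in a single place rather than repeated in every monitor declaration.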
Filter Tags
The common/filter-tags module helps with tag-based filtering:
module "filter_tags" {
  source                   = "./terraform-datadog-old-monitors/common/filter-tags"
  environment              = "production"
  filter_tags_use_defaults = true
  filter_tags_custom       = "service:api,tier:backend"
}
Best Practices
- Start with defaults: Use default thresholds first, then customize
- Gradual rollout: Enable monitors incrementally
- Tag strategy: Use consistent tagging across infrastructure
- Alert fatigue: Tune thresholds to reduce false positives
- Documentation: Document custom threshold decisions
- Testing: Test monitors in non-production first
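One way to combine the gradual rollout and testing advice is to gate a newly introduced monitor on the environment. This is a sketch only: it reuses the Nginx variable names from the usage example, keeps the string-typed `"true"`/`"false"` flags shown there, and assumes a root-level `environment` variable.

```hcl
variable "environment" {
  type = string
}

module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment = var.environment
  message     = "Nginx issue detected @slack-channel"

  # Established monitor: enabled everywhere
  nginx_connect_enabled = "true"

  # Monitor still being tuned: enabled outside production only,
  # until its thresholds have been validated
  nginx_dropped_enabled = var.environment != "production" ? "true" : "false"
}
```

Once the thresholds have proven quiet enough in non-production, the conditional can be replaced with a plain `"true"`.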
Customization Examples
Custom Thresholds
# More aggressive CPU monitoring
cpu_critical = 85
cpu_warning = 70
# Relaxed disk space monitoring
disk_space_critical = 95
disk_space_warning = 90
Conditional Monitoring
# Only monitor specific services
filter_tags_custom = "service:critical-app"
# Skip new hosts for longer period
new_host_delay = 600 # 10 minutes
Custom Alert Messages
message = <<-EOT
{{#is_alert}}
CRITICAL: {{check}} on {{host.name}}
@pagerduty-critical
{{/is_alert}}
{{#is_warning}}
WARNING: {{check}} on {{host.name}}
@slack-warnings
{{/is_warning}}
EOT
Maintenance
The "old-monitors" name suggests this module is a legacy or archived version. Consider:
- Reviewing for updates from Claranet repository
- Migrating to newer monitoring solutions if available
- Documenting which monitors are actively used
- Deprecating unused monitor configurations
Outputs
Each sub-module may export:
- Monitor IDs
- Monitor names
- Alert status
Check individual module outputs.tf files for specifics.
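Because output names vary between sub-modules, a root module can re-export everything a sub-module provides without naming individual outputs. The sketch below relies on Terraform 0.12+ allowing a whole module to be referenced as an object; `nginx_monitor` refers to the module from the usage example, and the output name is hypothetical.

```hcl
# Re-export every output of the nginx sub-module as a single object.
# Inspect the result with `terraform output nginx_monitor_outputs`.
output "nginx_monitor_outputs" {
  description = "All outputs exported by the nginx monitor sub-module"
  value       = module.nginx_monitor
}
```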
Notes
- This is a comprehensive library of monitor templates
- Based on Claranet's open-source Datadog monitors
- Covers most common infrastructure components
- Highly customizable with sensible defaults
- May contain more monitors than needed for your use case
- Review and enable only required monitors to avoid alert fatigue
Migration Path
If migrating from this module:
- Audit currently active monitors
- Document custom thresholds
- Test new monitoring solutions in parallel
- Gradually migrate monitor by monitor
- Keep this module for reference
Resources
- Original Claranet repository: terraform-datadog-monitors
- Datadog monitor documentation: Datadog Monitors
License
Based on Claranet's open-source work. Internal use: Sanoma/WeBuildYourCloud
Authors
- Original: Claranet team
- Maintained by: Platform Engineering team