# Terraform Datadog Old Monitors Module

## Overview

This repository is a comprehensive, enterprise-ready collection of monitoring modules (based on Claranet's Datadog monitors repository) with pre-configured monitors for various infrastructure components, including middleware, databases, cloud services, and container platforms.

## Features

- **Enterprise Monitoring Templates**: Production-ready monitor configurations
- **Multi-Platform Support**: AWS, Azure, and GCP cloud providers
- **Component Coverage**: Middleware, databases, containers, networking
- **Flexible Configuration**: Extensive customization options
- **Best Practices**: Based on industry standards and real-world deployments

## Structure

This module contains multiple sub-modules organized by component type:

```
terraform-datadog-old-monitors/
├── middleware/   # Nginx, Kong, Apache, PHP-FPM
├── database/     # PostgreSQL, MySQL, Redis, MongoDB, etc.
├── system/       # Generic system and unreachable monitors
├── network/      # HTTP, DNS, TLS monitoring
├── cloud/        # AWS, Azure, GCP specific monitors
│   ├── aws/      # ECS, RDS, Lambda, ALB, etc.
│   ├── azure/    # App Services, Functions, SQL, etc.
│   └── gcp/      # Compute, Cloud SQL, Pub/Sub, etc.
├── caas/         # Docker, Kubernetes monitoring
└── common/       # Shared alerting and filtering modules
```

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.12 |
| datadog | >= 2.0 |

## Usage

### Basic Monitor Configuration

```hcl
module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment = "production"
  message     = "Nginx issue detected @slack-channel"

  evaluation_delay = 15
  new_host_delay   = 300

  # Enable/disable specific monitors
  nginx_connect_enabled = "true"
  nginx_dropped_enabled = "true"

  # Customize thresholds
  nginx_dropped_connections_critical = 5
  nginx_dropped_connections_warning  = 3
}
```

### AWS RDS Monitoring

```hcl
module "rds_monitor" {
  source = "./terraform-datadog-old-monitors/cloud/aws/rds/common"

  environment = "production"
  message     = "RDS alert @pagerduty"

  # CPU monitoring
  cpu_enabled  = "true"
  cpu_critical = 90
  cpu_warning  = 75

  # Disk monitoring
  disk_space_enabled  = "true"
  disk_space_critical = 90
  disk_space_warning  = 80
}
```

### Kubernetes Monitoring

```hcl
module "k8s_pod_monitor" {
  source = "./terraform-datadog-old-monitors/caas/kubernetes/pod"

  environment = "production"
  message     = "Kubernetes pod issue @slack-ops"

  pod_crash_enabled         = "true"
  pod_not_running_enabled   = "true"
  container_restart_enabled = "true"
}
```

## Common Variables

Most sub-modules share these common variables:

| Name | Description | Type | Default |
|------|-------------|------|---------|
| `environment` | Architecture environment | `string` | Required |
| `message` | Alert message with notification channels | `string` | Required |
| `evaluation_delay` | Metric evaluation delay (seconds) | `number` | `15` |
| `new_host_delay` | Delay before monitoring new resources (seconds) | `number` | `300` |
| `prefix_slug` | Prefix for monitor names | `string` | `""` |
| `notify_no_data` | Alert on no data | `bool` | `true` |
| `filter_tags_use_defaults` | Use default filter convention | `bool` | `true` |
| `filter_tags_custom` | Custom filter tags | `string` | `""` |
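For illustration, the sketch below layers these shared variables onto the Nginx sub-module from the usage example above. Whether every sub-module accepts the full common set is an assumption; check each sub-module's `variables.tf` before relying on it:

```hcl
# Sketch: shared variables applied to the Nginx sub-module shown above.
# Assumes this sub-module accepts the full common set; verify in its variables.tf.
module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment = "production"
  message     = "Nginx issue detected @slack-channel"

  # Timing: wait for metrics to settle, and ignore freshly provisioned hosts
  evaluation_delay = 15
  new_host_delay   = 300

  # Naming and no-data behaviour
  prefix_slug    = "prod" # prepended to generated monitor names
  notify_no_data = true

  # Scope the monitors with custom tags instead of the default convention
  filter_tags_use_defaults = false
  filter_tags_custom       = "env:production,team:platform"
}
```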
## Available Monitor Types

### Middleware Monitors

- **Nginx**: Connection, dropped connections, workers
- **Apache**: Server status, connections
- **Kong**: API gateway health and performance
- **PHP-FPM**: Pool status, slow requests

### Database Monitors

- **PostgreSQL**: Connections, replication lag, locks
- **MySQL**: Connections, slow queries, replication
- **Redis**: Memory, connections, evictions
- **MongoDB**: Connections, replication lag, operations
- **Elasticsearch**: Cluster health, JVM heap
- **SQL Server**: Connections, locks, performance

### Cloud Services

#### AWS

- RDS (Aurora PostgreSQL, Aurora MySQL, common)
- EC2 / ECS (Fargate, EC2 cluster)
- Lambda
- ALB / ELB / NLB
- ElastiCache (Redis, Memcached)
- SQS
- API Gateway
- Elasticsearch

#### Azure

- App Services
- Functions
- SQL Database / Elastic Pool
- PostgreSQL
- Storage
- Key Vault
- Event Hub
- Service Bus

#### GCP

- Compute Engine
- Cloud SQL (MySQL, common)
- Pub/Sub (topics, subscriptions)
- Load Balancer
- Memorystore Redis

### Container Platforms

- **Docker**: Container status, resource usage
- **Kubernetes**:
  - Pod monitors (crash, restart, not running)
  - Node monitors (resource usage, status)
  - Cluster monitors (API server, scheduler)
  - Workload monitors (deployments, statefulsets)
  - Velero/Ark backup monitors

### Network Monitors

- **HTTP**: Webcheck, SSL certificate expiry
- **DNS**: Query response time, availability
- **TLS**: Certificate expiration

## Monitor Configuration Pattern

Each monitor module follows this pattern:

```hcl
module "service_monitor" {
  source = "./path/to/monitor"

  # Environment and messaging
  environment = var.environment
  message     = var.alert_message

  # Timing configuration
  evaluation_delay = 15
  new_host_delay   = 300

  # Enable/disable monitors
  monitor_name_enabled = "true"

  # Thresholds
  monitor_name_critical = 90
  monitor_name_warning  = 75

  # Filtering
  filter_tags_custom = "env:production,team:platform"
}
```

## Alerting Integration

The `common/alerting-message` module provides templates for:

- PagerDuty integration
- Slack notifications
- Email alerts
- Webhook notifications

Example:

```hcl
module "alerting" {
  source = "./terraform-datadog-old-monitors/common/alerting-message"

  message_alert   = "@pagerduty-critical"
  message_warning = "@slack-warnings"
  message_nodata  = "@slack-monitoring"
}
```

## Filter Tags

The `common/filter-tags` module helps with tag-based filtering:

```hcl
module "filter_tags" {
  source = "./terraform-datadog-old-monitors/common/filter-tags"

  environment              = "production"
  filter_tags_use_defaults = true
  filter_tags_custom       = "service:api,tier:backend"
}
```
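These two helpers are typically composed with the monitor modules themselves: the alerting module renders a reusable `message` template and the filter module renders a tag query. Below is a minimal sketch of that composition, assuming the helpers expose outputs named `alerting-message` and `query-alert`; those names are an assumption, so confirm the real ones in each helper's `outputs.tf`:

```hcl
# Sketch: wiring the helper modules into a monitor module.
# Output names alerting-message / query-alert are assumptions; check outputs.tf.
module "rds_monitor" {
  source = "./terraform-datadog-old-monitors/cloud/aws/rds/common"

  environment = "production"

  # Reuse the rendered alerting template as this monitor's message
  message = module.alerting.alerting-message

  # Reuse the rendered tag query instead of a hand-written tag list
  filter_tags_use_defaults = false
  filter_tags_custom       = module.filter_tags.query-alert
}
```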
## Best Practices

1. **Start with defaults**: Use default thresholds first, then customize
2. **Gradual rollout**: Enable monitors incrementally
3. **Tag strategy**: Use consistent tagging across infrastructure
4. **Alert fatigue**: Tune thresholds to reduce false positives
5. **Documentation**: Document custom threshold decisions
6. **Testing**: Test monitors in non-production first

## Customization Examples

### Custom Thresholds

```hcl
# More aggressive CPU monitoring
cpu_critical = 85
cpu_warning  = 70

# Relaxed disk space monitoring
disk_space_critical = 95
disk_space_warning  = 90
```

### Conditional Monitoring

```hcl
# Only monitor specific services
filter_tags_custom = "service:critical-app"

# Skip new hosts for a longer period
new_host_delay = 600 # 10 minutes
```

### Custom Alert Messages

```hcl
message = <<-EOT
  {{#is_alert}}
  CRITICAL: {{check}} on {{host.name}} @pagerduty-critical
  {{/is_alert}}
  {{#is_warning}}
  WARNING: {{check}} on {{host.name}} @slack-warnings
  {{/is_warning}}
EOT
```

## Maintenance

This module appears to be a legacy/archived version (hence the "old-monitors" name). Consider:

- Reviewing for updates from the Claranet repository
- Migrating to newer monitoring solutions if available
- Documenting which monitors are actively used
- Deprecating unused monitor configurations

## Outputs

Each sub-module may export:

- Monitor IDs
- Monitor names
- Alert status

Check individual module `outputs.tf` files for specifics.

## Notes

- This is a comprehensive library of monitor templates
- Based on Claranet's open-source Datadog monitors
- Covers most common infrastructure components
- Highly customizable, with sensible defaults
- May contain more monitors than needed for your use case
- Review and enable only required monitors to avoid alert fatigue

## Migration Path

If migrating from this module:

1. Audit currently active monitors
2. Document custom thresholds
3. Test new monitoring solutions in parallel
4. Migrate gradually, monitor by monitor
5. Keep this module for reference

## Resources

- Original Claranet repository: [terraform-datadog-monitors](https://github.com/claranet/terraform-datadog-monitors)
- Datadog monitor documentation: [Datadog Monitors](https://docs.datadoghq.com/monitors/)

## License

Based on Claranet's open-source work. Internal use: Sanoma/WeBuildYourCloud

## Authors

- Original: Claranet team
- Maintained by: Platform Engineering team