# Terraform Datadog Old Monitors Module

## Overview

This repository is a collection of Terraform sub-modules, based on Claranet's Datadog monitors repository, that create pre-configured Datadog monitors for infrastructure components including middleware, databases, cloud services, and container platforms.

## Features

- **Monitor Templates**: Pre-configured monitors with default thresholds that can be overridden per deployment
- **Multi-Cloud Support**: Monitors for AWS, Azure, and GCP services
- **Component Coverage**: Middleware, databases, containers, and networking
- **Flexible Configuration**: Each monitor can be enabled, disabled, and tuned individually
- **Upstream Defaults**: Queries and thresholds inherited from Claranet's open-source monitor set

## Structure

This module contains multiple sub-modules organized by component type:

```
terraform-datadog-old-monitors/
├── middleware/      # Nginx, Kong, Apache, PHP-FPM
├── database/        # PostgreSQL, MySQL, Redis, MongoDB, etc.
├── system/          # Generic system and unreachable monitors
├── network/         # HTTP, DNS, TLS monitoring
├── cloud/           # AWS-, Azure-, and GCP-specific monitors
│   ├── aws/         # ECS, RDS, Lambda, ALB, etc.
│   ├── azure/       # App Services, Functions, SQL, etc.
│   └── gcp/         # Compute, Cloud SQL, Pub/Sub, etc.
├── caas/            # Docker, Kubernetes monitoring
└── common/          # Shared alerting and filtering modules
```

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.12 |
| datadog | >= 2.0 |
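A minimal sketch of pinning these versions in the root configuration that calls the sub-modules. It assumes Terraform 0.13 or later (the `source` attribute inside `required_providers` is not supported by 0.12), and the `datadog_api_key` / `datadog_app_key` variables are hypothetical names for credentials supplied from outside the code.

```hcl
terraform {
  # 0.13+ is needed for the source-address syntax below
  required_version = ">= 0.13"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = ">= 2.0"
    }
  }
}

# Hypothetical variables: pass values via TF_VAR_* environment
# variables or a secrets manager instead of committing them
variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
```

If a sub-module relies on resource arguments that a later provider major version removed, tightening the constraint (for example `">= 2.0, < 3.0"`) avoids surprises on `terraform init -upgrade`.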

## Usage

### Basic Monitor Configuration

```hcl
module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment      = "production"
  message          = "Nginx issue detected @slack-channel"
  evaluation_delay = 15
  new_host_delay   = 300

  # Enable/disable specific monitors
  nginx_connect_enabled = "true"
  nginx_dropped_enabled = "true"

  # Customize thresholds
  nginx_dropped_connections_critical = 5
  nginx_dropped_connections_warning  = 3
}
```

### AWS RDS Monitoring

```hcl
module "rds_monitor" {
  source = "./terraform-datadog-old-monitors/cloud/aws/rds/common"

  environment = "production"
  message     = "RDS alert @pagerduty"

  # CPU monitoring
  cpu_enabled  = "true"
  cpu_critical = 90
  cpu_warning  = 75

  # Disk monitoring
  disk_space_enabled  = "true"
  disk_space_critical = 90
  disk_space_warning  = 80
}
```

### Kubernetes Monitoring

```hcl
module "k8s_pod_monitor" {
  source = "./terraform-datadog-old-monitors/caas/kubernetes/pod"

  environment = "production"
  message     = "Kubernetes pod issue @slack-ops"

  pod_crash_enabled         = "true"
  pod_not_running_enabled   = "true"
  container_restart_enabled = "true"
}
```

## Common Variables

Most sub-modules share these common variables:

| Name | Description | Type | Default |
|------|-------------|------|---------|
| `environment` | Architecture environment (for example `production`) | `string` | Required |
| `message` | Alert message with notification channels | `string` | Required |
| `evaluation_delay` | Metric evaluation delay (seconds) | `number` | `15` |
| `new_host_delay` | Delay (seconds) before evaluating new hosts or resources | `number` | `300` |
| `prefix_slug` | Prefix for monitor names | `string` | `""` |
| `notify_no_data` | Alert on no data | `bool` | `true` |
| `filter_tags_use_defaults` | Use the default filter tag convention | `bool` | `true` |
| `filter_tags_custom` | Comma-separated custom filter tags, used when `filter_tags_use_defaults` is `false` | `string` | `""` |
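"Required" means the sub-module declares the variable without a `default`, so Terraform stops with a missing-value error if the caller omits it. A sketch of two such declarations as they might appear in a sub-module's `variables.tf`, reconstructed from the table above (the actual files may differ, for example by typing flags as strings):

```hcl
# Required: no default, so every module call must set it
variable "environment" {
  description = "Architecture environment"
  type        = string
}

# Optional: the default applies when the caller omits it
variable "evaluation_delay" {
  description = "Metric evaluation delay in seconds"
  type        = number
  default     = 15
}
```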

## Available Monitor Types

### Middleware Monitors

- **Nginx**: Connection, dropped connections, workers
- **Apache**: Server status, connections
- **Kong**: API gateway health and performance
- **PHP-FPM**: Pool status, slow requests

### Database Monitors

- **PostgreSQL**: Connections, replication lag, locks
- **MySQL**: Connections, slow queries, replication
- **Redis**: Memory, connections, evictions
- **MongoDB**: Connections, replication lag, operations
- **Elasticsearch**: Cluster health, JVM heap
- **SQL Server**: Connections, locks, performance

### Cloud Services

#### AWS
- RDS (Aurora PostgreSQL, Aurora MySQL, common)
- EC2 / ECS (Fargate, EC2 cluster)
- Lambda
- ALB / ELB / NLB
- ElastiCache (Redis, Memcached)
- SQS
- API Gateway
- Elasticsearch

#### Azure
- App Services
- Functions
- SQL Database / Elastic Pool
- PostgreSQL
- Storage
- Key Vault
- Event Hub
- Service Bus

#### GCP
- Compute Engine
- Cloud SQL (MySQL, common)
- Pub/Sub (topics, subscriptions)
- Load Balancer
- Memorystore Redis

### Container Platforms

- **Docker**: Container status, resource usage
- **Kubernetes**:
  - Pod monitors (crash, restart, not running)
  - Node monitors (resource usage, status)
  - Cluster monitors (API server, scheduler)
  - Workload monitors (deployments, statefulsets)
  - Velero/Ark backup monitors

### Network Monitors

- **HTTP**: Webcheck, SSL certificate expiry
- **DNS**: Query response time, availability
- **TLS**: Certificate expiration

## Monitor Configuration Pattern

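Each monitor module follows the pattern below. `monitor_name` is a placeholder for a specific monitor's variable prefix (check each sub-module's `variables.tf` for the real names), and the `*_enabled` flags take the strings `"true"` or `"false"`: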

```hcl
module "service_monitor" {
  source = "./path/to/monitor"

  # Environment and messaging
  environment = var.environment
  message     = var.alert_message

  # Timing configuration
  evaluation_delay = 15
  new_host_delay   = 300

  # Enable/disable monitors
  monitor_name_enabled = "true"

  # Thresholds
  monitor_name_critical = 90
  monitor_name_warning  = 75

  # Filtering: use custom tags instead of the default convention
  filter_tags_use_defaults = false
  filter_tags_custom       = "env:production,team:platform"
}
```
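Sub-modules written in this style typically turn each `*_enabled` string flag into a `count` on a `datadog_monitor` resource. The sketch below illustrates that idea with hypothetical names (`monitor_name_*`, the `system.cpu.user` query, and the monitor name). It assumes the common variables from the table above (`environment`, `message`, `evaluation_delay`, `new_host_delay`) are declared in the same module, and a Datadog provider version that supports the `monitor_thresholds` block (older releases used a `thresholds` map instead).

```hcl
variable "monitor_name_enabled" {
  description = "Enables the monitor when set to the string \"true\""
  type        = string
  default     = "true"
}

variable "monitor_name_critical" {
  type    = number
  default = 90
}

variable "monitor_name_warning" {
  type    = number
  default = 75
}

resource "datadog_monitor" "monitor_name" {
  # Create the monitor only when the flag is the string "true"
  count = var.monitor_name_enabled == "true" ? 1 : 0

  name    = "[${var.environment}] CPU usage is high"
  type    = "metric alert"
  message = var.message

  # Datadog requires the value in the query to equal the critical threshold
  query = "avg(last_5m):avg:system.cpu.user{env:${var.environment}} by {host} > ${var.monitor_name_critical}"

  monitor_thresholds {
    critical = var.monitor_name_critical
    warning  = var.monitor_name_warning
  }

  evaluation_delay = var.evaluation_delay
  new_host_delay   = var.new_host_delay
}
```

Gating with `count` means that flipping a flag to `"false"` destroys the monitor on the next apply rather than muting it.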

## Alerting Integration

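The `common/alerting-message` module composes a monitor `message` that routes each alert state (alert, warning, no data) to its own notification handles. The handles can target any Datadog notification integration, for example:
- PagerDuty
- Slack
- Email
- Webhooks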

Example:
```hcl
module "alerting" {
  source = "./terraform-datadog-old-monitors/common/alerting-message"

  message_alert   = "@pagerduty-critical"
  message_warning = "@slack-warnings"
  message_nodata  = "@slack-monitoring"
}
```
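The composed message still has to be passed to each monitor module. A minimal sketch, assuming the `alerting` module block above and an output named `alerting-message`; that output name is an assumption, so check `common/alerting-message/outputs.tf` for the real one.

```hcl
module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment = "production"

  # Reuse the message built by the alerting module instead of repeating
  # notification handles in every monitor module.
  # Output name is assumed; confirm it in the module's outputs.tf
  message = module.alerting.alerting-message
}
```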

## Filter Tags

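The `common/filter-tags` module builds the tag filter that scopes each monitor query. By default it applies the default filter convention (`filter_tags_use_defaults = true`); to filter on your own comma-separated tags, disable the defaults and set `filter_tags_custom`:

```hcl
module "filter_tags" {
  source = "./terraform-datadog-old-monitors/common/filter-tags"

  environment              = "production"
  filter_tags_use_defaults = false
  filter_tags_custom       = "service:api,tier:backend"
}
```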
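The filter string ends up in the scope of the monitor query (the part in braces), and in Datadog metric queries comma-separated tags in the scope are combined with AND. A minimal sketch using hypothetical local names and the Nginx integration's `nginx.net.conn_dropped_per_s` metric as an example:

```hcl
locals {
  # Same custom tags as in the filter_tags example above
  filter_tags = "service:api,tier:backend"

  # Tags in the scope are ANDed: only series tagged with both
  # service:api and tier:backend are evaluated
  dropped_query = "avg(last_5m):avg:nginx.net.conn_dropped_per_s{${local.filter_tags}} by {host} > 5"
}

output "dropped_query" {
  value = local.dropped_query
}
```

With these values, `dropped_query` evaluates to `avg(last_5m):avg:nginx.net.conn_dropped_per_s{service:api,tier:backend} by {host} > 5`.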

## Best Practices

1. **Start with defaults**: Use default thresholds first, then customize
2. **Gradual rollout**: Enable monitors incrementally
3. **Tag strategy**: Use consistent tagging across infrastructure
4. **Alert fatigue**: Tune thresholds to reduce false positives
5. **Documentation**: Document custom threshold decisions
6. **Testing**: Test monitors in non-production first

## Customization Examples

### Custom Thresholds

```hcl
# More aggressive CPU monitoring
cpu_critical = 85
cpu_warning  = 70

# Relaxed disk space monitoring
disk_space_critical = 95
disk_space_warning  = 90
```
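The CPU thresholds above are percentages. If you wrap a sub-module in your own configuration, a variable `validation` block (Terraform 0.13 or later) can reject out-of-range values at plan time. A minimal sketch with a hypothetical wrapper variable `cpu_critical`:

```hcl
variable "cpu_critical" {
  description = "Critical CPU threshold in percent (hypothetical wrapper variable)"
  type        = number
  default     = 85

  validation {
    # Reject values that cannot be a percentage
    condition     = var.cpu_critical > 0 && var.cpu_critical <= 100
    error_message = "The cpu_critical value must be greater than 0 and at most 100."
  }
}

module "rds_monitor" {
  source = "./terraform-datadog-old-monitors/cloud/aws/rds/common"

  environment  = "production"
  message      = "RDS alert @pagerduty"
  cpu_critical = var.cpu_critical
}
```

Checking that a warning threshold stays below its critical threshold needs a condition that references two variables, which variable validation only allows from Terraform 1.9.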

### Conditional Monitoring

```hcl
# Only monitor specific services (custom tags apply when defaults are off)
filter_tags_use_defaults = false
filter_tags_custom       = "service:critical-app"

# Wait longer before evaluating new hosts
new_host_delay = 600  # 10 minutes
```
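The `*_enabled` flags can also be computed, for example to enable a noisier monitor only in production. A minimal sketch with a hypothetical root-level `environment` variable:

```hcl
variable "environment" {
  type    = string
  default = "staging"
}

module "nginx_monitor" {
  source = "./terraform-datadog-old-monitors/middleware/nginx"

  environment = var.environment
  message     = "Nginx issue detected @slack-channel"

  # The flags expect the strings "true"/"false", so derive them
  # from a condition instead of passing a bool
  nginx_connect_enabled = "true"
  nginx_dropped_enabled = var.environment == "production" ? "true" : "false"
}
```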

### Custom Alert Messages

```hcl
message = <<-EOT
  {{#is_alert}}
  CRITICAL: {{value}} (threshold {{threshold}}) on {{host.name}}
  @pagerduty-critical
  {{/is_alert}}

  {{#is_warning}}
  WARNING: {{value}} (threshold {{warn_threshold}}) on {{host.name}}
  @slack-warnings
  {{/is_warning}}
EOT
```
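The same template can also cover the no-data and recovery states with Datadog's `{{#is_no_data}}` and `{{#is_recovery}}` conditionals. A sketch with placeholder notification handles; note that `{{host.name}}` is only populated when the monitor is grouped by host.

```hcl
message = <<-EOT
  {{#is_alert}}
  CRITICAL: {{value}} (threshold {{threshold}}) on {{host.name}}
  @pagerduty-critical
  {{/is_alert}}

  {{#is_no_data}}
  NO DATA: no data received from {{host.name}}
  @slack-monitoring
  {{/is_no_data}}

  {{#is_recovery}}
  RECOVERED on {{host.name}}
  @slack-warnings
  {{/is_recovery}}
EOT
```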

## Maintenance

The "old-monitors" name indicates that this is a legacy copy of the upstream modules. Consider:
- Reviewing the upstream Claranet repository for updates
- Migrating to newer monitoring solutions if available
- Documenting which monitors are actively used
- Deprecating unused monitor configurations

## Outputs

Each sub-module exports identifiers for the monitors it creates (typically monitor IDs). Check each sub-module's `outputs.tf` for the exact output names.
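Since Terraform 0.12, a module call can be referenced as an object containing all of its outputs, so a root configuration can re-export a sub-module's outputs without knowing their names in advance. A minimal sketch, assuming the `nginx_monitor` module block from the usage examples:

```hcl
output "nginx_monitor_outputs" {
  # Object holding every output of the module, such as monitor IDs
  # if the sub-module exports them
  value = module.nginx_monitor
}
```

After an apply, `terraform output nginx_monitor_outputs` lists whatever the sub-module exposes.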

## Notes

- This is a comprehensive library of monitor templates
- Based on Claranet's open-source Datadog monitors
- Covers most common infrastructure components
- Highly customizable with sensible defaults
- May contain more monitors than needed for your use case
- Review and enable only required monitors to avoid alert fatigue

## Migration Path

If migrating from this module:
1. Audit currently active monitors
2. Document custom thresholds
3. Test new monitoring solutions in parallel
4. Gradually migrate monitor by monitor
5. Keep this module for reference
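One way to carry out step 4 without deleting and recreating monitors is sketched below. It assumes Terraform 1.5 or later for the `import` block and 1.7 or later for the `removed` block; the `nginx_dropped` resource, its query, and the monitor ID `1234567` are hypothetical placeholders. The old `module "nginx_monitor"` block has to be deleted from the configuration in the same change, and `terraform plan` should be reviewed to confirm the new resource matches the existing monitor before applying.

```hcl
# Stop managing the old module's monitors without destroying them
removed {
  from = module.nginx_monitor

  lifecycle {
    destroy = false
  }
}

# Adopt the existing monitor (placeholder ID) into the new resource
import {
  to = datadog_monitor.nginx_dropped
  id = "1234567"
}

resource "datadog_monitor" "nginx_dropped" {
  name    = "Nginx dropped connections"
  type    = "metric alert"
  message = "Nginx issue detected @slack-channel"
  query   = "avg(last_5m):avg:nginx.net.conn_dropped_per_s{env:production} by {host} > 5"

  monitor_thresholds {
    critical = 5
    warning  = 3
  }
}
```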

## Resources

- Original Claranet repository: [terraform-datadog-monitors](https://github.com/claranet/terraform-datadog-monitors)
- Datadog monitor documentation: [Datadog Monitors](https://docs.datadoghq.com/monitors/)

## License

Based on Claranet's open-source work.
Internal use: Sanoma/WeBuildYourCloud

## Authors

- Original: Claranet team
- Maintained by: Platform Engineering team