MON-459 update all readme with new hack

Quentin Manfroi 2019-06-27 16:25:36 +02:00
parent c574e07f87
commit 53714ad8fc
108 changed files with 533 additions and 620 deletions


@@ -23,7 +23,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| ark\_schedules\_enabled | Flag to enable Ark schedules monitor | string | `"true"` | no |
| ark\_schedules\_extra\_tags | Extra tags for Ark schedules monitor | list(string) | `[]` | no |
| ark\_schedules\_monitor\_message | Custom message for Ark schedules monitor | string | `""` | no |
| ark\_schedules\_monitor\_no\_data\_timeframe | No data timeframe in minutes | string | `"1440"` | no |
| ark\_schedules\_monitor\_timeframe | Monitor timeframe for Ark schedules monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_1d"` | no |
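Across these READMEs the documented type of every `*_extra_tags` input moves from the loose `list` to Terraform 0.12's explicit `list(string)`. A variable declaration matching the table row above would look like this (the declaration itself is a sketch; only the name, description, type, and default come from the table):

```hcl
variable "ark_schedules_extra_tags" {
  description = "Extra tags for Ark schedules monitor"
  type        = list(string) # Terraform 0.12 type constraint; was the legacy bare "list"
  default     = []
}
```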


@@ -23,7 +23,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| apiserver\_enabled | Flag to enable API server monitor | string | `"true"` | no |
| apiserver\_extra\_tags | Extra tags for API server monitor | list(string) | `[]` | no |
| apiserver\_message | Custom message for API server monitor | string | `""` | no |
| apiserver\_threshold\_warning | API server monitor (warning threshold) | string | `"3"` | no |
| environment | Architecture environment | string | n/a | yes |


@@ -16,8 +16,8 @@ module "datadog-monitors-caas-kubernetes-ingress-vts" {
Creates DataDog monitors with the following checks:
- Nginx Ingress 4xx errors
- Nginx Ingress 5xx errors
## Inputs
@@ -30,14 +30,14 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| ingress\_4xx\_enabled | Flag to enable Ingress 4xx errors monitor | string | `"true"` | no |
| ingress\_4xx\_extra\_tags | Extra tags for Ingress 4xx errors monitor | list(string) | `[]` | no |
| ingress\_4xx\_message | Message sent when an alert is triggered | string | `""` | no |
| ingress\_4xx\_threshold\_critical | 4xx critical threshold in percentage | string | `"40"` | no |
| ingress\_4xx\_threshold\_warning | 4xx warning threshold in percentage | string | `"20"` | no |
| ingress\_4xx\_time\_aggregator | Monitor aggregator for Ingress 4xx errors [available values: min, max or avg] | string | `"min"` | no |
| ingress\_4xx\_timeframe | Monitor timeframe for Ingress 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| ingress\_5xx\_enabled | Flag to enable Ingress 5xx errors monitor | string | `"true"` | no |
| ingress\_5xx\_extra\_tags | Extra tags for Ingress 5xx errors monitor | list(string) | `[]` | no |
| ingress\_5xx\_message | Message sent when an alert is triggered | string | `""` | no |
| ingress\_5xx\_threshold\_critical | 5xx critical threshold in percentage | string | `"20"` | no |
| ingress\_5xx\_threshold\_warning | 5xx warning threshold in percentage | string | `"10"` | no |


@@ -2,7 +2,7 @@ resource "datadog_monitor" "nginx_ingress_too_many_5xx" {
count = var.ingress_5xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Nginx Ingress 5xx errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.ingress_5xx_message, var.message)
type = "query alert"
query = <<EOQ
${var.ingress_5xx_time_aggregator}(${var.ingress_5xx_timeframe}): default(
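The `message` line above relies on Terraform's `coalesce()`, which skips empty strings as well as nulls, so the per-monitor message wins only when it is actually set. A minimal illustration (the local names and values here are invented):

```hcl
locals {
  ingress_5xx_message = ""            # per-monitor message left at its empty default
  message             = "default msg" # module-wide fallback

  # coalesce() returns its first non-empty argument:
  effective_message = coalesce(local.ingress_5xx_message, local.message) # "default msg"
}
```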


@@ -17,26 +17,22 @@ module "datadog-monitors-caas-kubernetes-node" {
Creates DataDog monitors with the following checks:
- Kubernetes Node Disk pressure
- Kubernetes Node Frequent unregister net device
- Kubernetes Node Kubelet API does not respond
- Kubernetes Node Kubelet sync loop that updates containers does not work
- Kubernetes Node Memory pressure
- Kubernetes Node not ready
- Kubernetes Node Out of disk
- Kubernetes Node unschedulable
- Kubernetes Node volume inodes usage
- Kubernetes Node volume space usage
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| disk\_out\_enabled | Flag to enable Out of disk monitor | string | `"true"` | no |
| disk\_out\_extra\_tags | Extra tags for Out of disk monitor | list(string) | `[]` | no |
| disk\_out\_message | Custom message for Out of disk monitor | string | `""` | no |
| disk\_out\_threshold\_warning | Out of disk monitor (warning threshold) | string | `"3"` | no |
| disk\_pressure\_enabled | Flag to enable Disk pressure monitor | string | `"true"` | no |
| disk\_pressure\_extra\_tags | Extra tags for Disk pressure monitor | list(string) | `[]` | no |
| disk\_pressure\_message | Custom message for Disk pressure monitor | string | `""` | no |
| disk\_pressure\_threshold\_warning | Disk pressure monitor (warning threshold) | string | `"3"` | no |
| environment | Architecture environment | string | n/a | yes |
@@ -45,44 +41,44 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| kubelet\_ping\_enabled | Flag to enable Kubelet ping monitor | string | `"true"` | no |
| kubelet\_ping\_extra\_tags | Extra tags for Kubelet ping monitor | list(string) | `[]` | no |
| kubelet\_ping\_message | Custom message for Kubelet ping monitor | string | `""` | no |
| kubelet\_ping\_threshold\_warning | Kubelet ping monitor (warning threshold) | string | `"3"` | no |
| kubelet\_syncloop\_enabled | Flag to enable Kubelet sync loop monitor | string | `"true"` | no |
| kubelet\_syncloop\_extra\_tags | Extra tags for Kubelet sync loop monitor | list(string) | `[]` | no |
| kubelet\_syncloop\_message | Custom message for Kubelet sync loop monitor | string | `""` | no |
| kubelet\_syncloop\_threshold\_warning | Kubelet sync loop monitor (warning threshold) | string | `"3"` | no |
| memory\_pressure\_enabled | Flag to enable Memory pressure monitor | string | `"true"` | no |
| memory\_pressure\_extra\_tags | Extra tags for Memory pressure monitor | list(string) | `[]` | no |
| memory\_pressure\_message | Custom message for Memory pressure monitor | string | `""` | no |
| memory\_pressure\_threshold\_warning | Memory pressure monitor (warning threshold) | string | `"3"` | no |
| message | Message sent when a monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitor new resource | string | `"300"` | no |
| node\_unschedulable\_enabled | Flag to enable node unschedulable monitor | string | `"true"` | no |
| node\_unschedulable\_extra\_tags | Extra tags for node unschedulable monitor | list(string) | `[]` | no |
| node\_unschedulable\_message | Custom message for node unschedulable monitor | string | `""` | no |
| node\_unschedulable\_time\_aggregator | Monitor aggregator for node unschedulable [available values: min, max or avg] | string | `"min"` | no |
| node\_unschedulable\_timeframe | Monitor timeframe for node unschedulable [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_1h"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| ready\_enabled | Flag to enable Node ready monitor | string | `"true"` | no |
| ready\_extra\_tags | Extra tags for Node ready monitor | list(string) | `[]` | no |
| ready\_message | Custom message for Node ready monitor | string | `""` | no |
| ready\_threshold\_warning | Node ready monitor (warning threshold) | string | `"3"` | no |
| unregister\_net\_device\_enabled | Flag to enable Unregister net device monitor | string | `"true"` | no |
| unregister\_net\_device\_extra\_tags | Extra tags for Unregister net device monitor | list(string) | `[]` | no |
| unregister\_net\_device\_message | Custom message for Unregister net device monitor | string | `""` | no |
| unregister\_net\_device\_threshold\_critical | Unregister net device critical threshold | string | `"3"` | no |
| unregister\_net\_device\_time\_aggregator | Monitor aggregator for Unregister net device [available values: min, max or avg] | string | `"min"` | no |
| unregister\_net\_device\_timeframe | Monitor timeframe for Unregister net device [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"15m"` | no |
| volume\_inodes\_enabled | Flag to enable Volume inodes monitor | string | `"true"` | no |
| volume\_inodes\_extra\_tags | Extra tags for Volume inodes monitor | list(string) | `[]` | no |
| volume\_inodes\_message | Custom message for Volume inodes monitor | string | `""` | no |
| volume\_inodes\_threshold\_critical | Volume inodes critical threshold | string | `"95"` | no |
| volume\_inodes\_threshold\_warning | Volume inodes warning threshold | string | `"90"` | no |
| volume\_inodes\_time\_aggregator | Monitor aggregator for Volume inodes [available values: min, max or avg] | string | `"min"` | no |
| volume\_inodes\_timeframe | Monitor timeframe for Volume inodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| volume\_space\_enabled | Flag to enable Volume space monitor | string | `"true"` | no |
| volume\_space\_extra\_tags | Extra tags for Volume space monitor | list(string) | `[]` | no |
| volume\_space\_message | Custom message for Volume space monitor | string | `""` | no |
| volume\_space\_threshold\_critical | Volume space critical threshold | string | `"95"` | no |
| volume\_space\_threshold\_warning | Volume space warning threshold | string | `"90"` | no |


@@ -2,7 +2,7 @@ resource "datadog_monitor" "disk_pressure" {
count = var.disk_pressure_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Node Disk pressure"
message = coalesce(var.disk_pressure_message, var.message)
type = "service check"
query = <<EOQ
"kubernetes_state.node.disk_pressure"${module.filter-tags.service_check}.by("kubernetescluster","node").last(6).count_by_status()
@@ -56,7 +56,7 @@ resource "datadog_monitor" "memory_pressure" {
count = var.memory_pressure_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Node Memory pressure"
message = coalesce(var.memory_pressure_message, var.message)
type = "service check"
query = <<EOQ
"kubernetes_state.node.memory_pressure"${module.filter-tags.service_check}.by("kubernetescluster","node").last(6).count_by_status()
@@ -110,7 +110,7 @@ resource "datadog_monitor" "kubelet_ping" {
count = var.kubelet_ping_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Node Kubelet API does not respond"
message = coalesce(var.kubelet_ping_message, var.message)
type = "service check"
query = <<EOQ
"kubernetes.kubelet.check.ping"${module.filter-tags.service_check}.by("kubernetescluster","name").last(6).count_by_status()
@@ -197,8 +197,8 @@ EOQ
critical = 0
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false
@@ -207,7 +207,7 @@ EOQ
locked = false
require_full_window = true
tags = concat(["env:${var.environment}", "type:caas", "provider:kubernetes", "resource:kubernetes-node", "team:claranet", "created-by:terraform"], var.node_unschedulable_extra_tags)
}
resource "datadog_monitor" "volume_space" {
@@ -245,7 +245,7 @@ resource "datadog_monitor" "volume_inodes" {
count = var.volume_inodes_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Node volume inodes usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.volume_inodes_message, var.message)
type = "query alert"
query = <<EOQ
${var.volume_inodes_time_aggregator}(${var.volume_inodes_timeframe}):
@@ -259,8 +259,8 @@ critical = var.volume_inodes_threshold_critical
warning = var.volume_inodes_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false
@@ -269,6 +269,6 @@ include_tags = true
locked = false
require_full_window = true
tags = concat(["env:${var.environment}", "type:caas", "provider:kubernetes", "resource:kubernetes-node", "team:claranet", "created-by:terraform"], var.volume_inodes_extra_tags)
}
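Every monitor's `tags` are built the same way: a fixed set of convention tags concatenated with the user-supplied `*_extra_tags` list. A standalone illustration with invented extras:

```hcl
locals {
  base_tags  = ["env:prod", "team:claranet", "created-by:terraform"]
  extra_tags = ["service:checkout"] # would come from e.g. var.volume_inodes_extra_tags

  # concat() appends the lists in order:
  monitor_tags = concat(local.base_tags, local.extra_tags)
  # ["env:prod", "team:claranet", "created-by:terraform", "service:checkout"]
}
```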


@@ -25,7 +25,7 @@ Creates DataDog monitors with the following checks:
|------|-------------|:----:|:-----:|:-----:|
| environment | Architecture environment | string | n/a | yes |
| error\_enabled | Flag to enable Pod errors monitor | string | `"true"` | no |
| error\_extra\_tags | Extra tags for Pod errors monitor | list(string) | `[]` | no |
| error\_message | Custom message for Pod errors monitor | string | `""` | no |
| error\_threshold\_critical | error critical threshold | string | `"0.5"` | no |
| error\_threshold\_warning | error warning threshold | string | `"0"` | no |
@@ -38,7 +38,7 @@ Creates DataDog monitors with the following checks:
| message | Message sent when a monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitor new resource | string | `"300"` | no |
| pod\_phase\_status\_enabled | Flag to enable Pod phase status monitor | string | `"true"` | no |
| pod\_phase\_status\_extra\_tags | Extra tags for Pod phase status monitor | list(string) | `[]` | no |
| pod\_phase\_status\_message | Custom message for Pod phase status monitor | string | `""` | no |
| pod\_phase\_status\_time\_aggregator | Monitor aggregator for Pod phase status [available values: min, max or avg] | string | `"max"` | no |
| pod\_phase\_status\_timeframe | Monitor timeframe for Pod phase status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |


@@ -31,7 +31,7 @@ resource "datadog_monitor" "error" {
count = var.error_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Pod waiting errors"
message = coalesce(var.error_message, var.message)
type = "query alert"
query = <<EOQ
${var.error_time_aggregator}(${var.error_timeframe}):
@@ -44,8 +44,8 @@ critical = var.error_threshold_critical
warning = var.error_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false
@@ -54,6 +54,6 @@ include_tags = true
locked = false
require_full_window = true
tags = concat(["env:${var.environment}", "type:caas", "provider:kubernetes", "resource:kubernetes-pod", "team:claranet", "created-by:terraform"], var.error_extra_tags)
}
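Each resource in these modules is switched on or off through its `count` meta-argument; the flag variables are strings, so the comparison is against the literal `"true"`. A sketch using a standalone `null_resource` (the resource and flag names here are illustrative):

```hcl
variable "error_enabled" {
  type    = string # string flag, as in the module, rather than a bool
  default = "true"
}

resource "null_resource" "error_monitor_gate" {
  # count = 0 removes the resource from the plan entirely
  count = var.error_enabled == "true" ? 1 : 0
}
```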


@@ -16,18 +16,16 @@ module "datadog-monitors-caas-kubernetes-workload" {
Creates DataDog monitors with the following checks:
- Kubernetes Available replicas
- Kubernetes cronjob scheduling failed
- Kubernetes Current replicas
- Kubernetes job failed
- Kubernetes Ready replicas
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cronjob\_enabled | Flag to enable Cronjob monitor | string | `"true"` | no |
| cronjob\_extra\_tags | Extra tags for Cronjob monitor | list(string) | `[]` | no |
| cronjob\_message | Custom message for Cronjob monitor | string | `""` | no |
| cronjob\_threshold\_warning | Cronjob monitor (warning threshold) | string | `"3"` | no |
| environment | Architecture environment | string | n/a | yes |
@@ -36,26 +34,26 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| job\_enabled | Flag to enable Job monitor | string | `"true"` | no |
| job\_extra\_tags | Extra tags for Job monitor | list(string) | `[]` | no |
| job\_message | Custom message for Job monitor | string | `""` | no |
| job\_threshold\_warning | Job monitor (warning threshold) | string | `"3"` | no |
| message | Message sent when a monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitor new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| replica\_available\_enabled | Flag to enable Available replica monitor | string | `"true"` | no |
| replica\_available\_extra\_tags | Extra tags for Available replica monitor | list(string) | `[]` | no |
| replica\_available\_message | Custom message for Available replica monitor | string | `""` | no |
| replica\_available\_threshold\_critical | Available replica critical threshold | string | `"1"` | no |
| replica\_available\_time\_aggregator | Monitor aggregator for Available replica [available values: min, max or avg] | string | `"max"` | no |
| replica\_available\_timeframe | Monitor timeframe for Available replica [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| replica\_current\_enabled | Flag to enable Current replica monitor | string | `"true"` | no |
| replica\_current\_extra\_tags | Extra tags for Current replica monitor | list(string) | `[]` | no |
| replica\_current\_message | Custom message for Current replica monitor | string | `""` | no |
| replica\_current\_threshold\_critical | Current replica critical threshold | string | `"1"` | no |
| replica\_current\_time\_aggregator | Monitor aggregator for Current replica [available values: min, max or avg] | string | `"max"` | no |
| replica\_current\_timeframe | Monitor timeframe for Current replica [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| replica\_ready\_enabled | Flag to enable Ready replica monitor | string | `"true"` | no |
| replica\_ready\_extra\_tags | Extra tags for Ready replica monitor | list(string) | `[]` | no |
| replica\_ready\_message | Custom message for Ready replica monitor | string | `""` | no |
| replica\_ready\_threshold\_critical | Ready replica critical threshold | string | `"1"` | no |
| replica\_ready\_time\_aggregator | Monitor aggregator for Ready replica [available values: min, max or avg] | string | `"max"` | no |


@@ -2,7 +2,7 @@ resource "datadog_monitor" "job" {
count = var.job_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes job failed"
message = coalesce(var.job_message, var.message)
type = "service check"
query = <<EOQ
"kubernetes_state.job.complete"${module.filter-tags.service_check}.by("job_name").last(6).count_by_status()
@@ -86,7 +86,7 @@ resource "datadog_monitor" "replica_ready" {
count = var.replica_ready_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kubernetes Ready replicas {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.replica_ready_message, var.message)
type = "query alert"
query = <<EOQ
${var.replica_available_time_aggregator}(${var.replica_available_timeframe}):
@@ -99,8 +99,8 @@ EOQ
critical = var.replica_ready_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false
@@ -109,7 +109,7 @@ EOQ
locked = false
require_full_window = true
tags = concat(["env:${var.environment}", "type:caas", "provider:kubernetes", "resource:kubernetes-workload", "team:claranet", "created-by:terraform"], var.replica_ready_extra_tags)
}
resource "datadog_monitor" "replica_current" {


@@ -17,18 +17,16 @@ module "datadog-monitors-cloud-aws-alb" {
Creates DataDog monitors with the following checks:
- ALB healthy instances
- ALB HTTP code 4xx
- ALB HTTP code 5xx
- ALB latency
- ALB target HTTP code 4xx
- ALB target HTTP code 5xx
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| alb\_no\_healthy\_instances\_enabled | Flag to enable ALB no healthy instances monitor | string | `"true"` | no |
| alb\_no\_healthy\_instances\_extra\_tags | Extra tags for ALB no healthy instances monitor | list(string) | `[]` | no |
| alb\_no\_healthy\_instances\_message | Custom message for ALB no healthy instances monitor | string | `""` | no |
| alb\_no\_healthy\_instances\_time\_aggregator | Monitor aggregator for ALB no healthy instances [available values: min, max or avg] | string | `"min"` | no |
| alb\_no\_healthy\_instances\_timeframe | Monitor timeframe for ALB no healthy instances [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
@@ -39,35 +37,35 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| httpcode\_alb\_4xx\_enabled | Flag to enable ALB httpcode 4xx monitor | string | `"true"` | no |
| httpcode\_alb\_4xx\_extra\_tags | Extra tags for ALB httpcode 4xx monitor | list(string) | `[]` | no |
| httpcode\_alb\_4xx\_message | Custom message for ALB httpcode 4xx monitor | string | `""` | no |
| httpcode\_alb\_4xx\_threshold\_critical | loadbalancer 4xx critical threshold in percentage | string | `"80"` | no |
| httpcode\_alb\_4xx\_threshold\_warning | loadbalancer 4xx warning threshold in percentage | string | `"60"` | no |
| httpcode\_alb\_4xx\_time\_aggregator | Monitor aggregator for ALB httpcode 4xx [available values: min, max or avg] | string | `"min"` | no |
| httpcode\_alb\_4xx\_timeframe | Monitor timeframe for ALB httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| httpcode\_alb\_5xx\_enabled | Flag to enable ALB httpcode 5xx monitor | string | `"true"` | no |
| httpcode\_alb\_5xx\_extra\_tags | Extra tags for ALB httpcode 5xx monitor | list(string) | `[]` | no |
| httpcode\_alb\_5xx\_message | Custom message for ALB httpcode 5xx monitor | string | `""` | no |
| httpcode\_alb\_5xx\_threshold\_critical | loadbalancer 5xx critical threshold in percentage | string | `"80"` | no |
| httpcode\_alb\_5xx\_threshold\_warning | loadbalancer 5xx warning threshold in percentage | string | `"60"` | no |
| httpcode\_alb\_5xx\_time\_aggregator | Monitor aggregator for ALB httpcode 5xx [available values: min, max or avg] | string | `"min"` | no |
| httpcode\_alb\_5xx\_timeframe | Monitor timeframe for ALB httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| httpcode\_target\_4xx\_enabled | Flag to enable ALB target httpcode 4xx monitor | string | `"true"` | no |
| httpcode\_target\_4xx\_extra\_tags | Extra tags for ALB target httpcode 4xx monitor | list(string) | `[]` | no |
| httpcode\_target\_4xx\_message | Custom message for ALB target httpcode 4xx monitor | string | `""` | no |
| httpcode\_target\_4xx\_threshold\_critical | target 4xx critical threshold in percentage | string | `"80"` | no |
| httpcode\_target\_4xx\_threshold\_warning | target 4xx warning threshold in percentage | string | `"60"` | no |
| httpcode\_target\_4xx\_time\_aggregator | Monitor aggregator for ALB target httpcode 4xx [available values: min, max or avg] | string | `"min"` | no |
| httpcode\_target\_4xx\_timeframe | Monitor timeframe for ALB target httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| httpcode\_target\_5xx\_enabled | Flag to enable ALB target httpcode 5xx monitor | string | `"true"` | no |
| httpcode\_target\_5xx\_extra\_tags | Extra tags for ALB target httpcode 5xx monitor | list(string) | `[]` | no |
| httpcode\_target\_5xx\_message | Custom message for ALB target httpcode 5xx monitor | string | `""` | no |
| httpcode\_target\_5xx\_threshold\_critical | target 5xx critical threshold in percentage | string | `"80"` | no |
| httpcode\_target\_5xx\_threshold\_warning | target 5xx warning threshold in percentage | string | `"60"` | no |
| httpcode\_target\_5xx\_time\_aggregator | Monitor aggregator for ALB target httpcode 5xx [available values: min, max or avg] | string | `"min"` | no |
| httpcode\_target\_5xx\_timeframe | Monitor timeframe for ALB target httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| latency\_enabled | Flag to enable ALB latency monitor | string | `"true"` | no |
| latency\_extra\_tags | Extra tags for ALB latency monitor | list(string) | `[]` | no |
| latency\_message | Custom message for ALB latency monitor | string | `""` | no |
| latency\_threshold\_critical | latency critical threshold in milliseconds | string | `"1000"` | no |
| latency\_threshold\_warning | latency warning threshold in milliseconds | string | `"500"` | no |
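Putting the inputs table to use, a caller passes the now strictly typed `list(string)` extras alongside the required `environment` and `message`. A hypothetical module call (the `source` path and all values are invented):

```hcl
module "datadog-monitors-cloud-aws-alb" {
  source = "./cloud/aws/alb" # illustrative path

  environment = "prod"                    # required
  message     = "@slack-alerts ALB issue" # required

  # extras must now be list(string), matching the updated docs:
  latency_extra_tags          = ["service:frontend"]
  httpcode_alb_5xx_extra_tags = ["severity:high"]
}
```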


@@ -32,7 +32,7 @@ resource "datadog_monitor" "ALB_latency" {
count = var.latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ALB latency {{#is_alert}}{{{comparator}}} {{threshold}}s ({{value}}s){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}s ({{value}}s){{/is_warning}}"
message = coalesce(var.latency_message, var.message)
type = "query alert"
query = <<EOQ
${var.latency_time_aggregator}(${var.latency_timeframe}):
@@ -45,8 +45,8 @@ critical = var.latency_threshold_critical
warning = var.latency_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false
@@ -89,7 +89,7 @@ resource "datadog_monitor" "ALB_httpcode_4xx" {
count = var.httpcode_alb_4xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ALB HTTP code 4xx {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.httpcode_alb_4xx_message, var.message)
type = "query alert"
query = <<EOQ
${var.httpcode_alb_4xx_time_aggregator}(${var.httpcode_alb_4xx_timeframe}):
@@ -103,8 +103,8 @@ EOQ
warning = var.httpcode_alb_4xx_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false
@@ -147,7 +147,7 @@ resource "datadog_monitor" "ALB_httpcode_target_4xx" {
count = var.httpcode_target_4xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ALB target HTTP code 4xx {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.httpcode_target_4xx_message, var.message)
type = "query alert"
query = <<EOQ
${var.httpcode_target_4xx_time_aggregator}(${var.httpcode_target_4xx_timeframe}):
@ -161,8 +161,8 @@ critical = var.httpcode_target_4xx_threshold_critical
warning = var.httpcode_target_4xx_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false

View File

@ -16,9 +16,8 @@ module "datadog-monitors-cloud-aws-apigateway" {
Creates DataDog monitors with the following checks:
- API Gateway HTTP 4xx errors
- API Gateway HTTP 5xx errors
- API Gateway latency
## Inputs
@ -29,21 +28,21 @@ Creates DataDog monitors with the following checks:
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| filter\_tags | Tags used for filtering | string | `"*"` | no |
| http\_4xx\_requests\_enabled | Flag to enable API Gateway HTTP 4xx requests monitor | string | `"true"` | no |
| http\_4xx\_requests\_extra\_tags | Extra tags for API Gateway HTTP 4xx requests monitor | list(string) | `[]` | no |
| http\_4xx\_requests\_message | Custom message for API Gateway HTTP 4xx requests monitor | string | `""` | no |
| http\_4xx\_requests\_threshold\_critical | Maximum critical acceptable percent of 4xx errors | string | `"30"` | no |
| http\_4xx\_requests\_threshold\_warning | Maximum warning acceptable percent of 4xx errors | string | `"15"` | no |
| http\_4xx\_requests\_time\_aggregator | Monitor aggregator for API HTTP 4xx requests [available values: min, max or avg] | string | `"min"` | no |
| http\_4xx\_requests\_timeframe | Monitor timeframe for API HTTP 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| http\_5xx\_requests\_enabled | Flag to enable API Gateway HTTP 5xx requests monitor | string | `"true"` | no |
| http\_5xx\_requests\_extra\_tags | Extra tags for API Gateway HTTP 5xx requests monitor | list(string) | `[]` | no |
| http\_5xx\_requests\_message | Custom message for API Gateway HTTP 5xx requests monitor | string | `""` | no |
| http\_5xx\_requests\_threshold\_critical | Maximum critical acceptable percent of 5xx errors | string | `"20"` | no |
| http\_5xx\_requests\_threshold\_warning | Maximum warning acceptable percent of 5xx errors | string | `"10"` | no |
| http\_5xx\_requests\_time\_aggregator | Monitor aggregator for API HTTP 5xx requests [available values: min, max or avg] | string | `"min"` | no |
| http\_5xx\_requests\_timeframe | Monitor timeframe for API HTTP 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| latency\_enabled | Flag to enable API Gateway latency monitor | string | `"true"` | no |
| latency\_extra\_tags | Extra tags for API Gateway latency monitor | list(string) | `[]` | no |
| latency\_message | Custom message for API Gateway latency monitor | string | `""` | no |
| latency\_threshold\_critical | Alerting threshold in milliseconds | string | `"800"` | no |
| latency\_threshold\_warning | Warning threshold in milliseconds | string | `"400"` | no |
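The defaults above can be overridden per environment when instantiating the module. A minimal sketch (the `source` path and notification handle are placeholders):

```hcl
module "datadog-monitors-cloud-aws-apigateway" {
  source      = "git::ssh://git@example.com/monitors//cloud/aws/apigateway" # placeholder path
  environment = "production"
  message     = "@slack-monitoring"

  # Tighter 5xx tolerance over a longer window than the defaults
  http_5xx_requests_threshold_warning  = "5"
  http_5xx_requests_threshold_critical = "10"
  http_5xx_requests_timeframe          = "last_15m"
}
```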

View File

@ -32,7 +32,7 @@ resource "datadog_monitor" "API_http_5xx_errors_count" {
count = var.http_5xx_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Gateway HTTP 5xx errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.http_5xx_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.http_5xx_requests_time_aggregator}(${var.http_5xx_requests_timeframe}):
@ -46,8 +46,8 @@ warning = var.http_5xx_requests_threshold_warning
critical = var.http_5xx_requests_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false

View File

@ -16,12 +16,10 @@ module "datadog-monitors-cloud-aws-elasticache-common" {
Creates DataDog monitors with the following checks:
- Elasticache connections
- Elasticache eviction
- Elasticache evictions is growing
- Elasticache free memory
- Elasticache max connections reached
- Elasticache swap
## Inputs
@ -30,10 +28,10 @@ Creates DataDog monitors with the following checks:
| environment | Infrastructure Environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| eviction\_enabled | Flag to enable Elasticache eviction monitor | string | `"true"` | no |
| eviction\_extra\_tags | Extra tags for Elasticache eviction monitor | list(string) | `[]` | no |
| eviction\_growing\_condition\_timeframe | Monitor condition timeframe for Elasticache eviction growing [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| eviction\_growing\_enabled | Flag to enable Elasticache eviction growing monitor | string | `"true"` | no |
| eviction\_growing\_extra\_tags | Extra tags for Elasticache eviction growing monitor | list(string) | `[]` | no |
| eviction\_growing\_message | Custom message for Elasticache eviction growing monitor | string | `""` | no |
| eviction\_growing\_threshold\_critical | Elasticache eviction growing critical threshold in percentage | string | `"30"` | no |
| eviction\_growing\_threshold\_warning | Elasticache eviction growing warning threshold in percentage | string | `"10"` | no |
@ -47,26 +45,26 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| free\_memory\_condition\_timeframe | Monitor condition timeframe for Elasticache free memory [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| free\_memory\_enabled | Flag to enable Elasticache free memory monitor | string | `"true"` | no |
| free\_memory\_extra\_tags | Extra tags for Elasticache free memory monitor | list(string) | `[]` | no |
| free\_memory\_message | Custom message for Elasticache free memory monitor | string | `""` | no |
| free\_memory\_threshold\_critical | Elasticache free memory critical threshold in percentage | string | `"-70"` | no |
| free\_memory\_threshold\_warning | Elasticache free memory warning threshold in percentage | string | `"-50"` | no |
| free\_memory\_timeframe | Monitor timeframe for Elasticache free memory [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| max\_connection\_enabled | Flag to enable Elasticache max connection monitor | string | `"true"` | no |
| max\_connection\_extra\_tags | Extra tags for Elasticache max connection monitor | list(string) | `[]` | no |
| max\_connection\_message | Custom message for Elasticache max connection monitor | string | `""` | no |
| max\_connection\_time\_aggregator | Monitor aggregator for Elasticache max connection [available values: min, max or avg] | string | `"max"` | no |
| max\_connection\_timeframe | Monitor timeframe for Elasticache max connection [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| no\_connection\_enabled | Flag to enable Elasticache no connection monitor | string | `"true"` | no |
| no\_connection\_extra\_tags | Extra tags for Elasticache no connection monitor | list(string) | `[]` | no |
| no\_connection\_message | Custom message for Elasticache no connection monitor | string | `""` | no |
| no\_connection\_time\_aggregator | Monitor aggregator for Elasticache no connection [available values: min, max or avg] | string | `"min"` | no |
| no\_connection\_timeframe | Monitor timeframe for Elasticache no connection [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor name | string | `""` | no |
| swap\_enabled | Flag to enable Elasticache swap monitor | string | `"true"` | no |
| swap\_extra\_tags | Extra tags for Elasticache swap monitor | list(string) | `[]` | no |
| swap\_message | Custom message for Elasticache swap monitor | string | `""` | no |
| swap\_threshold\_critical | Elasticache swap critical threshold in bytes | string | `"50000000"` | no |
| swap\_threshold\_warning | Elasticache swap warning threshold in bytes | string | `"0"` | no |
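The `*_extra_tags` inputs take Datadog tags as a list of strings in `key:value` form. A sketch, with an illustrative placeholder `source` and tag values:

```hcl
module "datadog-monitors-cloud-aws-elasticache-common" {
  source      = "git::ssh://git@example.com/monitors//cloud/aws/elasticache/common" # placeholder path
  environment = "production"
  message     = "@slack-monitoring"

  # Extra tags are appended to the monitor's own tags
  swap_extra_tags        = ["team:platform", "service:cache"]
  free_memory_extra_tags = ["team:platform", "service:cache"]
}
```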

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "elasticache_eviction" {
count = var.eviction_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache eviction {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}"
message = coalesce(var.eviction_message, var.message)
type = "query alert"
query = <<EOQ
sum(${var.eviction_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "elasticache_no_connection" {
count = var.no_connection_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache connections {{#is_alert}}{{{comparator}}} {{threshold}} {{/is_alert}}"
message = coalesce(var.no_connection_message, var.message)
type = "query alert"
query = <<EOQ
${var.no_connection_time_aggregator}(${var.no_connection_timeframe}): (
@ -112,7 +112,7 @@ resource "datadog_monitor" "elasticache_free_memory" {
count = var.free_memory_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache free memory {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.free_memory_message, var.message)
type = "query alert"
query = <<EOQ
pct_change(avg(${var.free_memory_timeframe}),${var.free_memory_condition_timeframe}):

View File

@ -24,7 +24,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_high\_enabled | Flag to enable Elasticache memcached cpu high monitor | string | `"true"` | no |
| cpu\_high\_extra\_tags | Extra tags for Elasticache memcached cpu high monitor | list(string) | `[]` | no |
| cpu\_high\_message | Custom message for Elasticache memcached cpu high monitor | string | `""` | no |
| cpu\_high\_threshold\_critical | Elasticache memcached cpu high critical threshold in percentage | string | `"90"` | no |
| cpu\_high\_threshold\_warning | Elasticache memcached cpu high warning threshold in percentage | string | `"75"` | no |
@ -36,7 +36,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| get\_hits\_enabled | Flag to enable Elasticache memcached get hits monitor | string | `"true"` | no |
| get\_hits\_extra\_tags | Extra tags for Elasticache memcached get hits monitor | list(string) | `[]` | no |
| get\_hits\_message | Custom message for Elasticache memcached get hits monitor | string | `""` | no |
| get\_hits\_threshold\_critical | Elasticache memcached get hits critical threshold in percentage | string | `"60"` | no |
| get\_hits\_threshold\_warning | Elasticache memcached get hits warning threshold in percentage | string | `"80"` | no |

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "memcached_get_hits" {
count = var.get_hits_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache memcached cache hit ratio {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.get_hits_message, var.message)
type = "query alert"
query = <<EOQ
${var.get_hits_time_aggregator}(${var.get_hits_timeframe}): (

View File

@ -18,26 +18,24 @@ Creates DataDog monitors with the following checks:
- Elasticache redis cache hit ratio
- Elasticache redis CPU
- Elasticache redis is receiving no commands
- Elasticache redis replication lag
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cache\_hits\_enabled | Flag to enable Elasticache redis cache hits monitor | string | `"true"` | no |
| cache\_hits\_extra\_tags | Extra tags for Elasticache redis cache hits monitor | list(string) | `[]` | no |
| cache\_hits\_message | Custom message for Elasticache redis cache hits monitor | string | `""` | no |
| cache\_hits\_threshold\_critical | Elasticache redis cache hits critical threshold in percentage | string | `"60"` | no |
| cache\_hits\_threshold\_warning | Elasticache redis cache hits warning threshold in percentage | string | `"80"` | no |
| cache\_hits\_time\_aggregator | Monitor aggregator for Elasticache redis cache hits [available values: min, max or avg] | string | `"max"` | no |
| cache\_hits\_timeframe | Monitor timeframe for Elasticache redis cache hits [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| commands\_enabled | Flag to enable Elasticache redis commands monitor | string | `"true"` | no |
| commands\_extra\_tags | Extra tags for Elasticache redis commands monitor | list(string) | `[]` | no |
| commands\_message | Custom message for Elasticache redis commands monitor | string | `""` | no |
| commands\_timeframe | Monitor timeframe for Elasticache redis commands [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| cpu\_high\_enabled | Flag to enable Elasticache redis cpu high monitor | string | `"true"` | no |
| cpu\_high\_extra\_tags | Extra tags for Elasticache redis cpu high monitor | list(string) | `[]` | no |
| cpu\_high\_message | Custom message for Elasticache redis cpu high monitor | string | `""` | no |
| cpu\_high\_threshold\_critical | Elasticache redis cpu high critical threshold in percentage | string | `"90"` | no |
| cpu\_high\_threshold\_warning | Elasticache redis cpu high warning threshold in percentage | string | `"75"` | no |
@ -52,7 +50,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor name | string | `""` | no |
| replication\_lag\_enabled | Flag to enable Elasticache redis replication lag monitor | string | `"true"` | no |
| replication\_lag\_extra\_tags | Extra tags for Elasticache redis replication lag monitor | list(string) | `[]` | no |
| replication\_lag\_message | Custom message for Elasticache redis replication lag monitor | string | `""` | no |
| replication\_lag\_threshold\_critical | Elasticache redis replication lag critical threshold in seconds | string | `"180"` | no |
| replication\_lag\_threshold\_warning | Elasticache redis replication lag warning threshold in seconds | string | `"90"` | no |
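Monitors that do not apply to a given topology can be switched off individually; for example, a single-node Redis cluster has no replica, so the replication lag check can be disabled. A sketch (placeholder `source` and values):

```hcl
module "datadog-monitors-cloud-aws-elasticache-redis" {
  source      = "git::ssh://git@example.com/monitors//cloud/aws/elasticache/redis" # placeholder path
  environment = "staging"
  message     = "@slack-monitoring"

  # No replica on a single-node cluster: skip the replication lag monitor
  replication_lag_enabled = "false"

  # Relax the cache hit ratio thresholds for a mostly cold cache
  cache_hits_threshold_warning  = "70"
  cache_hits_threshold_critical = "50"
}
```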

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "redis_cache_hits" {
count = var.cache_hits_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache redis cache hit ratio {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cache_hits_message, var.message)
type = "query alert"
query = <<EOQ
${var.cache_hits_time_aggregator}(${var.cache_hits_timeframe}): default(
@ -59,7 +59,7 @@ resource "datadog_monitor" "redis_replication_lag" {
count = var.replication_lag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticache redis replication lag {{#is_alert}}{{{comparator}}} {{threshold}}s ({{value}}s){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}s ({{value}}s){{/is_warning}}"
message = coalesce(var.replication_lag_message, var.message)
type = "query alert"
query = <<EOQ
${var.replication_lag_time_aggregator}(${var.replication_lag_timeframe}): (

View File

@ -18,23 +18,22 @@ module "datadog-monitors-cloud-aws-elasticsearch" {
Creates DataDog monitors with the following checks:
- ElasticSearch cluster CPU high
- ElasticSearch cluster free storage space
- ElasticSearch cluster status is not green
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_enabled | Flag to enable ES cluster cpu monitor | string | `"true"` | no |
| cpu\_extra\_tags | Extra tags for ES cluster cpu monitor | list(string) | `[]` | no |
| cpu\_message | Custom message for ES cluster cpu monitor | string | `""` | no |
| cpu\_threshold\_critical | CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_threshold\_warning | CPU usage in percent (warning threshold) | string | `"80"` | no |
| cpu\_time\_aggregator | Monitor aggregator for ES cluster cpu [available values: min, max or avg] | string | `"min"` | no |
| cpu\_timeframe | Monitor timeframe for ES cluster cpu [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| diskspace\_enabled | Flag to enable ES cluster diskspace monitor | string | `"true"` | no |
| diskspace\_extra\_tags | Extra tags for ES cluster diskspace monitor | list(string) | `[]` | no |
| diskspace\_message | Custom message for ES cluster diskspace monitor | string | `""` | no |
| diskspace\_threshold\_critical | Disk free space in percent (critical threshold) | string | `"10"` | no |
| diskspace\_threshold\_warning | Disk free space in percent (warning threshold) | string | `"20"` | no |
@ -42,7 +41,7 @@ Creates DataDog monitors with the following checks:
| diskspace\_timeframe | Monitor timeframe for ES cluster diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| environment | Architecture Environment | string | n/a | yes |
| es\_cluster\_status\_enabled | Flag to enable ES cluster status monitor | string | `"true"` | no |
| es\_cluster\_status\_extra\_tags | Extra tags for ES cluster status monitor | list(string) | `[]` | no |
| es\_cluster\_status\_message | Custom message for ES cluster status monitor | string | `""` | no |
| es\_cluster\_status\_timeframe | Monitor timeframe for ES cluster status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_30m"` | no |
| es\_cluster\_volume\_size | ElasticSearch Domain volume size (in GB) | string | n/a | yes |
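Both `environment` and `es_cluster_volume_size` are required inputs, since free storage is evaluated against the domain's volume size. A sketch with placeholder values:

```hcl
module "datadog-monitors-cloud-aws-elasticsearch" {
  source      = "git::ssh://git@example.com/monitors//cloud/aws/elasticsearch" # placeholder path
  environment = "production"
  message     = "@pagerduty-search"

  es_cluster_volume_size = "512" # domain volume size in GB (required)

  # Warn earlier on disk pressure than the defaults
  diskspace_threshold_warning  = "30"
  diskspace_threshold_critical = "15"
}
```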

View File

@ -7,7 +7,7 @@ resource "datadog_monitor" "es_cluster_status" {
count = var.es_cluster_status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ElasticSearch cluster status is not green"
message = coalesce(var.es_cluster_status_message, var.message)
type = "query alert"
query = <<EOQ
max(${var.es_cluster_status_timeframe}): (
@ -71,7 +71,7 @@ resource "datadog_monitor" "es_cpu_90_15min" {
count = var.cpu_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ElasticSearch cluster CPU high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_message, var.message)
type = "query alert"
query = <<EOQ
${var.cpu_time_aggregator}(${var.cpu_timeframe}): (

View File

@ -16,11 +16,9 @@ module "datadog-monitors-cloud-aws-elb" {
Creates DataDog monitors with the following checks:
- ELB 4xx errors too high
- ELB 5xx errors too high
- ELB backend 4xx errors too high
- ELB backend 5xx errors too high
- ELB healthy instances
- ELB latency too high
## Inputs
@ -29,38 +27,38 @@ Creates DataDog monitors with the following checks:
|------|-------------|:----:|:-----:|:-----:|
| artificial\_requests\_count | Number of artificial requests used to mitigate false positives in case of low traffic | string | `"5"` | no |
| elb\_4xx\_enabled | Flag to enable ELB 4xx errors monitor | string | `"true"` | no |
| elb\_4xx\_extra\_tags | Extra tags for ELB 4xx errors monitor | list(string) | `[]` | no |
| elb\_4xx\_message | Custom message for ELB 4xx errors monitor | string | `""` | no |
| elb\_4xx\_threshold\_critical | loadbalancer 4xx critical threshold in percentage | string | `"10"` | no |
| elb\_4xx\_threshold\_warning | loadbalancer 4xx warning threshold in percentage | string | `"5"` | no |
| elb\_4xx\_timeframe | Monitor timeframe for ELB 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| elb\_5xx\_enabled | Flag to enable ELB 5xx errors monitor | string | `"true"` | no |
| elb\_5xx\_extra\_tags | Extra tags for ELB 5xx errors monitor | list(string) | `[]` | no |
| elb\_5xx\_message | Custom message for ELB 5xx errors monitor | string | `""` | no |
| elb\_5xx\_threshold\_critical | loadbalancer 5xx critical threshold in percentage | string | `"10"` | no |
| elb\_5xx\_threshold\_warning | loadbalancer 5xx warning threshold in percentage | string | `"5"` | no |
| elb\_5xx\_timeframe | Monitor timeframe for ELB 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| elb\_backend\_4xx\_enabled | Flag to enable ELB backend 4xx errors monitor | string | `"true"` | no |
| elb\_backend\_4xx\_extra\_tags | Extra tags for ELB backend 4xx errors monitor | list(string) | `[]` | no |
| elb\_backend\_4xx\_message | Custom message for ELB backend 4xx errors monitor | string | `""` | no |
| elb\_backend\_4xx\_threshold\_critical | loadbalancer backend 4xx critical threshold in percentage | string | `"10"` | no |
| elb\_backend\_4xx\_threshold\_warning | loadbalancer backend 4xx warning threshold in percentage | string | `"5"` | no |
| elb\_backend\_4xx\_timeframe | Monitor timeframe for ELB backend 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| elb\_backend\_5xx\_enabled | Flag to enable ELB backend 5xx errors monitor | string | `"true"` | no |
| elb\_backend\_5xx\_extra\_tags | Extra tags for ELB backend 5xx errors monitor | list(string) | `[]` | no |
| elb\_backend\_5xx\_message | Custom message for ELB backend 5xx errors monitor | string | `""` | no |
| elb\_backend\_5xx\_threshold\_critical | loadbalancer backend 5xx critical threshold in percentage | string | `"10"` | no |
| elb\_backend\_5xx\_threshold\_warning | loadbalancer backend 5xx warning threshold in percentage | string | `"5"` | no |
| elb\_backend\_5xx\_timeframe | Monitor timeframe for ELB backend 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| elb\_backend\_latency\_critical | latency critical threshold in seconds | string | `"5"` | no |
| elb\_backend\_latency\_enabled | Flag to enable ELB backend latency monitor | string | `"true"` | no |
| elb\_backend\_latency\_extra\_tags | Extra tags for ELB backend latency monitor | list(string) | `[]` | no |
| elb\_backend\_latency\_message | Custom message for ELB backend latency monitor | string | `""` | no |
| elb\_backend\_latency\_time\_aggregator | Monitor aggregator for ELB backend latency [available values: min, max or avg] | string | `"min"` | no |
| elb\_backend\_latency\_timeframe | Monitor timeframe for ELB backend latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| elb\_backend\_latency\_warning | latency warning threshold in seconds | string | `"1"` | no |
| elb\_no\_healthy\_instance\_enabled | Flag to enable ELB no healthy instance monitor | string | `"true"` | no |
| elb\_no\_healthy\_instance\_extra\_tags | Extra tags for ELB no healthy instance monitor | list(string) | `[]` | no |
| elb\_no\_healthy\_instance\_message | Custom message for ELB no healthy instance monitor | string | `""` | no |
| elb\_no\_healthy\_instance\_time\_aggregator | Monitor aggregator for ELB no healthy instance [available values: min or max] | string | `"min"` | no |
| elb\_no\_healthy\_instance\_timeframe | Monitor timeframe for ELB no healthy instance [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
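On quiet load balancers a handful of errors can dominate the percentage-based thresholds; `artificial_requests_count` pads the denominator to dampen that effect. A sketch (placeholder `source` and values):

```hcl
module "datadog-monitors-cloud-aws-elb" {
  source      = "git::ssh://git@example.com/monitors//cloud/aws/elb" # placeholder path
  environment = "production"
  message     = "@slack-monitoring"

  # Pad the request count so a few errors on a low-traffic ELB
  # do not trip the percentage thresholds
  artificial_requests_count = "20"

  # With max, the healthy-instance check only alerts when the value
  # stayed at the threshold across the whole timeframe
  elb_no_healthy_instance_time_aggregator = "max"
}
```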

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "ELB_no_healthy_instances" {
count = var.elb_no_healthy_instance_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ELB healthy instances {{#is_alert}}is at 0{{/is_alert}}{{#is_warning}}is at {{value}}%%{{/is_warning}}"
message = coalesce(var.elb_no_healthy_instance_message, var.message)
type = "query alert"
query = <<EOQ
${var.elb_no_healthy_instance_time_aggregator}(${var.elb_no_healthy_instance_timeframe}): (
@ -65,7 +65,7 @@ resource "datadog_monitor" "ELB_too_much_5xx" {
count = var.elb_5xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ELB 5xx errors too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.elb_5xx_message, var.message)
type = "query alert"
query = <<EOQ
sum(${var.elb_5xx_timeframe}):
@ -127,7 +127,7 @@ resource "datadog_monitor" "ELB_too_much_5xx_backend" {
count = var.elb_backend_5xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ELB backend 5xx errors too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.elb_backend_5xx_message, var.message)
type = "query alert"
query = <<EOQ
sum(${var.elb_backend_5xx_timeframe}):

View File

@ -28,7 +28,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| incoming\_records\_enabled | Flag to enable Kinesis Firehose incoming records monitor | string | `"true"` | no |
| incoming\_records\_extra\_tags | Extra tags for Kinesis Firehose incoming records monitor | list(string) | `[]` | no |
| incoming\_records\_message | Custom message for Kinesis Firehose incoming records monitor | string | `""` | no |
| incoming\_records\_timeframe | Monitor timeframe for incoming records metrics evaluation [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| message | Message sent when an alert is triggered | string | n/a | yes |

View File

@ -3,7 +3,7 @@ resource "datadog_monitor" "firehose_incoming_records" {
count = var.incoming_records_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Kinesis Firehose No incoming records"
message = coalesce(var.incoming_records_message, var.message)
type = "query alert"
query = <<EOQ
sum(${var.incoming_records_timeframe}): (

View File

@ -23,7 +23,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| aurora\_replicalag\_enabled | Flag to enable RDS Aurora replica lag monitor | string | `"true"` | no |
| aurora\_replicalag\_extra\_tags | Extra tags for RDS Aurora replica lag monitor | list(string) | `[]` | no |
| aurora\_replicalag\_message | Custom message for RDS Aurora replica lag monitor | string | `""` | no |
| aurora\_replicalag\_threshold\_critical | Aurora replica lag in milliseconds (critical threshold) | string | `"200"` | no |
| aurora\_replicalag\_threshold\_warning | Aurora replica lag in milliseconds (warning threshold) | string | `"100"` | no |

View File

@ -3,7 +3,7 @@ resource "datadog_monitor" "rds_aurora_mysql_replica_lag" {
count = var.aurora_replicalag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] RDS Aurora MySQL replica lag {{#is_alert}}{{{comparator}}} {{threshold}} ms ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ms ({{value}}%){{/is_warning}}"
message = coalesce(var.aurora_replicalag_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.aurora_replicalag_timeframe}): (

View File

@ -23,7 +23,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| aurora\_replicalag\_enabled | Flag to enable RDS Aurora replica lag monitor | string | `"true"` | no |
| aurora\_replicalag\_extra\_tags | Extra tags for RDS Aurora replica lag monitor | list(string) | `[]` | no |
| aurora\_replicalag\_message | Custom message for RDS Aurora replica lag monitor | string | `""` | no |
| aurora\_replicalag\_threshold\_critical | Aurora replica lag in milliseconds (critical threshold) | string | `"200"` | no |
| aurora\_replicalag\_threshold\_warning | Aurora replica lag in milliseconds (warning threshold) | string | `"100"` | no |

View File

@ -3,7 +3,7 @@ resource "datadog_monitor" "rds_aurora_postgresql_replica_lag" {
count = var.aurora_replicalag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] RDS Aurora PostgreSQL replica lag {{#is_alert}}{{{comparator}}} {{threshold}} ms ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ms ({{value}}%){{/is_warning}}"
message = coalesce(var.aurora_replicalag_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.aurora_replicalag_timeframe}): (

View File

@ -18,21 +18,20 @@ Creates DataDog monitors with the following checks:
- RDS instance CPU high
- RDS instance free space
- RDS replica lag
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_enabled | Flag to enable RDS CPU usage monitor | string | `"true"` | no |
| cpu\_extra\_tags | Extra tags for RDS CPU usage monitor | list(string) | `[]` | no |
| cpu\_message | Custom message for RDS CPU usage monitor | string | `""` | no |
| cpu\_threshold\_critical | CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_threshold\_warning | CPU usage in percent (warning threshold) | string | `"80"` | no |
| cpu\_time\_aggregator | Monitor aggregator for RDS CPU usage [available values: min, max or avg] | string | `"min"` | no |
| cpu\_timeframe | Monitor timeframe for RDS CPU usage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| diskspace\_enabled | Flag to enable RDS free diskspace monitor | string | `"true"` | no |
| diskspace\_extra\_tags | Extra tags for RDS free diskspace monitor | list(string) | `[]` | no |
| diskspace\_message | Custom message for RDS free diskspace monitor | string | `""` | no |
| diskspace\_threshold\_critical | Disk free space in percent (critical threshold) | string | `"10"` | no |
| diskspace\_threshold\_warning | Disk free space in percent (warning threshold) | string | `"20"` | no |
@ -47,7 +46,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| replicalag\_enabled | Flag to enable RDS replica lag monitor | string | `"true"` | no |
| replicalag\_extra\_tags | Extra tags for RDS replica lag monitor | list(string) | `[]` | no |
| replicalag\_message | Custom message for RDS replica lag monitor | string | `""` | no |
| replicalag\_threshold\_critical | replica lag in seconds (critical threshold) | string | `"300"` | no |
| replicalag\_threshold\_warning | replica lag in seconds (warning threshold) | string | `"200"` | no |
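A minimal sketch of overriding these defaults (the module label and `source` path are assumptions; match them to this repository's layout):

```hcl
module "datadog-monitors-cloud-aws-rds" {
  source = "./cloud/aws/rds" # hypothetical path

  environment = "production"
  message     = "@slack-infra-alerts"

  # Tighten CPU thresholds and disable the replica lag check
  cpu_threshold_warning  = "75"
  cpu_threshold_critical = "85"
  replicalag_enabled     = "false"
}
```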

View File

@ -3,7 +3,7 @@ resource "datadog_monitor" "rds_cpu_90_15min" {
count = var.cpu_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] RDS instance CPU high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_message, var.message)
type = "query alert"
query = <<EOQ
${var.cpu_time_aggregator}(${var.cpu_timeframe}): (
@ -64,7 +64,7 @@ resource "datadog_monitor" "rds_replica_lag" {
count = var.replicalag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] RDS replica lag {{#is_alert}}{{{comparator}}} {{threshold}} ms ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ms ({{value}}%){{/is_warning}}"
message = coalesce(var.replicalag_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.replicalag_timeframe}): (

View File

@ -29,7 +29,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| vpn\_status\_enabled | Flag to enable VPN status monitor | string | `"true"` | no |
| vpn\_status\_extra\_tags | Extra tags for VPN status monitor | list(string) | `[]` | no |
| vpn\_status\_message | Custom message for VPN status monitor | string | `""` | no |
| vpn\_status\_time\_aggregator | Monitor aggregator for VPN status [available values: min, max or avg] | string | `"max"` | no |
| vpn\_status\_timeframe | Monitor timeframe for VPN status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "VPN_status" {
count = var.vpn_status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] VPN tunnel down"
message = coalesce(var.vpn_status_message, var.message)
type = "query alert"
query = <<EOQ
${var.vpn_status_time_aggregator}(${var.vpn_status_timeframe}): (
@ -12,7 +12,7 @@ EOQ
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = true
renotify_interval = 0
notify_audit = false
timeout_h = 0

View File

@ -19,8 +19,6 @@ Creates DataDog monitors with the following checks:
- API Management is down
- API Management successful requests rate too low
- API Management too many failed requests
- API Management too many other requests
- API Management too many unauthorized requests
## Inputs
@ -29,7 +27,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failed\_requests\_enabled | Flag to enable API Management failed requests monitor | string | `"true"` | no |
| failed\_requests\_extra\_tags | Extra tags for API Management failed requests monitor | list(string) | `[]` | no |
| failed\_requests\_message | Custom message for API Management failed requests monitor | string | `""` | no |
| failed\_requests\_threshold\_critical | Maximum acceptable percent of failed requests | string | `"90"` | no |
| failed\_requests\_threshold\_warning | Warning regarding acceptable percent of failed requests | string | `"50"` | no |
@ -41,7 +39,7 @@ Creates DataDog monitors with the following checks:
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| other\_requests\_enabled | Flag to enable API Management other requests monitor | string | `"true"` | no |
| other\_requests\_extra\_tags | Extra tags for API Management other requests monitor | list(string) | `[]` | no |
| other\_requests\_message | Custom message for API Management other requests monitor | string | `""` | no |
| other\_requests\_threshold\_critical | Maximum acceptable percent of other requests | string | `"90"` | no |
| other\_requests\_threshold\_warning | Warning regarding acceptable percent of other requests | string | `"50"` | no |
@ -49,19 +47,19 @@ Creates DataDog monitors with the following checks:
| other\_requests\_timeframe | Monitor timeframe for API Management other requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| status\_enabled | Flag to enable API Management status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for API Management status monitor | list(string) | `[]` | no |
| status\_message | Custom message for API Management status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for API Management status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for API Management status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| successful\_requests\_enabled | Flag to enable API Management successful requests monitor | string | `"true"` | no |
| successful\_requests\_extra\_tags | Extra tags for API Management successful requests monitor | list(string) | `[]` | no |
| successful\_requests\_message | Custom message for API Management successful requests monitor | string | `""` | no |
| successful\_requests\_threshold\_critical | Minimum acceptable percent of successful requests | string | `"10"` | no |
| successful\_requests\_threshold\_warning | Warning regarding acceptable percent of successful requests | string | `"30"` | no |
| successful\_requests\_time\_aggregator | Monitor aggregator for API Management successful requests [available values: min, max or avg] | string | `"max"` | no |
| successful\_requests\_timeframe | Monitor timeframe for API Management successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| unauthorized\_requests\_enabled | Flag to enable API Management unauthorized requests monitor | string | `"true"` | no |
| unauthorized\_requests\_extra\_tags | Extra tags for API Management unauthorized requests monitor | list(string) | `[]` | no |
| unauthorized\_requests\_message | Custom message for API Management unauthorized requests monitor | string | `""` | no |
| unauthorized\_requests\_threshold\_critical | Maximum acceptable percent of unauthorized requests | string | `"90"` | no |
| unauthorized\_requests\_threshold\_warning | Warning regarding acceptable percent of unauthorized requests | string | `"50"` | no |
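As a usage sketch (the module label and `source` path are assumptions; adjust both to your checkout):

```hcl
module "datadog-monitors-cloud-azure-apimanagement" {
  source = "./cloud/azure/apimanagement" # hypothetical path

  environment = "production"
  message     = "@opsgenie-azure"

  # Disable the noisy "other requests" check, tighten unauthorized-request thresholds
  other_requests_enabled                   = "false"
  unauthorized_requests_threshold_warning  = "40"
  unauthorized_requests_threshold_critical = "80"
}
```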

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "apimgt_status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Management is down"
message = coalesce(var.status_message, var.message)
type = "metric alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}):avg:azure.apimanagement_service.status${module.filter-tags.query_alert} by {resource_group,region,name} < 1
@ -29,7 +29,7 @@ resource "datadog_monitor" "apimgt_failed_requests" {
count = var.failed_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Management too many failed requests {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.failed_requests_time_aggregator}(${var.failed_requests_timeframe}): (
@ -60,7 +60,7 @@ resource "datadog_monitor" "apimgt_other_requests" {
count = var.other_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Management too many other requests {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.other_requests_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.other_requests_time_aggregator}(${var.other_requests_timeframe}): (
@ -91,7 +91,7 @@ resource "datadog_monitor" "apimgt_unauthorized_requests" {
count = var.unauthorized_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Management too many unauthorized requests {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.unauthorized_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.unauthorized_requests_time_aggregator}(${var.unauthorized_requests_timeframe}): (
@ -122,7 +122,7 @@ resource "datadog_monitor" "apimgt_successful_requests" {
count = var.successful_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] API Management successful requests rate too low {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.successful_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.successful_requests_time_aggregator}(${var.successful_requests_timeframe}):

View File

@ -16,12 +16,10 @@ module "datadog-monitors-cloud-azure-app-services" {
Creates DataDog monitors with the following checks:
- App Services HTTP 4xx errors too high
- App Services HTTP 5xx errors too high
- App Services HTTP successful responses too low
- App Services is down
- App Services memory usage
- App Services response time too high
## Inputs
@ -33,28 +31,28 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| http\_4xx\_requests\_enabled | Flag to enable App Services 4xx requests monitor | string | `"true"` | no |
| http\_4xx\_requests\_extra\_tags | Extra tags for App Services 4xx requests monitor | list(string) | `[]` | no |
| http\_4xx\_requests\_message | Custom message for App Services 4xx requests monitor | string | `""` | no |
| http\_4xx\_requests\_threshold\_critical | Maximum critical acceptable percent of 4xx errors | string | `"90"` | no |
| http\_4xx\_requests\_threshold\_warning | Warning regarding acceptable percent of 4xx errors | string | `"50"` | no |
| http\_4xx\_requests\_time\_aggregator | Monitor aggregator for App Services 4xx requests [available values: min, max or avg] | string | `"min"` | no |
| http\_4xx\_requests\_timeframe | Monitor timeframe for App Services 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| http\_5xx\_requests\_enabled | Flag to enable App Services 5xx requests monitor | string | `"true"` | no |
| http\_5xx\_requests\_extra\_tags | Extra tags for App Services 5xx requests monitor | list(string) | `[]` | no |
| http\_5xx\_requests\_message | Custom message for App Services 5xx requests monitor | string | `""` | no |
| http\_5xx\_requests\_threshold\_critical | Maximum critical acceptable percent of 5xx errors | string | `"90"` | no |
| http\_5xx\_requests\_threshold\_warning | Warning regarding acceptable percent of 5xx errors | string | `"50"` | no |
| http\_5xx\_requests\_time\_aggregator | Monitor aggregator for App Services 5xx requests [available values: min, max or avg] | string | `"min"` | no |
| http\_5xx\_requests\_timeframe | Monitor timeframe for App Services 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| http\_successful\_requests\_enabled | Flag to enable App Services successful requests monitor | string | `"true"` | no |
| http\_successful\_requests\_extra\_tags | Extra tags for App Services successful requests monitor | list(string) | `[]` | no |
| http\_successful\_requests\_message | Custom message for App Services successful requests monitor | string | `""` | no |
| http\_successful\_requests\_threshold\_critical | Minimum critical acceptable percent of 2xx & 3xx requests | string | `"10"` | no |
| http\_successful\_requests\_threshold\_warning | Warning regarding acceptable percent of 2xx & 3xx requests | string | `"30"` | no |
| http\_successful\_requests\_time\_aggregator | Monitor aggregator for App Services successful requests [available values: min, max or avg] | string | `"max"` | no |
| http\_successful\_requests\_timeframe | Monitor timeframe for App Services successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| memory\_usage\_enabled | Flag to enable App Services memory usage monitor | string | `"true"` | no |
| memory\_usage\_extra\_tags | Extra tags for App Services memory usage monitor | list(string) | `[]` | no |
| memory\_usage\_message | Custom message for App Services memory usage monitor | string | `""` | no |
| memory\_usage\_threshold\_critical | Alerting threshold in MiB | string | `"1073741824"` | no |
| memory\_usage\_threshold\_warning | Warning threshold in MiB | string | `"536870912"` | no |
@ -64,14 +62,14 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| response\_time\_enabled | Flag to enable App Services response time monitor | string | `"true"` | no |
| response\_time\_extra\_tags | Extra tags for App Services response time monitor | list(string) | `[]` | no |
| response\_time\_message | Custom message for App Services response time monitor | string | `""` | no |
| response\_time\_threshold\_critical | Alerting threshold for response time in seconds | string | `"10"` | no |
| response\_time\_threshold\_warning | Warning threshold for response time in seconds | string | `"5"` | no |
| response\_time\_time\_aggregator | Monitor aggregator for App Services response time [available values: min, max or avg] | string | `"min"` | no |
| response\_time\_timeframe | Monitor timeframe for App Services response time [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| status\_enabled | Flag to enable App Services status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for App Services status monitor | list(string) | `[]` | no |
| status\_message | Custom message for App Services status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for App Services status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for App Services status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
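These inputs follow the same pattern as the other modules; a sketch (the `source` path is an assumption):

```hcl
module "datadog-monitors-cloud-azure-app-services" {
  source = "./cloud/azure/app-services" # hypothetical path

  environment = "production"
  message     = "@slack-azure-alerts"

  # Alert earlier on slow responses; skip the memory usage check
  response_time_threshold_warning  = "3"
  response_time_threshold_critical = "8"
  memory_usage_enabled             = "false"
}
```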

View File

@ -32,7 +32,7 @@ resource "datadog_monitor" "appservices_memory_usage_count" {
count = var.memory_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] App Services memory usage {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.memory_usage_message, var.message)
type = "query alert"
query = <<EOQ
${var.memory_usage_time_aggregator}(${var.memory_usage_timeframe}): (
@ -45,8 +45,8 @@ warning = var.memory_usage_threshold_warning
critical = var.memory_usage_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false
@ -91,7 +91,7 @@ resource "datadog_monitor" "appservices_http_4xx_errors_count" {
count = var.http_4xx_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] App Services HTTP 4xx errors too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.http_4xx_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.http_4xx_requests_time_aggregator}(${var.http_4xx_requests_timeframe}): (
@ -105,8 +105,8 @@ EOQ
critical = var.http_4xx_requests_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false # Will NOT notify when no data is received
renotify_interval = 0
require_full_window = false
@ -163,8 +163,8 @@ thresholds = {
critical = 1
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = true
renotify_interval = 0
require_full_window = false

View File

@ -29,7 +29,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| latency\_enabled | Flag to enable Azure Search latency monitor | string | `"true"` | no |
| latency\_extra\_tags | Extra tags for Azure Search latency monitor | list(string) | `[]` | no |
| latency\_message | Custom message for Azure Search latency monitor | string | `""` | no |
| latency\_threshold\_critical | Alerting threshold for Azure Search latency in seconds | string | `"4"` | no |
| latency\_threshold\_warning | Warning threshold for Azure Search latency in seconds | string | `"2"` | no |
@ -39,7 +39,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| throttled\_queries\_rate\_enabled | Flag to enable Azure Search throttled queries rate monitor | string | `"true"` | no |
| throttled\_queries\_rate\_extra\_tags | Extra tags for Azure Search throttled queries rate monitor | list(string) | `[]` | no |
| throttled\_queries\_rate\_message | Custom message for Azure Search throttled queries rate monitor | string | `""` | no |
| throttled\_queries\_rate\_threshold\_critical | Alerting threshold for Azure Search throttled queries rate | string | `"50"` | no |
| throttled\_queries\_rate\_threshold\_warning | Warning threshold for Azure Search throttled queries rate | string | `"25"` | no |

View File

@ -32,7 +32,7 @@ resource "datadog_monitor" "azure_search_throttled_queries_rate" {
count = var.throttled_queries_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Search throttled queries rate is too high {{#is_alert}}{{{comparator}}} {{threshold}}s ({{value}}s){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}s ({{value}}s){{/is_warning}}"
message = coalesce(var.throttled_queries_rate_message, var.message)
type = "query alert"
query = <<EOQ
${var.throttled_queries_rate_time_aggregator}(${var.throttled_queries_rate_timeframe}): (
@ -45,8 +45,8 @@ warning = var.throttled_queries_rate_threshold_warning
critical = var.throttled_queries_rate_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false

View File

@ -16,23 +16,21 @@ module "datadog-monitors-cloud-azure-cosmosdb" {
Creates DataDog monitors with the following checks:
- Cosmos DB 4xx requests rate is high
- Cosmos DB 5xx requests rate is high
- Cosmos DB is down
- Cosmos DB max scaling reached for collection
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cosmos\_db\_4xx\_request\_extra\_tags | Extra tags for Cosmos DB 4xx requests monitor | list(string) | `[]` | no |
| cosmos\_db\_4xx\_request\_rate\_threshold\_critical | Critical threshold for Cosmos DB 4xx requests monitor | string | `"80"` | no |
| cosmos\_db\_4xx\_request\_rate\_threshold\_warning | Warning threshold for Cosmos DB 4xx requests monitor | string | `"50"` | no |
| cosmos\_db\_4xx\_request\_time\_aggregator | Monitor aggregator for Cosmos DB 4xx requests [available values: min, max or avg] | string | `"min"` | no |
| cosmos\_db\_4xx\_request\_timeframe | Monitor timeframe for Cosmos DB 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| cosmos\_db\_4xx\_requests\_enabled | Flag to enable Cosmos DB 4xx requests monitor | string | `"true"` | no |
| cosmos\_db\_4xx\_requests\_message | Custom message for Cosmos DB 4xx requests monitor | string | `""` | no |
| cosmos\_db\_5xx\_request\_rate\_extra\_tags | Extra tags for Cosmos DB 5xx requests monitor | list(string) | `[]` | no |
| cosmos\_db\_5xx\_request\_rate\_threshold\_critical | Critical threshold for Cosmos DB 5xx requests monitor | string | `"80"` | no |
| cosmos\_db\_5xx\_request\_rate\_threshold\_warning | Warning threshold for Cosmos DB 5xx requests monitor | string | `"50"` | no |
| cosmos\_db\_5xx\_request\_time\_aggregator | Monitor aggregator for Cosmos DB 5xx requests [available values: min, max or avg] | string | `"min"` | no |
@ -42,7 +40,7 @@ Creates DataDog monitors with the following checks:
| cosmos\_db\_scaling\_enabled | Flag to enable Cosmos DB scaling monitor | string | `"true"` | no |
| cosmos\_db\_scaling\_error\_rate\_threshold\_critical | Critical threshold for Cosmos DB scaling monitor | string | `"10"` | no |
| cosmos\_db\_scaling\_error\_rate\_threshold\_warning | Warning threshold for Cosmos DB scaling monitor | string | `"5"` | no |
| cosmos\_db\_scaling\_extra\_tags | Extra tags for Cosmos DB scaling monitor | list(string) | `[]` | no |
| cosmos\_db\_scaling\_message | Custom message for Cosmos DB scaling monitor | string | `""` | no |
| cosmos\_db\_scaling\_time\_aggregator | Monitor aggregator for Cosmos DB scaling [available values: min, max or avg] | string | `"min"` | no |
| cosmos\_db\_scaling\_timeframe | Monitor timeframe for Cosmos DB scaling [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
@ -55,7 +53,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets to every monitor name | string | `""` | no |
| status\_enabled | Flag to enable Cosmos DB status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Cosmos DB status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Cosmos DB status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Cosmos DB status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Cosmos DB status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
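Overriding the Cosmos DB thresholds looks like this (the `source` path is an assumption; all values are strings):

```hcl
module "datadog-monitors-cloud-azure-cosmosdb" {
  source = "./cloud/azure/cosmosdb" # hypothetical path

  environment = "production"
  message     = "@teams-db-oncall"

  # Relax the 4xx rate thresholds for a chatty client workload
  cosmos_db_4xx_request_rate_threshold_warning  = "60"
  cosmos_db_4xx_request_rate_threshold_critical = "90"
}
```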

View File

@ -1,8 +1,8 @@
resource "datadog_monitor" "cosmos_db_status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cosmos DB is down"
message = coalesce(var.status_message, var.message)
type = "metric alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}):
@ -69,10 +69,10 @@ tags = concat(["env:${var.environment}", "type:cloud", "provider:azure", "resour
}
resource "datadog_monitor" "cosmos_db_5xx_requests" {
count = var.cosmos_db_5xx_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cosmos DB 5xx requests rate is high {{#is_alert}}{{comparator}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{comparator}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cosmos_db_5xx_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.cosmos_db_5xx_request_time_aggregator}(${var.cosmos_db_5xx_request_timeframe}): default( (

View File

@ -31,7 +31,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitor new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable Datalake Store status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Datalake Store status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Datalake Store status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Datalake Store status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Datalake Store status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Datalake Store status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
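
The `status_enabled` flag gates the resource's `count`, and the flags are strings rather than booleans in this module, so opting out means passing the string `"false"`. A hedged sketch (`source` is a placeholder):

```hcl
module "datadog-monitors-cloud-azure-datalakestore" {
  source      = "..." # placeholder
  environment = "staging"
  message     = "@slack-monitoring" # assumed handle

  status_enabled = "false" # count becomes 0, no monitor is created
}
```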

View File

@ -1,8 +1,8 @@
resource "datadog_monitor" "datalakestore_status" {
count = var.status_enabled == "true" ? 1 : 0
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Datalake Store is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (

View File

@ -18,7 +18,6 @@ Creates DataDog monitors with the following checks:
- Event Grid no successful message
- Event Grid too many failed messages
- Event Grid too many unmatched events
## Inputs
@ -27,7 +26,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failed\_messages\_rate\_enabled | Flag to enable Event Grid failed messages monitor | string | `"true"` | no |
| failed\_messages\_rate\_extra\_tags | Extra tags for Event Grid failed messages monitor | list | `[]` | no |
| failed\_messages\_rate\_extra\_tags | Extra tags for Event Grid failed messages monitor | list(string) | `[]` | no |
| failed\_messages\_rate\_message | Custom message for Event Grid failed messages monitor | string | `""` | no |
| failed\_messages\_rate\_thresold\_critical | Failed messages ratio (percentage) to trigger the critical alert | string | `"90"` | no |
| failed\_messages\_rate\_thresold\_warning | Failed messages ratio (percentage) to trigger a warning alert | string | `"50"` | no |
@ -39,13 +38,13 @@ Creates DataDog monitors with the following checks:
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| no\_successful\_message\_rate\_enabled | Flag to enable Event Grid no successful message monitor | string | `"true"` | no |
| no\_successful\_message\_rate\_extra\_tags | Extra tags for Event Grid no successful message monitor | list | `[]` | no |
| no\_successful\_message\_rate\_extra\_tags | Extra tags for Event Grid no successful message monitor | list(string) | `[]` | no |
| no\_successful\_message\_rate\_message | Custom message for Event Grid no successful message monitor | string | `""` | no |
| no\_successful\_message\_rate\_time\_aggregator | Monitor aggregator for Event Grid no successful message [available values: min, max or avg] | string | `"min"` | no |
| no\_successful\_message\_rate\_timeframe | Monitor timeframe for Event Grid no successful message [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| unmatched\_events\_rate\_enabled | Flag to enable Event Grid unmatched events monitor | string | `"true"` | no |
| unmatched\_events\_rate\_extra\_tags | Extra tags for Event Grid unmatched events monitor | list | `[]` | no |
| unmatched\_events\_rate\_extra\_tags | Extra tags for Event Grid unmatched events monitor | list(string) | `[]` | no |
| unmatched\_events\_rate\_message | Custom message for Event Grid unmatched events monitor | string | `""` | no |
| unmatched\_events\_rate\_thresold\_critical | Unmatched events ratio (percentage) to trigger the critical alert | string | `"90"` | no |
| unmatched\_events\_rate\_thresold\_warning | Unmatched events ratio (percentage) to trigger a warning alert | string | `"50"` | no |
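
Each rate monitor takes a warning/critical pair expressed as a percentage. A sketch overriding the unmatched-events thresholds (values illustrative, `source` a placeholder):

```hcl
module "datadog-monitors-cloud-azure-eventgrid" {
  source      = "..." # placeholder
  environment = "prod"
  message     = "@pagerduty" # assumed handle

  # Note: "thresold" is the variables' actual spelling in this module.
  unmatched_events_rate_thresold_warning  = "30"
  unmatched_events_rate_thresold_critical = "80"
}
```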

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "eventgrid_no_successful_message" {
count = var.no_successful_message_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Event Grid no successful message {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.no_successful_message_rate_message, var.message)
type = "metric alert"
type = "metric alert"
  # The query is unusual on purpose: we only want to detect the no-data condition
query = <<EOQ
@ -61,7 +61,7 @@ resource "datadog_monitor" "eventgrid_unmatched_events" {
count = var.unmatched_events_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Event Grid too many unmatched events {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.unmatched_events_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.unmatched_events_rate_time_aggregator}(${var.unmatched_events_rate_timeframe}): (default(

View File

@ -17,7 +17,6 @@ module "datadog-monitors-cloud-azure-eventhub" {
Creates DataDog monitors with the following checks:
- Event Hub is down
- Event Hub too many errors
- Event Hub too many failed requests
## Inputs
@ -26,7 +25,7 @@ Creates DataDog monitors with the following checks:
|------|-------------|:----:|:-----:|:-----:|
| environment | Architecture environment | string | n/a | yes |
| errors\_rate\_enabled | Flag to enable Event Hub errors monitor | string | `"true"` | no |
| errors\_rate\_extra\_tags | Extra tags for Event Hub errors monitor | list | `[]` | no |
| errors\_rate\_extra\_tags | Extra tags for Event Hub errors monitor | list(string) | `[]` | no |
| errors\_rate\_message | Custom message for Event Hub errors monitor | string | `""` | no |
| errors\_rate\_thresold\_critical | Errors ratio (percentage) to trigger the critical alert | string | `"90"` | no |
| errors\_rate\_thresold\_warning | Errors ratio (percentage) to trigger a warning alert | string | `"50"` | no |
@ -34,7 +33,7 @@ Creates DataDog monitors with the following checks:
| errors\_rate\_timeframe | Monitor timeframe for Event Hub errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failed\_requests\_rate\_enabled | Flag to enable Event Hub failed requests monitor | string | `"true"` | no |
| failed\_requests\_rate\_extra\_tags | Extra tags for Event Hub failed requests monitor | list | `[]` | no |
| failed\_requests\_rate\_extra\_tags | Extra tags for Event Hub failed requests monitor | list(string) | `[]` | no |
| failed\_requests\_rate\_message | Custom message for Event Hub failed requests monitor | string | `""` | no |
| failed\_requests\_rate\_thresold\_critical | Failed requests ratio (percentage) to trigger the critical alert | string | `"90"` | no |
| failed\_requests\_rate\_thresold\_warning | Failed requests ratio (percentage) to trigger a warning alert | string | `"50"` | no |
@ -47,7 +46,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable Event Hub status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Event Hub status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Event Hub status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Event Hub status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Event Hub status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Event Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
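
With the type tightened from `list` to `list(string)`, extra tags are plain `"key:value"` strings appended to each monitor's tag set. A sketch (placeholders as above):

```hcl
module "datadog-monitors-cloud-azure-eventhub" {
  source      = "..." # placeholder
  environment = "prod"
  message     = "@opsgenie" # assumed handle

  errors_rate_extra_tags = ["service:ingest", "owner:streaming"]
  status_extra_tags      = ["service:ingest"]
}
```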

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "eventhub_status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Event Hub is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -58,7 +58,7 @@ resource "datadog_monitor" "eventhub_errors" {
count = var.errors_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Event Hub too many errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.errors_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.errors_rate_time_aggregator}(${var.errors_rate_timeframe}): ( (

View File

@ -16,9 +16,8 @@ module "datadog-monitors-cloud-azure-functions" {
Creates DataDog monitors with the following checks:
- Function App connections count too high
- Function App HTTP 5xx errors too high
- Function App threads count too high
- Function App connections count too high
## Inputs
@ -30,21 +29,21 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| high\_connections\_count\_enabled | Flag to enable Functions high connections count monitor | string | `"true"` | no |
| high\_connections\_count\_extra\_tags | Extra tags for Functions high connections count monitor | list | `[]` | no |
| high\_connections\_count\_extra\_tags | Extra tags for Functions high connections count monitor | list(string) | `[]` | no |
| high\_connections\_count\_message | Custom message for Functions high connections count monitor | string | `""` | no |
| high\_connections\_count\_threshold\_critical | Alerting threshold for Functions high connections count | string | `"590"` | no |
| high\_connections\_count\_threshold\_warning | Warning threshold for Functions high connections count | string | `"550"` | no |
| high\_connections\_count\_time\_aggregator | Monitor aggregator for Functions high connections count [available values: min, max or avg] | string | `"min"` | no |
| high\_connections\_count\_timeframe | Monitor timeframe for Functions high connections count [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| high\_threads\_count\_enabled | Flag to enable Functions high threads count monitor | string | `"true"` | no |
| high\_threads\_count\_extra\_tags | Extra tags for Functions high threads count monitor | list | `[]` | no |
| high\_threads\_count\_extra\_tags | Extra tags for Functions high threads count monitor | list(string) | `[]` | no |
| high\_threads\_count\_message | Custom message for Functions high threads count monitor | string | `""` | no |
| high\_threads\_count\_threshold\_critical | Alerting threshold for Functions high threads count | string | `"510"` | no |
| high\_threads\_count\_threshold\_warning | Warning threshold for Functions high threads count | string | `"490"` | no |
| high\_threads\_count\_time\_aggregator | Monitor aggregator for Functions high threads count [available values: min, max or avg] | string | `"min"` | no |
| high\_threads\_count\_timeframe | Monitor timeframe for Functions high threads count [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| http\_5xx\_errors\_rate\_enabled | Flag to enable Functions Http 5xx errors rate monitor | string | `"true"` | no |
| http\_5xx\_errors\_rate\_extra\_tags | Extra tags for Functions Http 5xx errors rate monitor | list | `[]` | no |
| http\_5xx\_errors\_rate\_extra\_tags | Extra tags for Functions Http 5xx errors rate monitor | list(string) | `[]` | no |
| http\_5xx\_errors\_rate\_message | Custom message for Functions Http 5xx errors rate monitor | string | `""` | no |
| http\_5xx\_errors\_rate\_threshold\_critical | Alerting threshold for Functions Http 5xx errors rate | string | `"20"` | no |
| http\_5xx\_errors\_rate\_threshold\_warning | Warning threshold for Functions Http 5xx errors rate | string | `"10"` | no |
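
Because the default aggregator is `min` over `last_5m`, the connections and threads monitors alert only when every datapoint in the window breaches the threshold, i.e. a sustained condition. A sketch widening that window (values illustrative, `source` a placeholder):

```hcl
module "datadog-monitors-cloud-azure-functions" {
  source      = "..." # placeholder
  environment = "prod"
  message     = "@team-serverless" # assumed handle

  # Require a 15-minute sustained breach before alerting.
  high_connections_count_time_aggregator    = "min"
  high_connections_count_timeframe          = "last_15m"
  high_connections_count_threshold_warning  = "500"
  high_connections_count_threshold_critical = "580"
}
```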

View File

@ -31,7 +31,7 @@ resource "datadog_monitor" "function_high_connections_count" {
count = var.high_connections_count_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Function App connections count too high {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.high_connections_count_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.high_connections_count_time_aggregator}(${var.high_connections_count_timeframe}):
@ -44,8 +44,8 @@ warning = var.high_connections_count_threshold_warning
critical = var.high_connections_count_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = false

View File

@ -16,27 +16,21 @@ module "datadog-monitors-cloud-azure-iothubs" {
Creates DataDog monitors with the following checks:
- IOT Hub is down
- IOT Hub Too many c2d methods failure
- IOT Hub Too many c2d twin read failure
- IOT Hub Too many c2d twin update failure
- IOT Hub Too many d2c telemetry egress dropped
- IOT Hub Too many d2c telemetry egress invalid
- IOT Hub Too many d2c telemetry egress orphaned
- IOT Hub Too many d2c telemetry ingress not sent
- IOT Hub Too many d2c twin read failure
- IOT Hub Too many d2c twin update failure
- IOT Hub Too many jobs failed
- IOT Hub Too many list_jobs failure
- IOT Hub Too many query_jobs failed
- IOT Hub Total devices is wrong
- IOT Hub Too many c2d methods failure
- IOT Hub Too many d2c telemetry ingress not sent
- IOT Hub Too many d2c twin update failure
- IOT Hub Too many list_jobs failure
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| dropped\_d2c\_telemetry\_egress\_enabled | Flag to enable IoT Hub dropped d2c telemetry monitor | string | `"true"` | no |
| dropped\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub dropped d2c telemetry monitor | list | `[]` | no |
| dropped\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub dropped d2c telemetry monitor | list(string) | `[]` | no |
| dropped\_d2c\_telemetry\_egress\_message | Custom message for IoT Hub dropped d2c telemetry monitor | string | `""` | no |
| dropped\_d2c\_telemetry\_egress\_rate\_threshold\_critical | D2C Telemetry Dropped limit (critical threshold) | string | `"90"` | no |
| dropped\_d2c\_telemetry\_egress\_rate\_threshold\_warning | D2C Telemetry Dropped limit (warning threshold) | string | `"50"` | no |
@ -45,56 +39,56 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failed\_c2d\_methods\_rate\_enabled | Flag to enable IoT Hub failed c2d methods monitor | string | `"true"` | no |
| failed\_c2d\_methods\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d methods monitor | list | `[]` | no |
| failed\_c2d\_methods\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d methods monitor | list(string) | `[]` | no |
| failed\_c2d\_methods\_rate\_message | Custom message for IoT Hub failed c2d method monitor | string | `""` | no |
| failed\_c2d\_methods\_rate\_threshold\_critical | C2D Methods Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_c2d\_methods\_rate\_threshold\_warning | C2D Methods Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_c2d\_methods\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed c2d method [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_c2d\_methods\_rate\_timeframe | Monitor timeframe for IoT Hub failed c2d method [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_c2d\_twin\_read\_rate\_enabled | Flag to enable IoT Hub failed c2d twin read monitor | string | `"true"` | no |
| failed\_c2d\_twin\_read\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d twin read monitor | list | `[]` | no |
| failed\_c2d\_twin\_read\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d twin read monitor | list(string) | `[]` | no |
| failed\_c2d\_twin\_read\_rate\_message | Custom message for IoT Hub failed c2d twin read monitor | string | `""` | no |
| failed\_c2d\_twin\_read\_rate\_threshold\_critical | C2D Twin Read Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_c2d\_twin\_read\_rate\_threshold\_warning | C2D Twin Read Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_c2d\_twin\_read\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed c2d twin read [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_c2d\_twin\_read\_rate\_timeframe | Monitor timeframe for IoT Hub failed c2d twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_c2d\_twin\_update\_rate\_enabled | Flag to enable IoT Hub failed c2d twin update monitor | string | `"true"` | no |
| failed\_c2d\_twin\_update\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d twin update monitor | list | `[]` | no |
| failed\_c2d\_twin\_update\_rate\_extra\_tags | Extra tags for IoT Hub failed c2d twin update monitor | list(string) | `[]` | no |
| failed\_c2d\_twin\_update\_rate\_message | Custom message for IoT Hub failed c2d twin update monitor | string | `""` | no |
| failed\_c2d\_twin\_update\_rate\_threshold\_critical | C2D Twin Update Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_c2d\_twin\_update\_rate\_threshold\_warning | C2D Twin Update Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_c2d\_twin\_update\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed c2d twin update [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_c2d\_twin\_update\_rate\_timeframe | Monitor timeframe for IoT Hub failed c2d twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_d2c\_twin\_read\_rate\_enabled | Flag to enable IoT Hub failed d2c twin read monitor | string | `"true"` | no |
| failed\_d2c\_twin\_read\_rate\_extra\_tags | Extra tags for IoT Hub failed d2c twin read monitor | list | `[]` | no |
| failed\_d2c\_twin\_read\_rate\_extra\_tags | Extra tags for IoT Hub failed d2c twin read monitor | list(string) | `[]` | no |
| failed\_d2c\_twin\_read\_rate\_message | Custom message for IoT Hub failed d2c twin read monitor | string | `""` | no |
| failed\_d2c\_twin\_read\_rate\_threshold\_critical | D2C Twin Read Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_d2c\_twin\_read\_rate\_threshold\_warning | D2C Twin Read Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_d2c\_twin\_read\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed d2c twin read [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_d2c\_twin\_read\_rate\_timeframe | Monitor timeframe for IoT Hub failed d2c twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_d2c\_twin\_update\_rate\_enabled | Flag to enable IoT Hub failed d2c twin update monitor | string | `"true"` | no |
| failed\_d2c\_twin\_update\_rate\_extra\_tags | Extra tags for IoT Hub failed d2c twin update monitor | list | `[]` | no |
| failed\_d2c\_twin\_update\_rate\_extra\_tags | Extra tags for IoT Hub failed d2c twin update monitor | list(string) | `[]` | no |
| failed\_d2c\_twin\_update\_rate\_message | Custom message for IoT Hub failed d2c twin update monitor | string | `""` | no |
| failed\_d2c\_twin\_update\_rate\_threshold\_critical | D2C Twin Update Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_d2c\_twin\_update\_rate\_threshold\_warning | D2C Twin Update Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_d2c\_twin\_update\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed d2c twin update [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_d2c\_twin\_update\_rate\_timeframe | Monitor timeframe for IoT Hub failed d2c twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_jobs\_rate\_enabled | Flag to enable IoT Hub failed jobs monitor | string | `"true"` | no |
| failed\_jobs\_rate\_extra\_tags | Extra tags for IoT Hub failed jobs monitor | list | `[]` | no |
| failed\_jobs\_rate\_extra\_tags | Extra tags for IoT Hub failed jobs monitor | list(string) | `[]` | no |
| failed\_jobs\_rate\_message | Custom message for IoT Hub failed jobs monitor | string | `""` | no |
| failed\_jobs\_rate\_threshold\_critical | Jobs Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_jobs\_rate\_threshold\_warning | Jobs Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_jobs\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed jobs [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_jobs\_rate\_timeframe | Monitor timeframe for IoT Hub failed jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_listjobs\_rate\_enabled | Flag to enable IoT Hub failed list jobs monitor | string | `"true"` | no |
| failed\_listjobs\_rate\_extra\_tags | Extra tags for IoT Hub failed list jobs monitor | list | `[]` | no |
| failed\_listjobs\_rate\_extra\_tags | Extra tags for IoT Hub failed list jobs monitor | list(string) | `[]` | no |
| failed\_listjobs\_rate\_message | Custom message for IoT Hub failed list jobs monitor | string | `""` | no |
| failed\_listjobs\_rate\_threshold\_critical | ListJobs Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_listjobs\_rate\_threshold\_warning | ListJobs Failed rate limit (warning threshold) | string | `"50"` | no |
| failed\_listjobs\_rate\_time\_aggregator | Monitor aggregator for IoT Hub failed list jobs [available values: min, max, sum or avg] | string | `"min"` | no |
| failed\_listjobs\_rate\_timeframe | Monitor timeframe for IoT Hub failed list jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| failed\_queryjobs\_rate\_enabled | Flag to enable IoT Hub failed query jobs monitor | string | `"true"` | no |
| failed\_queryjobs\_rate\_extra\_tags | Extra tags for IoT Hub failed query jobs monitor | list | `[]` | no |
| failed\_queryjobs\_rate\_extra\_tags | Extra tags for IoT Hub failed query jobs monitor | list(string) | `[]` | no |
| failed\_queryjobs\_rate\_message | Custom message for IoT Hub failed query jobs monitor | string | `""` | no |
| failed\_queryjobs\_rate\_threshold\_critical | QueryJobs Failed rate limit (critical threshold) | string | `"90"` | no |
| failed\_queryjobs\_rate\_threshold\_warning | QueryJobs Failed rate limit (warning threshold) | string | `"50"` | no |
@ -104,7 +98,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| invalid\_d2c\_telemetry\_egress\_enabled | Flag to enable IoT Hub invalid d2c telemetry monitor | string | `"true"` | no |
| invalid\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub invalid d2c telemetry monitor | list | `[]` | no |
| invalid\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub invalid d2c telemetry monitor | list(string) | `[]` | no |
| invalid\_d2c\_telemetry\_egress\_message | Custom message for IoT Hub invalid d2c telemetry monitor | string | `""` | no |
| invalid\_d2c\_telemetry\_egress\_rate\_threshold\_critical | D2C Telemetry Invalid limit (critical threshold) | string | `"90"` | no |
| invalid\_d2c\_telemetry\_egress\_rate\_threshold\_warning | D2C Telemetry Invalid limit (warning threshold) | string | `"50"` | no |
@ -113,7 +107,7 @@ Creates DataDog monitors with the following checks:
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| orphaned\_d2c\_telemetry\_egress\_enabled | Flag to enable IoT Hub orphaned d2c telemetry monitor | string | `"true"` | no |
| orphaned\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub orphaned d2c telemetry monitor | list | `[]` | no |
| orphaned\_d2c\_telemetry\_egress\_extra\_tags | Extra tags for IoT Hub orphaned d2c telemetry monitor | list(string) | `[]` | no |
| orphaned\_d2c\_telemetry\_egress\_message | Custom message for IoT Hub orphaned d2c telemetry monitor | string | `""` | no |
| orphaned\_d2c\_telemetry\_egress\_rate\_threshold\_critical | D2C Telemetry Orphaned limit (critical threshold) | string | `"90"` | no |
| orphaned\_d2c\_telemetry\_egress\_rate\_threshold\_warning | D2C Telemetry Orphaned limit (warning threshold) | string | `"50"` | no |
@ -121,16 +115,16 @@ Creates DataDog monitors with the following checks:
| orphaned\_d2c\_telemetry\_egress\_timeframe | Monitor timeframe for IoT Hub orphaned d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable IoT Hub status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for IoT Hub status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for IoT Hub status monitor | list(string) | `[]` | no |
| status\_message | Custom message for IoT Hub status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for IoT Hub status [available values: min, max, sum or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for IoT Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| too\_many\_d2c\_telemetry\_ingress\_nosent\_enabled | Flag to enable IoT Hub unsent d2c telemetry monitor | string | `"true"` | no |
| too\_many\_d2c\_telemetry\_ingress\_nosent\_extra\_tags | Extra tags for IoT Hub unsent d2c telemetry monitor | list | `[]` | no |
| too\_many\_d2c\_telemetry\_ingress\_nosent\_extra\_tags | Extra tags for IoT Hub unsent d2c telemetry monitor | list(string) | `[]` | no |
| too\_many\_d2c\_telemetry\_ingress\_nosent\_message | Custom message for IoT Hub unsent d2c telemetry monitor | string | `""` | no |
| too\_many\_d2c\_telemetry\_ingress\_nosent\_timeframe | Monitor timeframe for IoT Hub unsent d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| total\_devices\_enabled | Flag to enable IoT Hub total devices monitor | string | `"true"` | no |
| total\_devices\_extra\_tags | Extra tags for IoT Hub total devices monitor | list | `[]` | no |
| total\_devices\_extra\_tags | Extra tags for IoT Hub total devices monitor | list(string) | `[]` | no |
| total\_devices\_message | Custom message for IoT Hub total devices monitor | string | `""` | no |
| total\_devices\_time\_aggregator | Monitor aggregator for IoT Hub total devices [available values: min, max, sum or avg] | string | `"min"` | no |
| total\_devices\_timeframe | Monitor timeframe for IoT Hub total devices [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
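
With this many optional monitors, a caller typically keeps the defaults and switches off the few that do not apply. A sketch (placeholders as above):

```hcl
module "datadog-monitors-cloud-azure-iothubs" {
  source      = "..." # placeholder
  environment = "prod"
  message     = "@iot-oncall" # assumed handle

  # Jobs are not used on this hub, so silence the job-related monitors.
  failed_jobs_rate_enabled      = "false"
  failed_listjobs_rate_enabled  = "false"
  failed_queryjobs_rate_enabled = "false"

  total_devices_extra_tags = ["fleet:eu-west"]
}
```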

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "too_many_jobs_failed" {
count = var.failed_jobs_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many jobs failed {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_jobs_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.failed_jobs_rate_time_aggregator}(${var.failed_jobs_rate_timeframe}):
@ -68,7 +68,7 @@ resource "datadog_monitor" "too_many_query_jobs_failed" {
count = var.failed_queryjobs_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many query_jobs failed {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_queryjobs_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.failed_queryjobs_rate_time_aggregator}(${var.failed_queryjobs_rate_timeframe}):
@ -126,7 +126,7 @@ resource "datadog_monitor" "total_devices" {
count = var.total_devices_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Total devices is wrong {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.total_devices_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.total_devices_time_aggregator}(${var.total_devices_timeframe}): (
@ -184,7 +184,7 @@ resource "datadog_monitor" "too_many_c2d_twin_read_failed" {
count = var.failed_c2d_twin_read_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many c2d twin read failure {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_c2d_twin_read_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.failed_c2d_twin_read_rate_time_aggregator}(${var.failed_c2d_twin_read_rate_timeframe}):
@ -200,8 +200,8 @@ warning = var.failed_c2d_twin_read_rate_threshold_warning
critical = var.failed_c2d_twin_read_rate_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false
@ -250,7 +250,7 @@ resource "datadog_monitor" "too_many_d2c_twin_read_failed" {
count = var.failed_d2c_twin_read_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many d2c twin read failure {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_d2c_twin_read_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.failed_d2c_twin_read_rate_time_aggregator}(${var.failed_d2c_twin_read_rate_timeframe}):
@ -316,7 +316,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_egress_dropped" {
count = var.dropped_d2c_telemetry_egress_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many d2c telemetry egress dropped {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.dropped_d2c_telemetry_egress_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.dropped_d2c_telemetry_egress_time_aggregator}(${var.dropped_d2c_telemetry_egress_timeframe}):
@ -386,7 +386,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_egress_invalid" {
count = var.invalid_d2c_telemetry_egress_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many d2c telemetry egress invalid {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.invalid_d2c_telemetry_egress_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.invalid_d2c_telemetry_egress_time_aggregator}(${var.invalid_d2c_telemetry_egress_timeframe}):
@ -420,7 +420,7 @@ EOQ
resource "datadog_monitor" "too_many_d2c_telemetry_ingress_nosent" {
count = var.too_many_d2c_telemetry_ingress_nosent_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] IOT Hub Too many d2c telemetry ingress not sent {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.too_many_d2c_telemetry_ingress_nosent_message,var.message)
message = coalesce(var.too_many_d2c_telemetry_ingress_nosent_message, var.message)
type = "query alert"
query = <<EOQ

View File

@ -16,23 +16,22 @@ module "datadog-monitors-cloud-azure-keyvault" {
Creates DataDog monitors with the following checks:
- Key Vault API latency is high
- Key Vault API result rate is low
- Key Vault is down
- Key Vault API result rate is low
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| api\_latency\_enabled | Flag to enable Key Vault API latency monitor | string | `"true"` | no |
| api\_latency\_extra\_tags | Extra tags for Key Vault API latency monitor | list | `[]` | no |
| api\_latency\_extra\_tags | Extra tags for Key Vault API latency monitor | list(string) | `[]` | no |
| api\_latency\_message | Custom message for Key Vault API latency monitor | string | `""` | no |
| api\_latency\_threshold\_critical | Critical threshold for Key Vault API latency rate | string | `"100"` | no |
| api\_latency\_threshold\_warning | Warning threshold for Key Vault API latency rate | string | `"80"` | no |
| api\_latency\_time\_aggregator | Monitor aggregator for Key Vault API latency [available values: min, max or avg] | string | `"min"` | no |
| api\_latency\_timeframe | Monitor timeframe for Key Vault API latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| api\_result\_enabled | Flag to enable Key Vault API result monitor | string | `"true"` | no |
| api\_result\_extra\_tags | Extra tags for Key Vault API result monitor | list | `[]` | no |
| api\_result\_extra\_tags | Extra tags for Key Vault API result monitor | list(string) | `[]` | no |
| api\_result\_message | Custom message for Key Vault API result monitor | string | `""` | no |
| api\_result\_threshold\_critical | Critical threshold for Key Vault API result rate | string | `"10"` | no |
| api\_result\_threshold\_warning | Warning threshold for Key Vault API result rate | string | `"30"` | no |
@ -47,7 +46,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable Key Vault status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Key Vault status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Key Vault status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Key Vault status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Key Vault status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Key Vault status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
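
A minimal call wiring these inputs together might look like the sketch below; the `source` path and the notification handle are placeholders, not values from this repo:

```hcl
module "datadog-monitors-cloud-azure-keyvault" {
  source      = "path/to/cloud/azure/keyvault" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@pagerduty-azure" # placeholder notification handle

  # Tighten the API latency monitor and tag it for alert routing
  api_latency_threshold_warning  = "60"
  api_latency_threshold_critical = "90"
  api_latency_extra_tags         = ["team:security"]
}
```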

View File

@ -1,8 +1,8 @@
resource "datadog_monitor" "keyvault_status" {
count = var.status_enabled == "true" ? 1 : 0
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Key Vault is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -57,10 +57,10 @@ tags = concat(["env:${var.environment}", "type:cloud", "provider:azure", "resour
}
resource "datadog_monitor" "keyvault_api_latency" {
count = var.api_latency_enabled == "true" ? 1 : 0
count = var.api_latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Key Vault API latency is high {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.api_latency_message, var.message)
type = "metric alert"
type = "metric alert"
query = <<EOQ
${var.api_latency_time_aggregator}(${var.api_latency_timeframe}):
@ -73,8 +73,8 @@ critical = var.api_latency_threshold_critical
warning = var.api_latency_threshold_warning
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
notify_audit = false

View File

@ -31,7 +31,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable Load Balancer status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Load Balancer status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Load Balancer status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Load Balancer status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Load Balancer status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Load Balancer status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
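
For example, the status check can be smoothed over a longer window (the `source` path and handle below are placeholders):

```hcl
module "datadog-monitors-cloud-azure-loadbalancer" {
  source      = "path/to/cloud/azure/loadbalancer" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@slack-monitoring" # placeholder notification handle

  # Evaluate the status check over a longer window to avoid flapping
  status_timeframe       = "last_15m"
  status_time_aggregator = "max"
}
```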

View File

@ -1,8 +1,8 @@
resource "datadog_monitor" "loadbalancer_status" {
count = var.status_enabled == "true" ? 1 : 0
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Load Balancer is unreachable"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (

View File

@ -17,8 +17,6 @@ module "datadog-monitors-cloud-azure-mysql" {
Creates DataDog monitors with the following checks:
- Mysql Server CPU usage
- Mysql Server IO consumption
- Mysql Server memory usage
- Mysql Server storage
## Inputs
@ -26,7 +24,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_usage\_enabled | Flag to enable Mysql CPU usage monitor | string | `"true"` | no |
| cpu\_usage\_extra\_tags | Extra tags for Mysql status monitor | list | `[]` | no |
| cpu\_usage\_extra\_tags | Extra tags for Mysql CPU usage monitor | list(string) | `[]` | no |
| cpu\_usage\_message | Custom message for Mysql CPU monitor | string | `""` | no |
| cpu\_usage\_threshold\_critical | Mysql CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_usage\_threshold\_warning | Mysql CPU usage in percent (warning threshold) | string | `"80"` | no |
@ -37,21 +35,21 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `"*"` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| free\_storage\_enabled | Flag to enable Mysql free storage monitor | string | `"true"` | no |
| free\_storage\_extra\_tags | Extra tags for Mysql status monitor | list | `[]` | no |
| free\_storage\_extra\_tags | Extra tags for Mysql free storage monitor | list(string) | `[]` | no |
| free\_storage\_message | Custom message for Mysql Free Storage monitor | string | `""` | no |
| free\_storage\_threshold\_critical | Mysql Free Storage remaining in percent (critical threshold) | string | `"10"` | no |
| free\_storage\_threshold\_warning | Mysql Free Storage remaining in percent (warning threshold) | string | `"20"` | no |
| free\_storage\_time\_aggregator | Monitor aggregator for Mysql Free Storage [available values: min, max or avg] | string | `"min"` | no |
| free\_storage\_timeframe | Monitor timeframe for Mysql Free Storage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| io\_consumption\_enabled | Flag to enable Mysql IO consumption monitor | string | `"true"` | no |
| io\_consumption\_extra\_tags | Extra tags for Mysql status monitor | list | `[]` | no |
| io\_consumption\_extra\_tags | Extra tags for Mysql IO consumption monitor | list(string) | `[]` | no |
| io\_consumption\_message | Custom message for Mysql IO consumption monitor | string | `""` | no |
| io\_consumption\_threshold\_critical | Mysql IO consumption in percent (critical threshold) | string | `"90"` | no |
| io\_consumption\_threshold\_warning | Mysql IO consumption in percent (warning threshold) | string | `"80"` | no |
| io\_consumption\_time\_aggregator | Monitor aggregator for Mysql IO consumption [available values: min, max or avg] | string | `"min"` | no |
| io\_consumption\_timeframe | Monitor timeframe for Mysql IO consumption [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| memory\_usage\_enabled | Flag to enable Mysql memory usage monitor | string | `"true"` | no |
| memory\_usage\_extra\_tags | Extra tags for Mysql status monitor | list | `[]` | no |
| memory\_usage\_extra\_tags | Extra tags for Mysql memory usage monitor | list(string) | `[]` | no |
| memory\_usage\_message | Custom message for Mysql memory monitor | string | `""` | no |
| memory\_usage\_threshold\_critical | Mysql memory usage in percent (critical threshold) | string | `"90"` | no |
| memory\_usage\_threshold\_warning | Mysql memory usage in percent (warning threshold) | string | `"80"` | no |
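
The thresholds above can be overridden per deployment; a sketch with placeholder `source` and handle values:

```hcl
module "datadog-monitors-cloud-azure-mysql" {
  source      = "path/to/cloud/azure/mysql" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@pagerduty-db" # placeholder notification handle

  # Alert earlier on disk pressure, and skip the IO consumption check
  free_storage_threshold_warning  = "30"
  free_storage_threshold_critical = "15"
  io_consumption_enabled          = "false"
}
```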

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "mysql_cpu_usage" {
count = var.cpu_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql Server CPU usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_usage_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.cpu_usage_time_aggregator}(${var.cpu_usage_timeframe}): (
@ -62,7 +62,7 @@ resource "datadog_monitor" "mysql_io_consumption" {
count = var.io_consumption_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql Server IO consumption {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.io_consumption_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.io_consumption_time_aggregator}(${var.io_consumption_timeframe}): (

View File

@ -17,17 +17,15 @@ module "datadog-monitors-cloud-azure-postgresql" {
Creates DataDog monitors with the following checks:
- Postgresql Server CPU usage
- Postgresql Server has no connection
- Postgresql Server IO consumption
- Postgresql Server memory usage
- Postgresql Server storage
- Postgresql Server has no connection
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_usage\_enabled | Flag to enable PostgreSQL CPU usage monitor | string | `"true"` | no |
| cpu\_usage\_extra\_tags | Extra tags for PostgreSQL status monitor | list | `[]` | no |
| cpu\_usage\_extra\_tags | Extra tags for PostgreSQL CPU usage monitor | list(string) | `[]` | no |
| cpu\_usage\_message | Custom message for PostgreSQL CPU monitor | string | `""` | no |
| cpu\_usage\_threshold\_critical | PostgreSQL CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_usage\_threshold\_warning | PostgreSQL CPU usage in percent (warning threshold) | string | `"80"` | no |
@ -38,21 +36,21 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `"*"` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| free\_storage\_enabled | Flag to enable PostgreSQL free storage monitor | string | `"true"` | no |
| free\_storage\_extra\_tags | Extra tags for PostgreSQL status monitor | list | `[]` | no |
| free\_storage\_extra\_tags | Extra tags for PostgreSQL free storage monitor | list(string) | `[]` | no |
| free\_storage\_message | Custom message for PostgreSQL Free Storage monitor | string | `""` | no |
| free\_storage\_threshold\_critical | PostgreSQL Free Storage remaining in percent (critical threshold) | string | `"10"` | no |
| free\_storage\_threshold\_warning | PostgreSQL Free Storage remaining in percent (warning threshold) | string | `"20"` | no |
| free\_storage\_time\_aggregator | Monitor aggregator for PostgreSQL Free Storage [available values: min, max or avg] | string | `"min"` | no |
| free\_storage\_timeframe | Monitor timeframe for PostgreSQL Free Storage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| io\_consumption\_enabled | Flag to enable PostgreSQL IO consumption monitor | string | `"true"` | no |
| io\_consumption\_extra\_tags | Extra tags for PostgreSQL status monitor | list | `[]` | no |
| io\_consumption\_extra\_tags | Extra tags for PostgreSQL IO consumption monitor | list(string) | `[]` | no |
| io\_consumption\_message | Custom message for PostgreSQL IO consumption monitor | string | `""` | no |
| io\_consumption\_threshold\_critical | PostgreSQL IO consumption in percent (critical threshold) | string | `"90"` | no |
| io\_consumption\_threshold\_warning | PostgreSQL IO consumption in percent (warning threshold) | string | `"80"` | no |
| io\_consumption\_time\_aggregator | Monitor aggregator for PostgreSQL IO consumption [available values: min, max or avg] | string | `"min"` | no |
| io\_consumption\_timeframe | Monitor timeframe for PostgreSQL IO consumption [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| memory\_usage\_enabled | Flag to enable PostgreSQL memory usage monitor | string | `"true"` | no |
| memory\_usage\_extra\_tags | Extra tags for PostgreSQL status monitor | list | `[]` | no |
| memory\_usage\_extra\_tags | Extra tags for PostgreSQL memory usage monitor | list(string) | `[]` | no |
| memory\_usage\_message | Custom message for PostgreSQL memory monitor | string | `""` | no |
| memory\_usage\_threshold\_critical | PostgreSQL memory usage in percent (critical threshold) | string | `"90"` | no |
| memory\_usage\_threshold\_warning | PostgreSQL memory usage in percent (warning threshold) | string | `"80"` | no |
@ -61,7 +59,7 @@ Creates DataDog monitors with the following checks:
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| no\_connection\_enabled | Flag to enable PostgreSQL no connection monitor | string | `"true"` | no |
| no\_connection\_extra\_tags | Extra tags for PostgreSQL status monitor | list | `[]` | no |
| no\_connection\_extra\_tags | Extra tags for PostgreSQL no connection monitor | list(string) | `[]` | no |
| no\_connection\_message | Custom message for PostgreSQL no connection monitor | string | `""` | no |
| no\_connection\_time\_aggregator | Monitor aggregator for PostgreSQL no connection [available values: min, max or avg] | string | `"min"` | no |
| no\_connection\_timeframe | Monitor timeframe for PostgreSQL no connection [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
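
The connectivity check in particular often deserves its own routing; a sketch with placeholder `source` and handle values:

```hcl
module "datadog-monitors-cloud-azure-postgresql" {
  source      = "path/to/cloud/azure/postgresql" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@pagerduty-db" # placeholder notification handle

  # Dedicated message and a shorter window for the connectivity check
  no_connection_message   = "@pagerduty-db-oncall" # placeholder handle
  no_connection_timeframe = "last_1m"
}
```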

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "postgresql_cpu_usage" {
count = var.cpu_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Postgresql Server CPU usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_usage_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.cpu_usage_time_aggregator}(${var.cpu_usage_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "postgresql_free_storage" {
count = var.free_storage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Postgresql Server storage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.free_storage_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.free_storage_time_aggregator}(${var.free_storage_timeframe}): (
@ -117,7 +117,7 @@ resource "datadog_monitor" "postgresql_memory_usage" {
count = var.memory_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Postgresql Server memory usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.memory_usage_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.memory_usage_time_aggregator}(${var.memory_usage_timeframe}): (

View File

@ -17,8 +17,6 @@ module "datadog-monitors-cloud-azure-redis" {
Creates DataDog monitors with the following checks:
- Redis {{name}} is down
- Redis processor time too high
- Redis server load too high
- Redis too many evictedkeys
## Inputs
@ -28,7 +26,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| evictedkeys\_limit\_enabled | Flag to enable Redis evicted keys monitor | string | `"true"` | no |
| evictedkeys\_limit\_extra\_tags | Extra tags for Redis evicted keys monitor | list | `[]` | no |
| evictedkeys\_limit\_extra\_tags | Extra tags for Redis evicted keys monitor | list(string) | `[]` | no |
| evictedkeys\_limit\_message | Custom message for Redis evicted keys monitor | string | `""` | no |
| evictedkeys\_limit\_threshold\_critical | Evicted keys limit (critical threshold) | string | `"100"` | no |
| evictedkeys\_limit\_threshold\_warning | Evicted keys limit (warning threshold) | string | `"0"` | no |
@ -40,7 +38,7 @@ Creates DataDog monitors with the following checks:
| message | Message sent when a Redis monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| percent\_processor\_time\_enabled | Flag to enable Redis processor monitor | string | `"true"` | no |
| percent\_processor\_time\_extra\_tags | Extra tags for Redis processor monitor | list | `[]` | no |
| percent\_processor\_time\_extra\_tags | Extra tags for Redis processor monitor | list(string) | `[]` | no |
| percent\_processor\_time\_message | Custom message for Redis processor monitor | string | `""` | no |
| percent\_processor\_time\_threshold\_critical | Processor time percent (critical threshold) | string | `"80"` | no |
| percent\_processor\_time\_threshold\_warning | Processor time percent (warning threshold) | string | `"60"` | no |
@ -48,14 +46,14 @@ Creates DataDog monitors with the following checks:
| percent\_processor\_time\_timeframe | Monitor timeframe for Redis processor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| server\_load\_rate\_enabled | Flag to enable Redis server load monitor | string | `"true"` | no |
| server\_load\_rate\_extra\_tags | Extra tags for Redis server load monitor | list | `[]` | no |
| server\_load\_rate\_extra\_tags | Extra tags for Redis server load monitor | list(string) | `[]` | no |
| server\_load\_rate\_message | Custom message for Redis server load monitor | string | `""` | no |
| server\_load\_rate\_threshold\_critical | Server CPU load rate (critical threshold) | string | `"90"` | no |
| server\_load\_rate\_threshold\_warning | Server CPU load rate (warning threshold) | string | `"70"` | no |
| server\_load\_rate\_time\_aggregator | Monitor aggregator for Redis server load [available values: min, max or avg] | string | `"min"` | no |
| server\_load\_rate\_timeframe | Monitor timeframe for Redis server load [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| status\_enabled | Flag to enable Redis status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Redis status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Redis status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Redis status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Redis status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Redis status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
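
A usage sketch raising the eviction ceiling and tagging the load monitor (`source` and tag values are placeholders):

```hcl
module "datadog-monitors-cloud-azure-redis" {
  source      = "path/to/cloud/azure/redis" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@slack-cache" # placeholder notification handle

  # Still warn on the first evicted key (warning default is "0"),
  # but only page once evictions pile up
  evictedkeys_limit_threshold_critical = "1000"
  server_load_rate_extra_tags          = ["service:session-store"]
}
```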

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis {{name}} is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "percent_processor_time" {
count = var.percent_processor_time_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis processor time too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.percent_processor_time_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.percent_processor_time_time_aggregator}(${var.percent_processor_time_timeframe}): (

View File

@ -16,16 +16,15 @@ module "datadog-monitors-cloud-azure-serverfarms" {
Creates DataDog monitors with the following checks:
- Serverfarm CPU percentage is too high
- Serverfarm is down
- Serverfarm memory percentage is too high
- Serverfarm CPU percentage is too high
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_percentage\_enabled | Flag to enable the serverfarms cpu_percentage monitor | string | `"true"` | no |
| cpu\_percentage\_extra\_tags | Extra tags for serverfarms cpu_percentage monitor | list | `[]` | no |
| cpu\_percentage\_extra\_tags | Extra tags for serverfarms cpu_percentage monitor | list(string) | `[]` | no |
| cpu\_percentage\_message | Custom message for serverfarm cpu_percentage monitor | string | `""` | no |
| cpu\_percentage\_threshold\_critical | CPU percentage (critical threshold) | string | `"95"` | no |
| cpu\_percentage\_threshold\_warning | CPU percentage (warning threshold) | string | `"90"` | no |
@ -36,7 +35,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `"*"` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| memory\_percentage\_enabled | Flag to enable the serverfarms memory_percentage monitor | string | `"true"` | no |
| memory\_percentage\_extra\_tags | Extra tags for serverfarms memory_percentage monitor | list | `[]` | no |
| memory\_percentage\_extra\_tags | Extra tags for serverfarms memory_percentage monitor | list(string) | `[]` | no |
| memory\_percentage\_message | Custom message for serverfarm memory_percentage monitor | string | `""` | no |
| memory\_percentage\_threshold\_critical | Memory percentage (critical threshold) | string | `"95"` | no |
| memory\_percentage\_threshold\_warning | Memory percentage (warning threshold) | string | `"90"` | no |
@ -46,7 +45,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| status\_enabled | Flag to enable the serverfarms status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for serverfarms status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for serverfarms status monitor | list(string) | `[]` | no |
| status\_message | Custom message for serverfarm status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for serverfarms status [available values: min, max or avg] | string | `"min"` | no |
| status\_timeframe | Monitor timeframe for serverfarms status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
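
A sketch lowering both resource thresholds (`source`, handle, and tag values are placeholders):

```hcl
module "datadog-monitors-cloud-azure-serverfarms" {
  source      = "path/to/cloud/azure/serverfarms" # placeholder, adjust to your layout
  environment = "prod"
  message     = "@slack-webapps" # placeholder notification handle

  # Page before the plan is saturated instead of at the 95% default
  cpu_percentage_threshold_warning  = "80"
  cpu_percentage_threshold_critical = "90"
  memory_percentage_extra_tags      = ["owner:webteam"]
}
```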

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Serverfarm is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "memory_percentage" {
count = var.memory_percentage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Serverfarm memory percentage is too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.memory_percentage_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.memory_percentage_time_aggregator}(${var.memory_percentage_timeframe}): (

View File

@ -16,10 +16,8 @@ module "datadog-monitors-cloud-azure-servicebus" {
Creates DataDog monitors with the following checks:
- Service Bus has no active connection
- Service Bus is down
- Service Bus server errors rate is high
- Service Bus user errors rate is high
- Service Bus has no active connection
## Inputs
@ -32,23 +30,26 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| no\_active\_connections\_enabled | Flag to enable Service Bus status monitor | string | `"true"` | no |
| no\_active\_connections\_message | Custom message for Service Bus status monitor | string | `""` | no |
| no\_active\_connections\_time\_aggregator | Monitor aggregator for Service Bus status [available values: min, max or avg] | string | `"max"` | no |
| no\_active\_connections\_timeframe | Monitor timeframe for Service Bus status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| no\_active\_connections\_enabled | Flag to enable Service Bus no active connections monitor | string | `"true"` | no |
| no\_active\_connections\_extra\_tags | Extra tags for Service Bus no active connections monitor | list(string) | `[]` | no |
| no\_active\_connections\_message | Custom message for Service Bus no active connections monitor | string | `""` | no |
| no\_active\_connections\_time\_aggregator | Monitor aggregator for Service Bus no active connections [available values: min, max or avg] | string | `"max"` | no |
| no\_active\_connections\_timeframe | Monitor timeframe for Service Bus no active connections [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| prefix\_slug | Prefix string to prepend between brackets on every monitor's name | string | `""` | no |
| server\_errors\_enabled | Flag to enable Service Bus server errors monitor | string | `"true"` | no |
| server\_errors\_extra\_tags | Extra tags for Service Bus server errors monitor | list(string) | `[]` | no |
| server\_errors\_message | Custom message for Service Bus server errors monitor | string | `""` | no |
| server\_errors\_threshold\_critical | Critical threshold for Service Bus server errors monitor | string | `"90"` | no |
| server\_errors\_threshold\_warning | Warning threshold for Service Bus server errors monitor | string | `"50"` | no |
| server\_errors\_time\_aggregator | Monitor aggregator for Service Bus server errors [available values: min, max or avg] | string | `"min"` | no |
| server\_errors\_timeframe | Monitor timeframe for Service Bus server errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| status\_enabled | Flag to enable Service Bus status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Service Bus status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Service Bus status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Service Bus status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Service Bus status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Service Bus status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| user\_errors\_enabled | Flag to enable Service Bus user errors monitor | string | `"true"` | no |
| user\_errors\_extra\_tags | Extra tags for Service Bus user errors monitor | list(string) | `[]` | no |
| user\_errors\_message | Custom message for Service Bus user errors monitor | string | `""` | no |
| user\_errors\_threshold\_critical | Critical threshold for Service Bus user errors monitor | string | `"90"` | no |
| user\_errors\_threshold\_warning | Warning threshold for Service Bus user errors monitor | string | `"50"` | no |

View File

@@ -1,8 +1,8 @@
resource "datadog_monitor" "servicebus_status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Service Bus is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@@ -49,10 +49,10 @@ tags = concat(["env:${var.environment}", "resource:servicebus", "team:azure", "p
}
resource "datadog_monitor" "service_bus_user_errors" {
count = var.user_errors_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Service Bus user errors rate is high {{#is_alert}}{{comparator}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{comparator}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.user_errors_message, var.message)
type = "query alert"
query = <<EOQ
${var.user_errors_time_aggregator}(${var.user_errors_timeframe}): (

View File

@@ -16,37 +16,35 @@ module "datadog-monitors-cloud-azure-sql-database" {
Creates DataDog monitors with the following checks:
- SQL Database CPU too high
- SQL Database Deadlocks too high
- SQL Database DTU Consumption too high
- SQL Database high disk usage
- SQL Database is down
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_enabled | Flag to enable SQL CPU monitor | string | `"true"` | no |
| cpu\_extra\_tags | Extra tags for SQL CPU monitor | list | `[]` | no |
| cpu\_extra\_tags | Extra tags for SQL CPU monitor | list(string) | `[]` | no |
| cpu\_message | Custom message for SQL CPU monitor | string | `""` | no |
| cpu\_threshold\_critical | CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_threshold\_warning | CPU usage in percent (warning threshold) | string | `"80"` | no |
| cpu\_time\_aggregator | Monitor aggregator for SQL CPU [available values: min, max or avg] | string | `"min"` | no |
| cpu\_timeframe | Monitor timeframe for SQL CPU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| deadlock\_enabled | Flag to enable SQL Deadlock monitor | string | `"true"` | no |
| deadlock\_extra\_tags | Extra tags for SQL Deadlock monitor | list | `[]` | no |
| deadlock\_extra\_tags | Extra tags for SQL Deadlock monitor | list(string) | `[]` | no |
| deadlock\_message | Custom message for SQL Deadlock monitor | string | `""` | no |
| deadlock\_threshold\_critical | Amount of Deadlocks (critical threshold) | string | `"1"` | no |
| deadlock\_timeframe | Monitor timeframe for SQL Deadlock [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| diskspace\_enabled | Flag to enable SQL disk space monitor | string | `"true"` | no |
| diskspace\_extra\_tags | Extra tags for SQL disk space monitor | list | `[]` | no |
| diskspace\_extra\_tags | Extra tags for SQL disk space monitor | list(string) | `[]` | no |
| diskspace\_message | Custom message for SQL disk space monitor | string | `""` | no |
| diskspace\_threshold\_critical | Disk space used in percent (critical threshold) | string | `"90"` | no |
| diskspace\_threshold\_warning | Disk space used in percent (warning threshold) | string | `"80"` | no |
| diskspace\_time\_aggregator | Monitor aggregator for SQL disk space [available values: min, max or avg] | string | `"max"` | no |
| diskspace\_timeframe | Monitor timeframe for SQL disk space [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| dtu\_enabled | Flag to enable SQL DTU monitor | string | `"true"` | no |
| dtu\_extra\_tags | Extra tags for SQL DTU monitor | list | `[]` | no |
| dtu\_extra\_tags | Extra tags for SQL DTU monitor | list(string) | `[]` | no |
| dtu\_message | Custom message for SQL DTU monitor | string | `""` | no |
| dtu\_threshold\_critical | Amount of DTU used (critical threshold) | string | `"90"` | no |
| dtu\_threshold\_warning | Amount of DTU used (warning threshold) | string | `"85"` | no |
@@ -61,7 +59,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended, between brackets, to every monitor name | string | `""` | no |
| status\_enabled | Flag to enable SQL Database status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for SQL Database status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for SQL Database status monitor | list(string) | `[]` | no |
| status\_message | Custom message for SQL Database status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for SQL Database status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for SQL Database status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
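The thresholds and flags above compose naturally when instantiating the module; a hedged sketch (the `source` path is hypothetical, the variable names come from the inputs table):

```hcl
# Illustrative only: the source path is an assumption, not taken from this README.
module "datadog-monitors-cloud-azure-sql-database" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/azure/sql-database" # hypothetical path
  environment = "staging"
  message     = "@slack-db-alerts"

  # Tighter CPU alerting and no deadlock monitor in this environment.
  cpu_threshold_warning  = "70"
  cpu_threshold_critical = "85"
  deadlock_enabled       = "false" # flags are strings, not booleans
}
```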

View File

@@ -2,7 +2,7 @@ resource "datadog_monitor" "status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] SQL Database is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@@ -57,7 +57,7 @@ resource "datadog_monitor" "sql-database_free_space_low" {
count = var.diskspace_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] SQL Database high disk usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.diskspace_message, var.message)
type = "query alert"
query = <<EOQ
${var.diskspace_time_aggregator}(${var.diskspace_timeframe}): (
@@ -117,7 +117,7 @@ resource "datadog_monitor" "sql-database_deadlocks_count" {
count = var.deadlock_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] SQL Database Deadlocks too high {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.deadlock_message, var.message)
type = "query alert"
query = <<EOQ
sum(${var.deadlock_timeframe}): (

View File

@@ -17,7 +17,6 @@ module "datadog-monitors-cloud-azure-sql-elasticpool" {
Creates DataDog monitors with the following checks:
- SQL Elastic Pool CPU too high
- SQL Elastic Pool DTU Consumption too high
- SQL Elastic Pool high disk usage
## Inputs
@@ -25,21 +24,21 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_enabled | Flag to enable SQL Elastic Pool CPU monitor | string | `"true"` | no |
| cpu\_extra\_tags | Extra tags for SQL Elastic Pool CPU monitor | list | `[]` | no |
| cpu\_extra\_tags | Extra tags for SQL Elastic Pool CPU monitor | list(string) | `[]` | no |
| cpu\_message | Custom message for SQL Elastic Pool CPU monitor | string | `""` | no |
| cpu\_threshold\_critical | CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_threshold\_warning | CPU usage in percent (warning threshold) | string | `"80"` | no |
| cpu\_time\_aggregator | Monitor aggregator for SQL Elastic Pool CPU [available values: min, max or avg] | string | `"min"` | no |
| cpu\_timeframe | Monitor timeframe for SQL Elastic Pool CPU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| diskspace\_enabled | Flag to enable SQL Elastic Pool disk space monitor | string | `"true"` | no |
| diskspace\_extra\_tags | Extra tags for SQL Elastic Pool disk space monitor | list | `[]` | no |
| diskspace\_extra\_tags | Extra tags for SQL Elastic Pool disk space monitor | list(string) | `[]` | no |
| diskspace\_message | Custom message for SQL Elastic Pool disk space monitor | string | `""` | no |
| diskspace\_threshold\_critical | Disk space used in percent (critical threshold) | string | `"90"` | no |
| diskspace\_threshold\_warning | Disk space used in percent (warning threshold) | string | `"80"` | no |
| diskspace\_time\_aggregator | Monitor aggregator for SQL Elastic Pool disk space [available values: min, max or avg] | string | `"max"` | no |
| diskspace\_timeframe | Monitor timeframe for SQL Elastic Pool disk space [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| dtu\_enabled | Flag to enable SQL Elastic Pool DTU monitor | string | `"true"` | no |
| dtu\_extra\_tags | Extra tags for SQL Elastic Pool DTU monitor | list | `[]` | no |
| dtu\_extra\_tags | Extra tags for SQL Elastic Pool DTU monitor | list(string) | `[]` | no |
| dtu\_message | Custom message for SQL Elastic Pool DTU monitor | string | `""` | no |
| dtu\_threshold\_critical | Amount of DTU used (critical threshold) | string | `"90"` | no |
| dtu\_threshold\_warning | Amount of DTU used (warning threshold) | string | `"85"` | no |
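The `*_time_aggregator`/`*_timeframe` pairs control how persistent a breach must be before the monitor fires: `min` over a window only alerts when the metric stays above the threshold for the whole window. A sketch under assumptions (the `source` path is hypothetical):

```hcl
# Illustrative only: the source path is an assumption, not taken from this README.
module "datadog-monitors-cloud-azure-sql-elasticpool" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/azure/sql-elasticpool" # hypothetical path
  environment = "prod"
  message     = "@opsgenie-dba" # `message` is assumed required, as in the sibling modules

  # Alert only on sustained CPU pressure: min over 30 minutes.
  cpu_time_aggregator = "min"
  cpu_timeframe       = "last_30m"
}
```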

View File

@@ -2,7 +2,7 @@ resource "datadog_monitor" "sql_elasticpool_cpu" {
count = var.cpu_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] SQL Elastic Pool CPU too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_message, var.message)
type = "query alert"
query = <<EOQ
${var.cpu_time_aggregator}(${var.cpu_timeframe}): (
@@ -62,7 +62,7 @@ resource "datadog_monitor" "sql_elasticpool_dtu_consumption_high" {
count = var.dtu_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] SQL Elastic Pool DTU Consumption too high {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.dtu_message, var.message)
type = "query alert"
query = <<EOQ
${var.dtu_time_aggregator}(${var.dtu_timeframe}): (

View File

@@ -17,35 +17,31 @@ module "datadog-monitors-cloud-azure-storage" {
Creates DataDog monitors with the following checks:
- Azure Storage is down
- Azure Storage too few successful requests
- Azure Storage too high end to end latency
- Azure Storage too many authorization errors
- Azure Storage too many client_other errors
- Azure Storage too many network errors
- Azure Storage too many server_other errors
- Azure Storage too many throttling errors
- Azure Storage too many timeout errors
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| authorization\_error\_requests\_enabled | Flag to enable Storage authorization errors monitor | string | `"true"` | no |
| authorization\_error\_requests\_extra\_tags | Extra tags for Storage authorization errors monitor | list | `[]` | no |
| authorization\_error\_requests\_extra\_tags | Extra tags for Storage authorization errors monitor | list(string) | `[]` | no |
| authorization\_error\_requests\_message | Custom message for Storage authorization errors monitor | string | `""` | no |
| authorization\_error\_requests\_threshold\_critical | Maximum acceptable percent of authorization error requests for a storage | string | `"90"` | no |
| authorization\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of authorization error requests for a storage | string | `"50"` | no |
| authorization\_error\_requests\_time\_aggregator | Monitor aggregator for Storage authorization errors [available values: min, max or avg] | string | `"min"` | no |
| authorization\_error\_requests\_timeframe | Monitor timeframe for Storage authorization errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| availability\_enabled | Flag to enable Storage availability monitor | string | `"true"` | no |
| availability\_extra\_tags | Extra tags for Storage availability monitor | list | `[]` | no |
| availability\_extra\_tags | Extra tags for Storage availability monitor | list(string) | `[]` | no |
| availability\_message | Custom message for Storage availability monitor | string | `""` | no |
| availability\_threshold\_critical | Minimum acceptable percent of availability for a storage | string | `"50"` | no |
| availability\_threshold\_warning | Warning regarding acceptable percent of availability for a storage | string | `"90"` | no |
| availability\_time\_aggregator | Monitor aggregator for Storage availability [available values: min, max or avg] | string | `"max"` | no |
| availability\_timeframe | Monitor timeframe for Storage availability [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| client\_other\_error\_requests\_enabled | Flag to enable Storage other errors monitor | string | `"true"` | no |
| client\_other\_error\_requests\_extra\_tags | Extra tags for Storage other errors monitor | list | `[]` | no |
| client\_other\_error\_requests\_extra\_tags | Extra tags for Storage other errors monitor | list(string) | `[]` | no |
| client\_other\_error\_requests\_message | Custom message for Storage other errors monitor | string | `""` | no |
| client\_other\_error\_requests\_threshold\_critical | Maximum acceptable percent of client other error requests for a storage | string | `"90"` | no |
| client\_other\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of client other error requests for a storage | string | `"50"` | no |
@@ -57,7 +53,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| latency\_enabled | Flag to enable Storage latency monitor | string | `"true"` | no |
| latency\_extra\_tags | Extra tags for Storage latency monitor | list | `[]` | no |
| latency\_extra\_tags | Extra tags for Storage latency monitor | list(string) | `[]` | no |
| latency\_message | Custom message for Storage latency monitor | string | `""` | no |
| latency\_threshold\_critical | Maximum acceptable end to end latency (ms) for a storage | string | `"2000"` | no |
| latency\_threshold\_warning | Warning regarding acceptable end to end latency (ms) for a storage | string | `"1000"` | no |
@@ -65,7 +61,7 @@ Creates DataDog monitors with the following checks:
| latency\_timeframe | Monitor timeframe for Storage latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| message | Message sent when a monitor is triggered | string | n/a | yes |
| network\_error\_requests\_enabled | Flag to enable Storage network errors monitor | string | `"true"` | no |
| network\_error\_requests\_extra\_tags | Extra tags for Storage network errors monitor | list | `[]` | no |
| network\_error\_requests\_extra\_tags | Extra tags for Storage network errors monitor | list(string) | `[]` | no |
| network\_error\_requests\_message | Custom message for Storage network errors monitor | string | `""` | no |
| network\_error\_requests\_threshold\_critical | Maximum acceptable percent of network error requests for a storage | string | `"90"` | no |
| network\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of network error requests for a storage | string | `"50"` | no |
@@ -74,28 +70,28 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended, between brackets, to every monitor name | string | `""` | no |
| server\_other\_error\_requests\_enabled | Flag to enable Storage server other errors monitor | string | `"true"` | no |
| server\_other\_error\_requests\_extra\_tags | Extra tags for Storage server other errors monitor | list | `[]` | no |
| server\_other\_error\_requests\_extra\_tags | Extra tags for Storage server other errors monitor | list(string) | `[]` | no |
| server\_other\_error\_requests\_message | Custom message for Storage server other errors monitor | string | `""` | no |
| server\_other\_error\_requests\_threshold\_critical | Maximum acceptable percent of server other error requests for a storage | string | `"90"` | no |
| server\_other\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of server other error requests for a storage | string | `"50"` | no |
| server\_other\_error\_requests\_time\_aggregator | Monitor aggregator for Storage other errors [available values: min, max or avg] | string | `"min"` | no |
| server\_other\_error\_requests\_timeframe | Monitor timeframe for Storage server other errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| successful\_requests\_enabled | Flag to enable Storage successful requests monitor | string | `"true"` | no |
| successful\_requests\_extra\_tags | Extra tags for Storage successful requests monitor | list | `[]` | no |
| successful\_requests\_extra\_tags | Extra tags for Storage successful requests monitor | list(string) | `[]` | no |
| successful\_requests\_message | Custom message for Storage successful requests monitor | string | `""` | no |
| successful\_requests\_threshold\_critical | Minimum acceptable percent of successful requests for a storage | string | `"10"` | no |
| successful\_requests\_threshold\_warning | Warning regarding acceptable percent of successful requests for a storage | string | `"30"` | no |
| successful\_requests\_time\_aggregator | Monitor aggregator for Storage successful requests [available values: min, max or avg] | string | `"max"` | no |
| successful\_requests\_timeframe | Monitor timeframe for Storage successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| throttling\_error\_requests\_enabled | Flag to enable Storage throttling error monitor | string | `"true"` | no |
| throttling\_error\_requests\_extra\_tags | Extra tags for Storage throttling error monitor | list | `[]` | no |
| throttling\_error\_requests\_extra\_tags | Extra tags for Storage throttling error monitor | list(string) | `[]` | no |
| throttling\_error\_requests\_message | Custom message for Storage throttling error monitor | string | `""` | no |
| throttling\_error\_requests\_threshold\_critical | Maximum acceptable percent of throttling error requests for a storage | string | `"90"` | no |
| throttling\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of throttling error requests for a storage | string | `"50"` | no |
| throttling\_error\_requests\_time\_aggregator | Monitor aggregator for Storage throttling errors [available values: min, max or avg] | string | `"min"` | no |
| throttling\_error\_requests\_timeframe | Monitor timeframe for Storage throttling errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| timeout\_error\_requests\_enabled | Flag to enable Storage timeout monitor | string | `"true"` | no |
| timeout\_error\_requests\_extra\_tags | Extra tags for Storage timeout monitor | list | `[]` | no |
| timeout\_error\_requests\_extra\_tags | Extra tags for Storage timeout monitor | list(string) | `[]` | no |
| timeout\_error\_requests\_message | Custom message for Storage timeout monitor | string | `""` | no |
| timeout\_error\_requests\_threshold\_critical | Maximum acceptable percent of timeout error requests for a storage | string | `"90"` | no |
| timeout\_error\_requests\_threshold\_warning | Warning regarding acceptable percent of timeout error requests for a storage | string | `"50"` | no |
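With nine monitors in this module, per-monitor `*_enabled` flags and `*_extra_tags` are the usual tuning knobs. A hedged example using only variables from the table above (the `source` path is hypothetical):

```hcl
# Illustrative only: the source path is an assumption, not taken from this README.
module "datadog-monitors-cloud-azure-storage" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/azure/storage" # hypothetical path
  environment = "prod"
  message     = "@pagerduty-storage"

  # Silence a noisy check and relax latency alerting for this account.
  throttling_error_requests_enabled = "false"
  latency_threshold_critical        = "3000" # milliseconds
  latency_extra_tags                = ["team:platform", "tier:blob"]
}
```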

View File

@@ -2,7 +2,7 @@ resource "datadog_monitor" "availability" {
count = var.availability_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage is down"
message = coalesce(var.availability_message, var.message)
type = "query alert"
query = <<EOQ
${var.availability_time_aggregator}(${var.availability_timeframe}): (default(
@@ -32,7 +32,7 @@ resource "datadog_monitor" "successful_requests" {
count = var.successful_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too few successful requests {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.successful_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.successful_requests_time_aggregator}(${var.successful_requests_timeframe}): (default(
@@ -62,7 +62,7 @@ resource "datadog_monitor" "latency" {
count = var.latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too high end to end latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.latency_message, var.message)
type = "query alert"
query = <<EOQ
${var.latency_time_aggregator}(${var.latency_timeframe}): (default(
@@ -92,7 +92,7 @@ resource "datadog_monitor" "timeout_error_requests" {
count = var.timeout_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many timeout errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.timeout_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.timeout_error_requests_time_aggregator}(${var.timeout_error_requests_timeframe}): (default(
@@ -122,7 +122,7 @@ resource "datadog_monitor" "network_error_requests" {
count = var.network_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many network errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.network_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.network_error_requests_time_aggregator}(${var.network_error_requests_timeframe}): (default(
@@ -152,7 +152,7 @@ resource "datadog_monitor" "throttling_error_requests" {
count = var.throttling_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many throttling errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.throttling_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.throttling_error_requests_time_aggregator}(${var.throttling_error_requests_timeframe}): (default(
@@ -182,7 +182,7 @@ resource "datadog_monitor" "server_other_error_requests" {
count = var.server_other_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many server_other errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.server_other_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.server_other_error_requests_time_aggregator}(${var.server_other_error_requests_timeframe}): (default(
@@ -212,7 +212,7 @@ resource "datadog_monitor" "client_other_error_requests" {
count = var.client_other_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many client_other errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.client_other_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.client_other_error_requests_time_aggregator}(${var.client_other_error_requests_timeframe}): (default(
@@ -242,7 +242,7 @@ resource "datadog_monitor" "authorization_error_requests" {
count = var.authorization_error_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Azure Storage too many authorization errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.authorization_error_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.authorization_error_requests_time_aggregator}(${var.authorization_error_requests_timeframe}): (default(

View File

@@ -17,17 +17,15 @@ module "datadog-monitors-cloud-azure-stream-analytics" {
Creates DataDog monitors with the following checks:
- Stream Analytics is down
- Stream Analytics streaming units utilization too high
- Stream Analytics too many conversion errors
- Stream Analytics too many failed requests
- Stream Analytics too many runtime errors
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| conversion\_errors\_enabled | Flag to enable Stream Analytics conversion errors monitor | string | `"true"` | no |
| conversion\_errors\_extra\_tags | Extra tags for Stream Analytics conversion errors monitor | list | `[]` | no |
| conversion\_errors\_extra\_tags | Extra tags for Stream Analytics conversion errors monitor | list(string) | `[]` | no |
| conversion\_errors\_message | Custom message for Stream Analytics conversion errors monitor | string | `""` | no |
| conversion\_errors\_threshold\_critical | Conversion errors limit (critical threshold) | string | `"10"` | no |
| conversion\_errors\_threshold\_warning | Conversion errors limit (warning threshold) | string | `"0"` | no |
@@ -36,7 +34,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failed\_function\_requests\_enabled | Flag to enable Stream Analytics failed requests monitor | string | `"true"` | no |
| failed\_function\_requests\_extra\_tags | Extra tags for Stream Analytics failed requests monitor | list | `[]` | no |
| failed\_function\_requests\_extra\_tags | Extra tags for Stream Analytics failed requests monitor | list(string) | `[]` | no |
| failed\_function\_requests\_message | Custom message for Stream Analytics failed requests monitor | string | `""` | no |
| failed\_function\_requests\_threshold\_critical | Failed Function Request rate limit (critical threshold) | string | `"10"` | no |
| failed\_function\_requests\_threshold\_warning | Failed Function Request rate limit (warning threshold) | string | `"0"` | no |
@@ -49,19 +47,19 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended, between brackets, to every monitor name | string | `""` | no |
| runtime\_errors\_enabled | Flag to enable Stream Analytics runtime errors monitor | string | `"true"` | no |
| runtime\_errors\_extra\_tags | Extra tags for Stream Analytics runtime errors monitor | list | `[]` | no |
| runtime\_errors\_extra\_tags | Extra tags for Stream Analytics runtime errors monitor | list(string) | `[]` | no |
| runtime\_errors\_message | Custom message for Stream Analytics runtime errors monitor | string | `""` | no |
| runtime\_errors\_threshold\_critical | Runtime errors limit (critical threshold) | string | `"10"` | no |
| runtime\_errors\_threshold\_warning | Runtime errors limit (warning threshold) | string | `"0"` | no |
| runtime\_errors\_time\_aggregator | Monitor aggregator for Stream Analytics runtime errors [available values: min, max or avg] | string | `"min"` | no |
| runtime\_errors\_timeframe | Monitor timeframe for Stream Analytics runtime errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| status\_enabled | Flag to enable Stream Analytics status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Stream Analytics status monitor | list | `[]` | no |
| status\_extra\_tags | Extra tags for Stream Analytics status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Stream Analytics status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Stream Analytics status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Stream Analytics status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| su\_utilization\_enabled | Flag to enable Stream Analytics utilization monitor | string | `"true"` | no |
| su\_utilization\_extra\_tags | Extra tags for Stream Analytics utilization monitor | list | `[]` | no |
| su\_utilization\_extra\_tags | Extra tags for Stream Analytics utilization monitor | list(string) | `[]` | no |
| su\_utilization\_message | Custom message for Stream Analytics utilization monitor | string | `""` | no |
| su\_utilization\_threshold\_critical | Streaming Unit utilization rate limit (critical threshold) | string | `"80"` | no |
| su\_utilization\_threshold\_warning | Streaming Unit utilization rate limit (warning threshold) | string | `"60"` | no |
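`prefix_slug` is most useful when the module is instantiated once per application, since it prepends a bracketed slug to every monitor name. A sketch under assumptions (the `source` path is hypothetical):

```hcl
# Illustrative only: the source path is an assumption, not taken from this README.
module "datadog-monitors-cloud-azure-stream-analytics" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/azure/stream-analytics" # hypothetical path
  environment = "prod"
  message     = "@slack-streaming"
  prefix_slug = "payments" # monitor names become "[payments][prod] Stream Analytics ..."

  su_utilization_threshold_warning  = "50"
  su_utilization_threshold_critical = "70"
}
```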

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Stream Analytics is down"
message = coalesce(var.status_message, var.message)
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "failed_function_requests" {
count = var.failed_function_requests_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Stream Analytics too many failed requests {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.failed_function_requests_message, var.message)
type = "query alert"
query = <<EOQ
${var.failed_function_requests_time_aggregator}(${var.failed_function_requests_timeframe}): (
@ -69,6 +69,7 @@ EOQ
thresholds = {
warning = var.failed_function_requests_threshold_warning
critical = var.failed_function_requests_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
@ -117,7 +118,7 @@ resource "datadog_monitor" "runtime_errors" {
count = var.runtime_errors_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Stream Analytics too many runtime errors {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.runtime_errors_message, var.message)
type = "query alert"
query = <<EOQ
${var.runtime_errors_time_aggregator}(${var.runtime_errors_timeframe}): (

View File

@ -16,23 +16,22 @@ module "datadog-monitors-cloud-azure-virtual-machine" {
Creates DataDog monitors with the following checks:
- Virtual Machine CPU usage
- Virtual Machine credit CPU
- Virtual Machine is unreachable
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_remaining\_rate\_enabled | Flag to enable Virtual Machine CPU remaining monitor | string | `"true"` | no |
| cpu\_remaining\_rate\_extra\_tags | Extra tags for Virtual Machine CPU remaining monitor | list(string) | `[]` | no |
| cpu\_remaining\_rate\_message | Custom message for Virtual Machine CPU remaining monitor | string | `""` | no |
| cpu\_remaining\_rate\_threshold\_critical | Virtual Machine CPU rate limit (critical threshold) | string | `"15"` | no |
| cpu\_remaining\_rate\_threshold\_warning | Virtual Machine CPU rate limit (warning threshold) | string | `"30"` | no |
| cpu\_remaining\_rate\_time\_aggregator | Monitor aggregator for Virtual Machine CPU remaining [available values: min, max, sum or avg] | string | `"min"` | no |
| cpu\_remaining\_rate\_timeframe | Monitor timeframe for Virtual Machine CPU remaining [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| cpu\_usage\_enabled | Flag to enable Virtual Machine CPU usage monitor | string | `"true"` | no |
| cpu\_usage\_extra\_tags | Extra tags for Virtual Machine CPU usage monitor | list(string) | `[]` | no |
| cpu\_usage\_message | Custom message for Virtual Machine CPU monitor | string | `""` | no |
| cpu\_usage\_threshold\_critical | Virtual Machine CPU usage in percent (critical threshold) | string | `"90"` | no |
| cpu\_usage\_threshold\_warning | Virtual Machine CPU usage in percent (warning threshold) | string | `"80"` | no |
@ -47,7 +46,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds before monitoring a new resource | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets on every monitor name | string | `""` | no |
| status\_enabled | Flag to enable Virtual Machine status monitor | string | `"true"` | no |
| status\_extra\_tags | Extra tags for Virtual Machine status monitor | list(string) | `[]` | no |
| status\_message | Custom message for Virtual Machine status monitor | string | `""` | no |
| status\_time\_aggregator | Monitor aggregator for Virtual Machine status [available values: min, max or avg] | string | `"max"` | no |
| status\_timeframe | Monitor timeframe for Virtual Machine status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
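All of these inputs have defaults except `environment` (and the global `message` that the monitors fall back to via `coalesce`); a minimal, illustrative call of the module might look like this. The `source` path and tag values are placeholders, not taken from this repository:

```hcl
module "datadog-monitors-cloud-azure-virtual-machine" {
  # Placeholder source; point it at wherever this module lives in your setup
  source = "..."

  environment = "prod"
  message     = "Default alerting message"

  # Override a couple of the documented defaults
  cpu_usage_threshold_warning = "70"
  status_extra_tags           = ["team:infra"]
}
```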

View File

@ -1,8 +1,8 @@
resource "datadog_monitor" "virtualmachine_status" {
count = var.status_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Virtual Machine is unreachable"
message = coalesce(var.status_message, var.message)
type = "query alert"
query = <<EOQ
${var.status_time_aggregator}(${var.status_timeframe}): (
@ -57,7 +57,7 @@ resource "datadog_monitor" "virtualmachine_credit_cpu_remaining_too_low" {
count = var.cpu_remaining_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Virtual Machine credit CPU {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_remaining_rate_message, var.message)
type = "query alert"
type = "query alert"
query = <<EOQ
${var.cpu_remaining_rate_time_aggregator}(${var.cpu_remaining_rate_timeframe}):

View File

@ -18,26 +18,22 @@ Creates DataDog monitors with the following checks:
- GCP Big Query Available Slots
- GCP Big Query Concurrent Queries
- GCP Big Query Execution Time
- GCP Big Query Scanned Bytes Billed
- GCP Big Query Scanned Bytes
- GCP Big Query Stored Bytes
- GCP Big Query Table Count
- GCP Big Query Uploaded Bytes Billed
- GCP Big Query Uploaded Bytes
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| available\_slots\_enabled | Flag to enable GCP Big Query Available Slots monitor | string | `"true"` | no |
| available\_slots\_extra\_tags | Extra tags for GCP Big Query Available Slots monitor | list(string) | `[]` | no |
| available\_slots\_message | Custom message for the Available Slots monitor | string | `""` | no |
| available\_slots\_threshold\_critical | Available Slots (critical threshold) | string | `"200"` | no |
| available\_slots\_threshold\_warning | Available Slots (warning threshold) | string | `"300"` | no |
| available\_slots\_timeframe | Timeframe for the Available Slots monitor | string | `"last_5m"` | no |
| concurrent\_queries\_enabled | Flag to enable GCP Big Query Concurrent Queries monitor | string | `"true"` | no |
| concurrent\_queries\_extra\_tags | Extra tags for GCP Big Query Concurrent Queries monitor | list(string) | `[]` | no |
| concurrent\_queries\_message | Custom message for the Concurrent Queries monitor | string | `""` | no |
| concurrent\_queries\_threshold\_critical | Concurrent Queries (critical threshold) (hard limit 50) | string | `"45"` | no |
| concurrent\_queries\_threshold\_warning | Concurrent Queries (warning threshold) (hard limit 50) | string | `"40"` | no |
@ -45,7 +41,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| execution\_time\_enabled | Flag to enable GCP Big Query Execution Time monitor | string | `"true"` | no |
| execution\_time\_extra\_tags | Extra tags for GCP Big Query Execution Time monitor | list(string) | `[]` | no |
| execution\_time\_message | Custom message for the Execution Time monitor | string | `""` | no |
| execution\_time\_threshold\_critical | Average Execution Time in seconds (critical threshold) | string | `"150"` | no |
| execution\_time\_threshold\_warning | Average Execution Time in seconds (warning threshold) | string | `"100"` | no |
@ -55,37 +51,37 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds for the new host evaluation | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets on every monitor name | string | `""` | no |
| scanned\_bytes\_billed\_enabled | Flag to enable GCP Big Query Scanned Bytes Billed monitor | string | `"true"` | no |
| scanned\_bytes\_billed\_extra\_tags | Extra tags for GCP Big Query Scanned Bytes Billed monitor | list(string) | `[]` | no |
| scanned\_bytes\_billed\_message | Custom message for the Scanned Bytes Billed monitor | string | `""` | no |
| scanned\_bytes\_billed\_threshold\_critical | Scanned Bytes Billed (critical threshold) | string | `"1"` | no |
| scanned\_bytes\_billed\_threshold\_warning | Scanned Bytes Billed (warning threshold) | string | `"0"` | no |
| scanned\_bytes\_billed\_timeframe | Timeframe for the Scanned Bytes Billed monitor | string | `"last_4h"` | no |
| scanned\_bytes\_enabled | Flag to enable GCP Big Query Scanned Bytes monitor | string | `"true"` | no |
| scanned\_bytes\_extra\_tags | Extra tags for GCP Big Query Scanned Bytes monitor | list(string) | `[]` | no |
| scanned\_bytes\_message | Custom message for the Scanned Bytes monitor | string | `""` | no |
| scanned\_bytes\_threshold\_critical | Scanned Bytes (critical threshold) | string | `"1"` | no |
| scanned\_bytes\_threshold\_warning | Scanned Bytes (warning threshold) | string | `"0"` | no |
| scanned\_bytes\_timeframe | Timeframe for the Scanned Bytes monitor | string | `"last_4h"` | no |
| stored\_bytes\_enabled | Flag to enable GCP Big Query Stored Bytes monitor | string | `"true"` | no |
| stored\_bytes\_extra\_tags | Extra tags for GCP Big Query Stored Bytes monitor | list(string) | `[]` | no |
| stored\_bytes\_message | Custom message for the Stored Bytes monitor | string | `""` | no |
| stored\_bytes\_threshold\_critical | Stored Bytes in fraction (critical threshold) | string | `"1"` | no |
| stored\_bytes\_threshold\_warning | Stored Bytes in fraction (warning threshold) | string | `"0"` | no |
| stored\_bytes\_timeframe | Timeframe for the Stored Bytes monitor | string | `"last_5m"` | no |
| table\_count\_enabled | Flag to enable GCP Big Query Table Count monitor | string | `"true"` | no |
| table\_count\_extra\_tags | Extra tags for GCP Big Query Table Count monitor | list(string) | `[]` | no |
| table\_count\_message | Custom message for the Table Count monitor | string | `""` | no |
| table\_count\_threshold\_critical | Table Count (critical threshold) | string | `"1"` | no |
| table\_count\_threshold\_warning | Table Count (warning threshold) | string | `"0"` | no |
| table\_count\_timeframe | Timeframe for the Table Count monitor | string | `"last_4h"` | no |
| uploaded\_bytes\_billed\_enabled | Flag to enable GCP Big Query Uploaded Bytes Billed monitor | string | `"true"` | no |
| uploaded\_bytes\_billed\_extra\_tags | Extra tags for GCP Big Query Uploaded Bytes Billed monitor | list(string) | `[]` | no |
| uploaded\_bytes\_billed\_message | Custom message for the Uploaded Bytes Billed monitor | string | `""` | no |
| uploaded\_bytes\_billed\_threshold\_critical | Uploaded Bytes Billed (critical threshold) | string | `"1"` | no |
| uploaded\_bytes\_billed\_threshold\_warning | Uploaded Bytes Billed (warning threshold) | string | `"0"` | no |
| uploaded\_bytes\_billed\_timeframe | Timeframe for the Uploaded Bytes Billed monitor | string | `"last_4h"` | no |
| uploaded\_bytes\_enabled | Flag to enable GCP Big Query Uploaded Bytes monitor | string | `"true"` | no |
| uploaded\_bytes\_extra\_tags | Extra tags for GCP Big Query Uploaded Bytes monitor | list(string) | `[]` | no |
| uploaded\_bytes\_message | Custom message for the Uploaded Bytes monitor | string | `""` | no |
| uploaded\_bytes\_threshold\_critical | Uploaded Bytes (critical threshold) | string | `"1"` | no |
| uploaded\_bytes\_threshold\_warning | Uploaded Bytes (warning threshold) | string | `"0"` | no |
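Every `*_extra_tags` input above is a `list(string)` that the module merges into its built-in tag set with `concat` (the pattern this commit rolls out across all modules), rather than replacing it. A sketch of the pattern, with illustrative caller-side values:

```hcl
# Inside the module: default tags come first, caller-supplied tags are appended
tags = concat(
  ["env:${var.environment}", "provider:gcp", "created-by:terraform"],
  var.scanned_bytes_extra_tags
)

# From the caller: a plain list of "key:value" strings, e.g.
# scanned_bytes_extra_tags = ["cost-center:data", "owner:analytics"]
```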

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "concurrent_queries" {
count = var.concurrent_queries_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP Big Query Concurrent Queries {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.concurrent_queries_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.concurrent_queries_timeframe}):
@ -51,8 +51,8 @@ warning = var.execution_time_threshold_warning
critical = var.execution_time_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
include_tags = true
notify_no_data = false
require_full_window = false
@ -71,7 +71,7 @@ resource "datadog_monitor" "scanned_bytes" {
count = var.scanned_bytes_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP Big Query Scanned Bytes {{#is_alert}}{{{comparator}}} {{threshold}}B/mn ({{value}}B/mn){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}B/mn ({{value}}B/mn){{/is_warning}}"
message = coalesce(var.scanned_bytes_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.scanned_bytes_timeframe}):
@ -117,8 +117,8 @@ EOQ
critical = var.scanned_bytes_billed_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
include_tags = true
notify_no_data = false
require_full_window = false
@ -137,7 +137,7 @@ resource "datadog_monitor" "available_slots" {
count = var.available_slots_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP Big Query Available Slots {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.available_slots_message, var.message)
type = "metric alert"
query = <<EOQ
avg(${var.available_slots_timeframe}):
@ -183,8 +183,8 @@ warning = var.stored_bytes_threshold_warning
critical = var.stored_bytes_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
include_tags = true
notify_no_data = false
require_full_window = false
@ -203,7 +203,7 @@ resource "datadog_monitor" "table_count" {
count = var.table_count_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP Big Query Table Count {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.table_count_message, var.message)
type = "metric alert"
query = <<EOQ
avg(${var.table_count_timeframe}):
@ -249,8 +249,8 @@ EOQ
critical = var.uploaded_bytes_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
include_tags = true
notify_no_data = false
require_full_window = false
@ -269,7 +269,7 @@ resource "datadog_monitor" "uploaded_bytes_billed" {
count = var.uploaded_bytes_billed_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP Big Query Uploaded Bytes Billed {{#is_alert}}{{{comparator}}} {{threshold}}B/mn ({{value}}B/mn){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}B/mn ({{value}}B/mn){{/is_warning}}"
message = coalesce(var.uploaded_bytes_billed_message, var.message)
type = "query alert"
query = <<EOQ
avg(${var.uploaded_bytes_billed_timeframe}):

View File

@ -17,29 +17,29 @@ module "datadog-monitors-cloud-gcp-cloud-sql-common" {
Creates DataDog monitors with the following checks:
- Cloud SQL CPU Utilization
- Cloud SQL Disk Utilization
- Cloud SQL Disk Utilization forecast
- Cloud SQL Failover Unavailable
- Cloud SQL Memory Utilization
- Cloud SQL Memory Utilization forecast
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_utilization\_enabled | Flag to enable GCP Cloud SQL CPU Utilization monitor | string | `"true"` | no |
| cpu\_utilization\_extra\_tags | Extra tags for GCP Cloud SQL CPU Utilization monitor | list(string) | `[]` | no |
| cpu\_utilization\_message | Custom message for the CPU Utilization monitor | string | `""` | no |
| cpu\_utilization\_threshold\_critical | CPU Utilization in percentage (critical threshold) | string | `"90"` | no |
| cpu\_utilization\_threshold\_warning | CPU Utilization in percentage (warning threshold) | string | `"80"` | no |
| cpu\_utilization\_time\_aggregator | Time aggregator for the CPU Utilization monitor | string | `"avg"` | no |
| cpu\_utilization\_timeframe | Timeframe for the CPU Utilization monitor | string | `"last_15m"` | no |
| datadog\_api\_key | Datadog API key | string | `"xxx"` | no |
| datadog\_app\_key | Datadog APP key | string | `"yyy"` | no |
| disk\_utilization\_enabled | Flag to enable GCP Cloud SQL Disk Utilization monitor | string | `"true"` | no |
| disk\_utilization\_extra\_tags | Extra tags for GCP Cloud SQL Disk Utilization monitor | list(string) | `[]` | no |
| disk\_utilization\_forecast\_algorithm | Algorithm for the Disk Utilization Forecast monitor | string | `"linear"` | no |
| disk\_utilization\_forecast\_deviations | Deviations for the Disk Utilization Forecast monitor | string | `"1"` | no |
| disk\_utilization\_forecast\_enabled | Flag to enable GCP Cloud SQL Disk Utilization Forecast monitor | string | `"true"` | no |
| disk\_utilization\_forecast\_extra\_tags | Extra tags for GCP Cloud SQL Disk Utilization Forecast monitor | list(string) | `[]` | no |
| disk\_utilization\_forecast\_interval | Interval for the Disk Utilization Forecast monitor | string | `"60m"` | no |
| disk\_utilization\_forecast\_linear\_history | History for the Disk Utilization Forecast monitor | string | `"3d"` | no |
| disk\_utilization\_forecast\_linear\_model | Model for the Disk Utilization Forecast monitor | string | `"default"` | no |
@ -57,18 +57,18 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"900"` | no |
| failover\_unavailable\_enabled | Flag to enable GCP Cloud SQL Failover Unavailable monitor | string | `"true"` | no |
| failover\_unavailable\_extra\_tags | Extra tags for GCP Cloud SQL Failover Unavailable monitor | list(string) | `[]` | no |
| failover\_unavailable\_message | Custom message for the Failover Unavailable monitor | string | `""` | no |
| failover\_unavailable\_threshold\_critical | Failover Unavailable critical threshold | string | `"0"` | no |
| failover\_unavailable\_time\_aggregator | Time aggregator for the Failover Unavailable monitor | string | `"max"` | no |
| failover\_unavailable\_timeframe | Timeframe for the Failover Unavailable monitor | string | `"last_10m"` | no |
| filter\_tags | Tags used for filtering | string | `"*"` | no |
| memory\_utilization\_enabled | Flag to enable GCP Cloud SQL Memory Utilization monitor | string | `"true"` | no |
| memory\_utilization\_extra\_tags | Extra tags for GCP Cloud SQL Memory Utilization monitor | list(string) | `[]` | no |
| memory\_utilization\_forecast\_algorithm | Algorithm for the Memory Utilization Forecast monitor | string | `"linear"` | no |
| memory\_utilization\_forecast\_deviations | Deviations for the Memory Utilization Forecast monitor | string | `"1"` | no |
| memory\_utilization\_forecast\_enabled | Flag to enable GCP Cloud SQL Memory Utilization Forecast monitor | string | `"true"` | no |
| memory\_utilization\_forecast\_extra\_tags | Extra tags for GCP Cloud SQL Memory Utilization Forecast monitor | list(string) | `[]` | no |
| memory\_utilization\_forecast\_interval | Interval for the Memory Utilization Forecast monitor | string | `"30m"` | no |
| memory\_utilization\_forecast\_linear\_history | History for the Memory Utilization Forecast monitor | string | `"12h"` | no |
| memory\_utilization\_forecast\_linear\_model | Model for the Memory Utilization Forecast monitor | string | `"default"` | no |
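The `*_forecast_*` inputs map onto Datadog's `forecast()` query function; with the disk defaults above (`linear` algorithm, `1` deviation, `60m` interval, `3d` history, `default` model), the generated query is shaped roughly like the sketch below. The metric name, the `next_*` timeframe, and the threshold are illustrative assumptions, not values taken from this diff:

```hcl
# Illustrative rendering of the disk-utilization forecast query
# (assumed metric, timeframe, and threshold)
query = <<EOQ
max(next_1d): forecast(
  avg:gcp.cloudsql.database.disk.utilization{*} by {database_id},
  'linear', 1, interval='60m', history='3d', model='default'
) >= 0.9
EOQ
```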

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "cpu_utilization" {
count = var.cpu_utilization_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cloud SQL CPU Utilization {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_utilization_message, var.message)
type = "query alert"
query = <<EOQ
${var.cpu_utilization_time_aggregator}(${var.cpu_utilization_timeframe}):
@ -51,8 +51,8 @@ warning = var.disk_utilization_threshold_warning
critical = var.disk_utilization_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -71,7 +71,7 @@ resource "datadog_monitor" "disk_utilization_forecast" {
count = var.disk_utilization_forecast_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cloud SQL Disk Utilization could reach {{#is_alert}}{{threshold}}%%{{/is_alert}} in a near future"
message = coalesce(var.disk_utilization_forecast_message, var.message)
type = "query alert"
query = <<EOQ
${var.disk_utilization_forecast_time_aggregator}(${var.disk_utilization_forecast_timeframe}):
@ -98,8 +98,8 @@ EOQ
critical_recovery = var.disk_utilization_forecast_threshold_critical_recovery
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -118,7 +118,7 @@ resource "datadog_monitor" "memory_utilization" {
count = var.memory_utilization_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cloud SQL Memory Utilization {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.memory_utilization_message, var.message)
type = "query alert"
query = <<EOQ
${var.memory_utilization_time_aggregator}(${var.memory_utilization_timeframe}):
@ -178,8 +178,8 @@ EOQ
critical_recovery = var.memory_utilization_forecast_threshold_critical_recovery
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -189,7 +189,7 @@ EOQ
renotify_interval = 0
tags = concat(["env:${var.environment}", "type:cloud", "provider:gcp", "resource:cloud-sql", "team:claranet", "created-by:terraform"], var.memory_utilization_forecast_extra_tags)
}
@ -198,38 +198,29 @@ EOQ
#
# Failover Unavailable
#
resource "datadog_monitor" "failover_unavailable" {
count = var.failover_unavailable_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cloud SQL Failover Unavailable"
message = coalesce(var.failover_unavailable_message, var.message)
type = "metric alert"
query = <<EOQ
${var.failover_unavailable_time_aggregator}(${var.failover_unavailable_timeframe}):
avg:gcp.cloudsql.database.available_for_failover{${var.filter_tags}}
by {database_id}
<= ${var.failover_unavailable_threshold_critical}
EOQ
thresholds = {
critical = var.failover_unavailable_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
include_tags = true
require_full_window = false
notify_no_data = false
renotify_interval = 0
tags = concat(["env:${var.environment}", "type:cloud", "provider:gcp", "resource:cloud-sql", "team:claranet", "created-by:terraform"], var.failover_unavailable_extra_tags)
}
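With the defaults documented in the inputs table (`failover_unavailable_time_aggregator = "max"`, `failover_unavailable_timeframe = "last_10m"`, `failover_unavailable_threshold_critical = "0"`, `filter_tags = "*"`), the failover query template renders to:

```hcl
# Query after variable interpolation with the module's default inputs
query = <<EOQ
max(last_10m):
  avg:gcp.cloudsql.database.available_for_failover{*}
  by {database_id}
<= 0
EOQ
```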

View File

@ -29,7 +29,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds for the new host evaluation | string | `"300"` | no |
| prefix\_slug | Prefix string prepended between brackets on every monitor name | string | `""` | no |
| replication\_lag\_enabled | Flag to enable GCP Cloud SQL Replication Lag monitor | string | `"true"` | no |
| replication\_lag\_extra\_tags | Extra tags for GCP Cloud SQL Replication Lag monitor | list(string) | `[]` | no |
| replication\_lag\_message | Custom message for the Replication Lag monitor | string | `""` | no |
| replication\_lag\_threshold\_critical | Seconds behind the master (critical threshold) | string | `"180"` | no |
| replication\_lag\_threshold\_warning | Seconds behind the master (warning threshold) | string | `"90"` | no |

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "replication_lag" {
count = var.replication_lag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Cloud SQL MySQL Replication Lag {{#is_alert}}{{{comparator}}} {{threshold}}s ({{value}}s){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}s ({{value}}s){{/is_warning}}"
message = coalesce(var.replication_lag_message, var.message)
type = "metric alert"
query = <<EOQ
${var.replication_lag_time_aggregator}(${var.replication_lag_timeframe}):

View File

@ -18,21 +18,20 @@ Creates DataDog monitors with the following checks:
- Compute Engine instance CPU Utilization
- Compute Engine instance Disk Throttled Bps
- Compute Engine instance Disk Throttled OPS
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cpu\_utilization\_enabled | Flag to enable CPU Utilization monitor | string | `"true"` | no |
| cpu\_utilization\_extra\_tags | Extra tags for CPU Utilization monitor | list(string) | `[]` | no |
| cpu\_utilization\_message | Custom message for the CPU Utilization monitor | string | `""` | no |
| cpu\_utilization\_threshold\_critical | CPU Utilization in percentage (critical threshold) | string | `"90"` | no |
| cpu\_utilization\_threshold\_warning | CPU Utilization in percentage (warning threshold) | string | `"80"` | no |
| cpu\_utilization\_time\_aggregator | Time aggregator for the CPU Utilization monitor | string | `"avg"` | no |
| cpu\_utilization\_timeframe | Timeframe for the CPU Utilization monitor | string | `"last_15m"` | no |
| disk\_throttled\_bps\_enabled | Flag to enable Disk Throttled Bps monitor | string | `"true"` | no |
| disk\_throttled\_bps\_extra\_tags | Extra tags for Disk Throttled Bps monitor | list(string) | `[]` | no |
| disk\_throttled\_bps\_message | Custom message for the Disk Throttled Bps monitor | string | `""` | no |
| disk\_throttled\_bps\_notify\_no\_data | Flag to enable notification for no data on Disk Throttled Bps monitor | string | `"false"` | no |
| disk\_throttled\_bps\_threshold\_critical | Disk Throttled Bps in percentage (critical threshold) | string | `"50"` | no |
@ -40,7 +39,7 @@ Creates DataDog monitors with the following checks:
| disk\_throttled\_bps\_time\_aggregator | Time aggregator for the Disk Throttled Bps monitor | string | `"min"` | no |
| disk\_throttled\_bps\_timeframe | Timeframe for the Disk Throttled Bps monitor | string | `"last_15m"` | no |
| disk\_throttled\_ops\_enabled | Flag to enable Disk Throttled OPS monitor | string | `"true"` | no |
| disk\_throttled\_ops\_extra\_tags | Extra tags for Disk Throttled OPS monitor | list(string) | `[]` | no |
| disk\_throttled\_ops\_message | Custom message for the Disk Throttled OPS monitor | string | `""` | no |
| disk\_throttled\_ops\_notify\_no\_data | Flag to enable notification for no data on Disk Throttled OPS monitor | string | `"false"` | no |
| disk\_throttled\_ops\_threshold\_critical | Disk Throttled OPS in percentage (critical threshold) | string | `"50"` | no |

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "cpu_utilization" {
count = var.cpu_utilization_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Compute Engine instance CPU Utilization {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.cpu_utilization_message, var.message)
type = "query alert"
query = <<EOQ
${var.cpu_utilization_time_aggregator}(${var.cpu_utilization_timeframe}):
@ -57,8 +57,8 @@ warning = var.disk_throttled_bps_threshold_warning
critical = var.disk_throttled_bps_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -77,7 +77,7 @@ resource "datadog_monitor" "disk_throttled_ops" {
count = var.disk_throttled_ops_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Compute Engine instance Disk Throttled OPS {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.disk_throttled_ops_message, var.message)
type = "query alert"
query = <<EOQ
${var.disk_throttled_ops_time_aggregator}(${var.disk_throttled_ops_timeframe}):

View File

@ -17,24 +17,22 @@ module "datadog-monitors-cloud-gcp-lb" {
Creates DataDog monitors with the following checks:
- GCP LB 4xx errors
- GCP LB 5xx errors
- GCP LB bucket backend latency
- GCP LB Requests count increased abruptly
- GCP LB service backend latency
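A minimal usage sketch for these monitors follows; the source path and notification handle are assumptions, and the overridden inputs are documented in the table below:

```hcl
module "datadog-monitors-cloud-gcp-lb" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/gcp/lb?ref=master" # assumed path
  environment = "staging"
  message     = "@slack-monitoring"

  # Example overrides: relax the 4xx error rate and disable the request count monitor
  error_rate_4xx_threshold_warning  = "60"
  error_rate_4xx_threshold_critical = "70"
  request_count_enabled             = "false"
}
```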
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| backend\_latency\_bucket\_enabled | Flag to enable GCP LB Bucket Backend Latency monitor | string | `"true"` | no |
| backend\_latency\_bucket\_extra\_tags | Extra tags for GCP LB Bucket Backend Latency monitor | list(string) | `[]` | no |
| backend\_latency\_bucket\_message | Custom message for the GCP LB Bucket Backend Latency monitor | string | `""` | no |
| backend\_latency\_bucket\_threshold\_critical | Latency in milliseconds (critical threshold) | string | `"8000"` | no |
| backend\_latency\_bucket\_threshold\_warning | Latency in milliseconds (warning threshold) | string | `"4000"` | no |
| backend\_latency\_bucket\_time\_aggregator | Time aggregator for the GCP LB Bucket Backend Latency monitor | string | `"min"` | no |
| backend\_latency\_bucket\_timeframe | Timeframe for the GCP LB Bucket Backend Latency monitor | string | `"last_10m"` | no |
| backend\_latency\_service\_enabled | Flag to enable GCP LB Service Backend Latency monitor | string | `"true"` | no |
| backend\_latency\_service\_extra\_tags | Extra tags for GCP LB Service Backend Latency monitor | list(string) | `[]` | no |
| backend\_latency\_service\_message | Custom message for the GCP LB Service Backend Latency monitor | string | `""` | no |
| backend\_latency\_service\_threshold\_critical | Latency in milliseconds (critical threshold) | string | `"1500"` | no |
| backend\_latency\_service\_threshold\_warning | Latency in milliseconds (warning threshold) | string | `"1000"` | no |
@ -43,7 +41,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| error\_rate\_4xx\_artificial\_request | Divisor Delta for the GCP LB 4XX Errors monitor | string | `"5"` | no |
| error\_rate\_4xx\_enabled | Flag to enable GCP LB 4XX Errors monitor | string | `"true"` | no |
| error\_rate\_4xx\_extra\_tags | Extra tags for GCP LB 4XX Errors monitor | list(string) | `[]` | no |
| error\_rate\_4xx\_message | Custom message for the GCP LB 4XX Errors monitor | string | `""` | no |
| error\_rate\_4xx\_threshold\_critical | Rate error in percentage (critical threshold) | string | `"60"` | no |
| error\_rate\_4xx\_threshold\_warning | Rate error in percentage (warning threshold) | string | `"50"` | no |
@ -51,7 +49,7 @@ Creates DataDog monitors with the following checks:
| error\_rate\_4xx\_timeframe | Timeframe for the GCP LB 4XX Errors monitor | string | `"last_5m"` | no |
| error\_rate\_5xx\_artificial\_request | Divisor Delta for the GCP LB 5XX Errors monitor | string | `"5"` | no |
| error\_rate\_5xx\_enabled | Flag to enable GCP LB 5XX Errors monitor | string | `"true"` | no |
| error\_rate\_5xx\_extra\_tags | Extra tags for GCP LB 5XX Errors monitor | list(string) | `[]` | no |
| error\_rate\_5xx\_message | Custom message for the GCP LB 5XX Errors monitor | string | `""` | no |
| error\_rate\_5xx\_threshold\_critical | Rate error in percentage (critical threshold) | string | `"40"` | no |
| error\_rate\_5xx\_threshold\_warning | Rate error in percentage (warning threshold) | string | `"30"` | no |
@ -63,7 +61,7 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds for the new host evaluation | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend, between brackets, to every monitor name | string | `""` | no |
| request\_count\_enabled | Flag to enable GCP LB Request Count monitor | string | `"true"` | no |
| request\_count\_extra\_tags | Extra tags for GCP LB Request Count monitor | list(string) | `[]` | no |
| request\_count\_message | Custom message for the GCP LB Request Count monitor | string | `""` | no |
| request\_count\_threshold\_critical | Deviation in percentage (critical threshold) | string | `"500"` | no |
| request\_count\_threshold\_warning | Deviation in percentage (warning threshold) | string | `"250"` | no |

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "error_rate_4xx" {
count = var.error_rate_4xx_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP LB 4xx errors {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.error_rate_4xx_message, var.message)
type = "query alert"
query = <<EOQ
${var.error_rate_4xx_time_aggregator}(${var.error_rate_4xx_timeframe}):
@ -53,8 +53,8 @@ warning = var.error_rate_5xx_threshold_warning
critical = var.error_rate_5xx_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -73,7 +73,7 @@ resource "datadog_monitor" "backend_latency_service" {
count = var.backend_latency_service_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP LB service backend latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.backend_latency_service_message, var.message)
type = "query alert"
query = <<EOQ
${var.backend_latency_service_time_aggregator}(${var.backend_latency_service_timeframe}):
@ -119,8 +119,8 @@ EOQ
critical = var.backend_latency_bucket_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0
@ -139,7 +139,7 @@ resource "datadog_monitor" "request_count" {
count = var.request_count_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP LB Requests count increased abruptly {{#is_alert}}{{value}}%%{{/is_alert}}{{#is_warning}}{{value}}%%{{/is_warning}}"
message = coalesce(var.request_count_message, var.message)
type = "query alert"
query = <<EOQ
pct_change(${var.request_count_time_aggregator}(${var.request_count_timeframe}),${var.request_count_timeshift}):

View File

@ -30,13 +30,13 @@ Creates DataDog monitors with the following checks:
| new\_host\_delay | Delay in seconds for the new host evaluation | string | `"300"` | no |
| prefix\_slug | Prefix string to prepend, between brackets, to every monitor name | string | `""` | no |
| sending\_operations\_count\_enabled | Flag to enable GCP Pub/Sub Sending Operations Count monitor | string | `"true"` | no |
| sending\_operations\_count\_extra\_tags | Extra tags for GCP Pub/Sub Sending Operations Count monitor | list(string) | `[]` | no |
| sending\_operations\_count\_message | Custom message for the GCP Pub/Sub Sending Operations Count monitor | string | `""` | no |
| sending\_operations\_count\_threshold\_critical | Critical threshold for the number of sending operations | string | `"0"` | no |
| sending\_operations\_count\_time\_aggregator | Time aggregator for the GCP Pub/Sub Sending Operations Count monitor | string | `"sum"` | no |
| sending\_operations\_count\_timeframe | Timeframe for the GCP Pub/Sub Sending Operations Count monitor | string | `"last_30m"` | no |
| unavailable\_sending\_operations\_count\_enabled | Flag to enable GCP Pub/Sub Unavailable Sending Operations Count monitor | string | `"true"` | no |
| unavailable\_sending\_operations\_count\_extra\_tags | Extra tags for GCP Pub/Sub Unavailable Sending Operations Count monitor | list(string) | `[]` | no |
| unavailable\_sending\_operations\_count\_message | Custom message for the GCP Pub/Sub Unavailable Sending Operations Count monitor | string | `""` | no |
| unavailable\_sending\_operations\_count\_threshold\_critical | Critical threshold for the number of unavailable sending operations | string | `"4"` | no |
| unavailable\_sending\_operations\_count\_threshold\_warning | Warning threshold for the number of unavailable sending operations | string | `"2"` | no |
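A hedged usage sketch for this module; the module name, source path, and notification handle are assumptions, while the overridden inputs come from the table above:

```hcl
# Module name and source path are assumptions; adapt them to your layout
module "datadog-monitors-cloud-gcp-pubsub" {
  source      = "git::ssh://git@example.com/terraform-datadog-monitors//cloud/gcp/pubsub?ref=master"
  environment = "production"
  message     = "@pagerduty"

  # Alert earlier on unavailable sending operations, over a longer window
  unavailable_sending_operations_count_threshold_warning  = "1"
  unavailable_sending_operations_count_threshold_critical = "2"
  sending_operations_count_timeframe                      = "last_1h"
}
```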

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "sending_operations_count" {
count = var.sending_operations_count_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] GCP pubsub sending messages operations {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.sending_operations_count_message, var.message)
type = "query alert"
query = <<EOQ
${var.sending_operations_count_time_aggregator}(${var.sending_operations_count_timeframe}):
@ -50,8 +50,8 @@ warning = var.unavailable_sending_operations_count_threshold_warning
critical = var.unavailable_sending_operations_count_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
timeout_h = 0

View File

@ -17,26 +17,16 @@ module "datadog-monitors-database-elasticsearch" {
Creates DataDog monitors with the following checks:
- Elasticsearch average index flushing to disk latency
- Elasticsearch average indexing time by document
- Elasticsearch average Old-generation garbage collections latency
- Elasticsearch average search fetch latency
- Elasticsearch average search query latency
- Elasticsearch average Young-generation garbage collections latency
- Elasticsearch change alert on the average time spent by tasks in the queue
- Elasticsearch change alert on the number of currently active queries
- Elasticsearch change alert on the number of query cache evictions
- Elasticsearch change alert on the number of request cache evictions
- Elasticsearch change alert on the number of search fetches currently running
- Elasticsearch change alert on the total number of evictions from the fielddata cache
- ElasticSearch Cluster has unassigned shards
- ElasticSearch Cluster is initializing shards
- ElasticSearch Cluster is relocating shards
- ElasticSearch Cluster status not green
- ElasticSearch does not respond
- ElasticSearch free space < 10%
- Elasticsearch JVM HEAP memory usage
- Elasticsearch JVM memory Old usage
- Elasticsearch JVM memory Young usage
- Elasticsearch number of current open HTTP connections anomaly detected
## Inputs
@ -44,28 +34,28 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| cluster\_initializing\_shards\_enabled | Flag to enable Cluster Initializing Shards monitor | string | `"true"` | no |
| cluster\_initializing\_shards\_extra\_tags | Extra tags for Cluster Initializing Shards monitor | list(string) | `[]` | no |
| cluster\_initializing\_shards\_message | Custom message for the Cluster Initializing Shards monitor | string | `""` | no |
| cluster\_initializing\_shards\_threshold\_critical | Cluster Initializing Shards critical threshold | string | `"2"` | no |
| cluster\_initializing\_shards\_threshold\_warning | Cluster Initializing Shards warning threshold | string | `"1"` | no |
| cluster\_initializing\_shards\_time\_aggregator | Time aggregator for the Cluster Initializing Shards monitor | string | `"avg"` | no |
| cluster\_initializing\_shards\_timeframe | Timeframe for the Cluster Initializing Shards monitor | string | `"last_5m"` | no |
| cluster\_relocating\_shards\_enabled | Flag to enable Cluster Relocating Shards monitor | string | `"true"` | no |
| cluster\_relocating\_shards\_extra\_tags | Extra tags for Cluster Relocating Shards monitor | list(string) | `[]` | no |
| cluster\_relocating\_shards\_message | Custom message for the Cluster Relocating Shards monitor | string | `""` | no |
| cluster\_relocating\_shards\_threshold\_critical | Cluster Relocating Shards critical threshold | string | `"2"` | no |
| cluster\_relocating\_shards\_threshold\_warning | Cluster Relocating Shards warning threshold | string | `"1"` | no |
| cluster\_relocating\_shards\_time\_aggregator | Time aggregator for the Cluster Relocating Shards monitor | string | `"avg"` | no |
| cluster\_relocating\_shards\_timeframe | Timeframe for the Cluster Relocating Shards monitor | string | `"last_5m"` | no |
| cluster\_status\_not\_green\_enabled | Flag to enable Cluster Status monitor | string | `"true"` | no |
| cluster\_status\_not\_green\_extra\_tags | Extra tags for Cluster Status monitor | list(string) | `[]` | no |
| cluster\_status\_not\_green\_message | Custom message for the Cluster Status monitor | string | `""` | no |
| cluster\_status\_not\_green\_threshold\_critical | Cluster Status critical threshold | string | `"0"` | no |
| cluster\_status\_not\_green\_threshold\_warning | Cluster Status warning threshold | string | `"1"` | no |
| cluster\_status\_not\_green\_time\_aggregator | Time aggregator for the Cluster Status monitor | string | `"avg"` | no |
| cluster\_status\_not\_green\_timeframe | Timeframe for the Cluster Status monitor | string | `"last_5m"` | no |
| cluster\_unassigned\_shards\_enabled | Flag to enable Cluster Unassigned Shards monitor | string | `"true"` | no |
| cluster\_unassigned\_shards\_extra\_tags | Extra tags for Cluster Unassigned Shards monitor | list(string) | `[]` | no |
| cluster\_unassigned\_shards\_message | Custom message for the Cluster Unassigned Shards monitor | string | `""` | no |
| cluster\_unassigned\_shards\_threshold\_critical | Cluster Unassigned Shards critical threshold | string | `"2"` | no |
| cluster\_unassigned\_shards\_threshold\_warning | Cluster Unassigned Shards warning threshold | string | `"1"` | no |
@ -74,7 +64,7 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"15"` | no |
| fetch\_change\_enabled | Flag to enable Search Fetch Change monitor | string | `"true"` | no |
| fetch\_change\_extra\_tags | Extra tags for Search Fetch Change monitor | list(string) | `[]` | no |
| fetch\_change\_message | Custom message for the Search Fetch Change monitor | string | `""` | no |
| fetch\_change\_threshold\_critical | Search Fetch Change critical threshold | string | `"100"` | no |
| fetch\_change\_threshold\_warning | Search Fetch Change warning threshold | string | `"75"` | no |
@ -82,14 +72,14 @@ Creates DataDog monitors with the following checks:
| fetch\_change\_timeframe | Timeframe for the Search Fetch Change monitor | string | `"last_10m"` | no |
| fetch\_change\_timeshift | Timeshift for the Search Fetch Change monitor | string | `"last_10m"` | no |
| fetch\_latency\_enabled | Flag to enable Search Fetch Latency monitor | string | `"true"` | no |
| fetch\_latency\_extra\_tags | Extra tags for Search Fetch Latency monitor | list(string) | `[]` | no |
| fetch\_latency\_message | Custom message for the Search Fetch Latency monitor | string | `""` | no |
| fetch\_latency\_threshold\_critical | Search Fetch Latency critical threshold | string | `"4"` | no |
| fetch\_latency\_threshold\_warning | Search Fetch Latency warning threshold | string | `"2"` | no |
| fetch\_latency\_time\_aggregator | Time aggregator for the Search Fetch Latency monitor | string | `"avg"` | no |
| fetch\_latency\_timeframe | Timeframe for the Search Fetch Latency monitor | string | `"last_10m"` | no |
| field\_data\_evictions\_change\_enabled | Flag to enable Fielddata Evictions Change monitor | string | `"true"` | no |
| field\_data\_evictions\_change\_extra\_tags | Extra tags for Fielddata Evictions Change monitor | list(string) | `[]` | no |
| field\_data\_evictions\_change\_message | Custom message for the Fielddata Evictions Change monitor | string | `""` | no |
| field\_data\_evictions\_change\_threshold\_critical | Fielddata Evictions Change critical threshold | string | `"120"` | no |
| field\_data\_evictions\_change\_threshold\_warning | Fielddata Evictions Change warning threshold | string | `"60"` | no |
@ -100,7 +90,7 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| flush\_latency\_enabled | Flag to enable Flush Latency monitor | string | `"true"` | no |
| flush\_latency\_extra\_tags | Extra tags for Flush Latency monitor | list(string) | `[]` | no |
| flush\_latency\_message | Custom message for the Flush Latency monitor | string | `""` | no |
| flush\_latency\_threshold\_critical | Flush Latency critical threshold | string | `"100"` | no |
| flush\_latency\_threshold\_warning | Flush Latency warning threshold | string | `"50"` | no |
@ -112,7 +102,7 @@ Creates DataDog monitors with the following checks:
| http\_connections\_anomaly\_deviations | Deviations to detect the anomaly | string | `"2"` | no |
| http\_connections\_anomaly\_direction | Direction of the anomaly: `both`, `below` or `above` | string | `"above"` | no |
| http\_connections\_anomaly\_enabled | Flag to enable HTTP Connections Anomaly monitor | string | `"true"` | no |
| http\_connections\_anomaly\_extra\_tags | Extra tags for HTTP Connections Anomaly monitor | list(string) | `[]` | no |
| http\_connections\_anomaly\_interval | Interval in seconds for the anomaly detection algorithm | string | `"60"` | no |
| http\_connections\_anomaly\_message | Custom message for the HTTP Connections Anomaly monitor | string | `""` | no |
| http\_connections\_anomaly\_seasonality | Seasonality of the algorithm | string | `"hourly"` | no |
@ -121,42 +111,42 @@ Creates DataDog monitors with the following checks:
| http\_connections\_anomaly\_time\_aggregator | Time aggregator for the HTTP Connections Anomaly monitor | string | `"avg"` | no |
| http\_connections\_anomaly\_timeframe | Timeframe for the HTTP Connections Anomaly monitor | string | `"last_4h"` | no |
| indexing\_latency\_enabled | Flag to enable Indexing Latency monitor | string | `"true"` | no |
| indexing\_latency\_extra\_tags | Extra tags for Indexing Latency monitor | list(string) | `[]` | no |
| indexing\_latency\_message | Custom message for the Indexing Latency monitor | string | `""` | no |
| indexing\_latency\_threshold\_critical | Indexing Latency critical threshold | string | `"15"` | no |
| indexing\_latency\_threshold\_warning | Indexing Latency warning threshold | string | `"10"` | no |
| indexing\_latency\_time\_aggregator | Time aggregator for the Indexing Latency monitor | string | `"avg"` | no |
| indexing\_latency\_timeframe | Timeframe for the Indexing Latency monitor | string | `"last_10m"` | no |
| jvm\_gc\_old\_collection\_latency\_enabled | Flag to enable JVM Old-generation GC Latency monitor | string | `"true"` | no |
| jvm\_gc\_old\_collection\_latency\_extra\_tags | Extra tags for JVM Old-generation GC Latency monitor | list(string) | `[]` | no |
| jvm\_gc\_old\_collection\_latency\_message | Custom message for the JVM Old-generation GC Latency monitor | string | `""` | no |
| jvm\_gc\_old\_collection\_latency\_threshold\_critical | JVM Old-generation GC Latency critical threshold | string | `"200"` | no |
| jvm\_gc\_old\_collection\_latency\_threshold\_warning | JVM Old-generation GC Latency warning threshold | string | `"160"` | no |
| jvm\_gc\_old\_collection\_latency\_time\_aggregator | Time aggregator for the JVM Old-generation GC Latency monitor | string | `"avg"` | no |
| jvm\_gc\_old\_collection\_latency\_timeframe | Timeframe for the JVM Old-generation GC Latency monitor | string | `"last_10m"` | no |
| jvm\_gc\_young\_collection\_latency\_enabled | Flag to enable JVM Young-generation GC Latency monitor | string | `"true"` | no |
| jvm\_gc\_young\_collection\_latency\_extra\_tags | Extra tags for JVM Young-generation GC Latency monitor | list(string) | `[]` | no |
| jvm\_gc\_young\_collection\_latency\_message | Custom message for the JVM Young-generation GC Latency monitor | string | `""` | no |
| jvm\_gc\_young\_collection\_latency\_threshold\_critical | JVM Young-generation GC Latency critical threshold | string | `"25"` | no |
| jvm\_gc\_young\_collection\_latency\_threshold\_warning | JVM Young-generation GC Latency warning threshold | string | `"20"` | no |
| jvm\_gc\_young\_collection\_latency\_time\_aggregator | Time aggregator for the JVM Young-generation GC Latency monitor | string | `"avg"` | no |
| jvm\_gc\_young\_collection\_latency\_timeframe | Timeframe for the JVM Young-generation GC Latency monitor | string | `"last_10m"` | no |
| jvm\_heap\_memory\_usage\_enabled | Flag to enable JVM Heap Memory Usage monitor | string | `"true"` | no |
| jvm\_heap\_memory\_usage\_extra\_tags | Extra tags for JVM Heap Memory Usage monitor | list(string) | `[]` | no |
| jvm\_heap\_memory\_usage\_message | Custom message for the JVM Heap Memory Usage monitor | string | `""` | no |
| jvm\_heap\_memory\_usage\_threshold\_critical | JVM Heap Memory Usage critical threshold | string | `"90"` | no |
| jvm\_heap\_memory\_usage\_threshold\_warning | JVM Heap Memory Usage warning threshold | string | `"80"` | no |
| jvm\_heap\_memory\_usage\_time\_aggregator | Time aggregator for the JVM Heap Memory Usage monitor | string | `"avg"` | no |
| jvm\_heap\_memory\_usage\_timeframe | Timeframe for the JVM Heap Memory Usage monitor | string | `"last_5m"` | no |
| jvm\_memory\_old\_usage\_enabled | Flag to enable JVM Old Memory Usage monitor | string | `"true"` | no |
| jvm\_memory\_old\_usage\_extra\_tags | Extra tags for JVM Old Memory Usage monitor | list(string) | `[]` | no |
| jvm\_memory\_old\_usage\_message | Custom message for the JVM Old Memory Usage monitor | string | `""` | no |
| jvm\_memory\_old\_usage\_threshold\_critical | JVM Old Memory Usage critical threshold | string | `"90"` | no |
| jvm\_memory\_old\_usage\_threshold\_warning | JVM Old Memory Usage warning threshold | string | `"80"` | no |
| jvm\_memory\_old\_usage\_time\_aggregator | Time aggregator for the JVM Old Memory Usage monitor | string | `"avg"` | no |
| jvm\_memory\_old\_usage\_timeframe | Timeframe for the JVM Old Memory Usage monitor | string | `"last_10m"` | no |
| jvm\_memory\_young\_usage\_enabled | Flag to enable JVM Young Memory Usage monitor | string | `"true"` | no |
| jvm\_memory\_young\_usage\_extra\_tags | Extra tags for JVM Young Memory Usage monitor | list(string) | `[]` | no |
| jvm\_memory\_young\_usage\_message | Custom message for the JVM Young Memory Usage monitor | string | `""` | no |
| jvm\_memory\_young\_usage\_threshold\_critical | JVM Young Memory Usage critical threshold | string | `"90"` | no |
| jvm\_memory\_young\_usage\_threshold\_warning | JVM Young Memory Usage warning threshold | string | `"80"` | no |
@ -165,20 +155,20 @@ Creates DataDog monitors with the following checks:
| message | Message sent when a monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before beginning to monitor a new host | string | `"300"` | no |
| node\_free\_space\_enabled | Flag to enable Node Free Space monitor | string | `"true"` | no |
| node\_free\_space\_extra\_tags | Extra tags for Node Free Space monitor | list(string) | `[]` | no |
| node\_free\_space\_message | Custom message for the Node Free Space monitor | string | `""` | no |
| node\_free\_space\_threshold\_critical | Node Free Space critical threshold | string | `"10"` | no |
| node\_free\_space\_threshold\_warning | Node Free Space warning threshold | string | `"20"` | no |
| node\_free\_space\_time\_aggregator | Time aggregator for the Node Free Space monitor | string | `"sum"` | no |
| node\_free\_space\_timeframe | Timeframe for the Node Free Space monitor | string | `"last_5m"` | no |
| not\_responding\_enabled | Flag to enable Elasticsearch does not respond monitor | string | `"true"` | no |
| not\_responding\_extra\_tags | Extra tags for Elasticsearch does not respond monitor | list(string) | `[]` | no |
| not\_responding\_message | Custom message for Elasticsearch does not respond monitor | string | `""` | no |
| not\_responding\_no\_data\_timeframe | No data timeframe in minutes for the Elasticsearch does not respond monitor | string | `"10"` | no |
| not\_responding\_threshold\_warning | Elasticsearch does not respond limit (warning threshold) | string | `"3"` | no |
| prefix\_slug | Prefix string to prepend, between brackets, to every monitor name | string | `""` | no |
| query\_cache\_evictions\_change\_enabled | Flag to enable Query Cache Evictions Change monitor | string | `"true"` | no |
| query\_cache\_evictions\_change\_extra\_tags | Extra tags for Query Cache Evictions Change monitor | list(string) | `[]` | no |
| query\_cache\_evictions\_change\_message | Custom message for the Query Cache Evictions Change monitor | string | `""` | no |
| query\_cache\_evictions\_change\_threshold\_critical | Query Cache Evictions Change critical threshold | string | `"120"` | no |
| query\_cache\_evictions\_change\_threshold\_warning | Query Cache Evictions Change warning threshold | string | `"60"` | no |
@ -186,7 +176,7 @@ Creates DataDog monitors with the following checks:
| query\_cache\_evictions\_change\_timeframe | Timeframe for the Query Cache Evictions Change monitor | string | `"last_15m"` | no |
| query\_cache\_evictions\_change\_timeshift | Timeshift for the Query Cache Evictions Change monitor | string | `"last_15m"` | no |
| request\_cache\_evictions\_change\_enabled | Flag to enable Request Cache Evictions Change monitor | string | `"true"` | no |
| request\_cache\_evictions\_change\_extra\_tags | Extra tags for Request Cache Evictions Change monitor | list(string) | `[]` | no |
| request\_cache\_evictions\_change\_message | Custom message for the Request Cache Evictions Change monitor | string | `""` | no |
| request\_cache\_evictions\_change\_threshold\_critical | Request Cache Evictions Change critical threshold | string | `"120"` | no |
| request\_cache\_evictions\_change\_threshold\_warning | Request Cache Evictions Change warning threshold | string | `"60"` | no |
@ -194,7 +184,7 @@ Creates DataDog monitors with the following checks:
| request\_cache\_evictions\_change\_timeframe | Timeframe for the Request Cache Evictions Change monitor | string | `"last_15m"` | no |
| request\_cache\_evictions\_change\_timeshift | Timeshift for the Request Cache Evictions Change monitor | string | `"last_15m"` | no |
| search\_query\_change\_enabled | Flag to enable Search Query Change monitor | string | `"true"` | no |
| search\_query\_change\_extra\_tags | Extra tags for Search Query Change monitor | list(string) | `[]` | no |
| search\_query\_change\_message | Custom message for the Search Query Change monitor | string | `""` | no |
| search\_query\_change\_threshold\_critical | Search Query Change critical threshold | string | `"100"` | no |
| search\_query\_change\_threshold\_warning | Search Query Change warning threshold | string | `"75"` | no |
@ -202,14 +192,14 @@ Creates DataDog monitors with the following checks:
| search\_query\_change\_timeframe | Timeframe for the Cluster Status monitor | string | `"last_10m"` | no |
| search\_query\_change\_timeshift | Timeshift for the Cluster Status monitor | string | `"last_10m"` | no |
| search\_query\_latency\_enabled | Flag to enable Cluster Status monitor | string | `"true"` | no |
| search\_query\_latency\_extra\_tags | Extra tags for Cluster Status monitor | list | `[]` | no |
| search\_query\_latency\_extra\_tags | Extra tags for Cluster Status monitor | list(string) | `[]` | no |
| search\_query\_latency\_message | Custom message for the Cluster Status monitor | string | `""` | no |
| search\_query\_latency\_threshold\_critical | Cluster Status critical threshold | string | `"1"` | no |
| search\_query\_latency\_threshold\_warning | Cluster Status warning threshold | string | `"0.5"` | no |
| search\_query\_latency\_time\_aggregator | Time aggregator for the Cluster Status monitor | string | `"avg"` | no |
| search\_query\_latency\_timeframe | Timeframe for the Cluster Status monitor | string | `"last_10m"` | no |
| task\_time\_in\_queue\_change\_enabled | Flag to enable Cluster Status monitor | string | `"true"` | no |
| task\_time\_in\_queue\_change\_extra\_tags | Extra tags for Cluster Status monitor | list | `[]` | no |
| task\_time\_in\_queue\_change\_extra\_tags | Extra tags for Cluster Status monitor | list(string) | `[]` | no |
| task\_time\_in\_queue\_change\_message | Custom message for the Cluster Status monitor | string | `""` | no |
| task\_time\_in\_queue\_change\_threshold\_critical | Cluster Status critical threshold | string | `"200"` | no |
| task\_time\_in\_queue\_change\_threshold\_warning | Cluster Status warning threshold | string | `"100"` | no |
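Taken together, the inputs above are overridden when instantiating the module. A minimal sketch — the module name, source path, and all values are illustrative assumptions, not taken from this repository:

```hcl
# Hypothetical instantiation; source path and values are assumptions.
module "datadog-monitors-database-elasticsearch" {
  source      = "claranet/monitors/datadog//database/elasticsearch" # assumed path
  environment = "production"
  message     = "@pagerduty-MyTeam"

  # Tighten the query-latency thresholds and tag the queue-time monitor.
  search_query_latency_threshold_warning  = "0.5"
  search_query_latency_threshold_critical = "1"
  task_time_in_queue_change_extra_tags    = ["app:search"]
}
```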

@ -5,7 +5,7 @@ resource "datadog_monitor" "not_responding" {
count = var.not_responding_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ElasticSearch does not respond"
message = coalesce(var.not_responding_message, var.message)
type = "service check"
query = <<EOQ
"elasticsearch.can_connect"${module.filter-tags.service_check}.by("server","port").last(6).count_by_status()
@ -50,8 +50,8 @@ warning = var.cluster_status_not_green_threshold_warning # Yellow
critical = var.cluster_status_not_green_threshold_critical # Red
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -68,7 +68,7 @@ resource "datadog_monitor" "cluster_initializing_shards" {
count = var.cluster_initializing_shards_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ElasticSearch Cluster is initializing shards"
message = coalesce(var.cluster_initializing_shards_message, var.message)
type = "metric alert"
query = <<EOQ
${var.cluster_initializing_shards_time_aggregator}(${var.cluster_initializing_shards_timeframe}):
@ -112,8 +112,8 @@ EOQ
critical = var.cluster_relocating_shards_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -130,7 +130,7 @@ resource "datadog_monitor" "cluster_unassigned_shards" {
count = var.cluster_unassigned_shards_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] ElasticSearch Cluster has unassigned shards"
message = coalesce(var.cluster_unassigned_shards_message, var.message)
type = "metric alert"
query = <<EOQ
${var.cluster_unassigned_shards_time_aggregator}(${var.cluster_unassigned_shards_timeframe}):
@ -179,8 +179,8 @@ warning = var.node_free_space_threshold_warning
critical = var.node_free_space_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -197,7 +197,7 @@ resource "datadog_monitor" "jvm_heap_memory_usage" {
count = var.jvm_heap_memory_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch JVM HEAP memory usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.jvm_heap_memory_usage_message, var.message)
type = "query alert"
query = <<EOQ
${var.jvm_heap_memory_usage_time_aggregator}(${var.jvm_heap_memory_usage_timeframe}):
@ -241,8 +241,8 @@ EOQ
critical = var.jvm_memory_young_usage_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -259,7 +259,7 @@ resource "datadog_monitor" "jvm_memory_old_usage" {
count = var.jvm_memory_old_usage_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch JVM memory Old usage {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.jvm_memory_old_usage_message, var.message)
type = "query alert"
query = <<EOQ
${var.jvm_memory_old_usage_time_aggregator}(${var.jvm_memory_old_usage_timeframe}):
@ -303,8 +303,8 @@ warning = var.jvm_gc_old_collection_latency_threshold_warning
critical = var.jvm_gc_old_collection_latency_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -321,7 +321,7 @@ resource "datadog_monitor" "jvm_gc_young_collection_latency" {
count = var.jvm_gc_young_collection_latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch average Young-generation garbage collections latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.jvm_gc_young_collection_latency_message, var.message)
type = "query alert"
query = <<EOQ
${var.jvm_gc_young_collection_latency_time_aggregator}(${var.jvm_gc_young_collection_latency_timeframe}):
@ -366,8 +366,8 @@ EOQ
critical = var.indexing_latency_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -384,7 +384,7 @@ resource "datadog_monitor" "flush_latency" {
count = var.flush_latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch average index flushing to disk latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.flush_latency_message, var.message)
type = "query alert"
// TODO add tags to filter by node type and do not apply this monitor on non-data nodes
query = <<EOQ
@ -437,8 +437,8 @@ warning = var.http_connections_anomaly_threshold_warning
critical = var.http_connections_anomaly_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -455,7 +455,7 @@ resource "datadog_monitor" "search_query_latency" {
count = var.search_query_latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch average search query latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}ms){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}ms){{/is_warning}}"
message = coalesce(var.search_query_latency_message, var.message)
type = "query alert"
// TODO add tags to filter by node type and do not apply this monitor on non-data nodes
query = <<EOQ
@ -501,15 +501,15 @@ EOQ
critical = var.fetch_latency_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
require_full_window = true
notify_no_data = false
tags = concat(["env:${var.environment}", "type:database", "provider:elasticsearch", "resource:elasticsearch", "team:claranet", "created-by:terraform"], var.fetch_latency_extra_tags)
}
#
@ -519,7 +519,7 @@ resource "datadog_monitor" "search_query_change" {
count = var.search_query_change_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch change alert on the number of currently active queries"
message = coalesce(var.search_query_change_message, var.message)
type = "query alert"
query = <<EOQ
pct_change(${var.search_query_change_time_aggregator}(${var.search_query_change_timeframe}),${var.search_query_change_timeshift}):
@ -540,7 +540,7 @@ EOQ
require_full_window = true
notify_no_data = false
tags = concat(["env:${var.environment}", "type:database", "provider:elasticsearch", "resource:elasticsearch", "team:claranet", "created-by:terraform"], var.search_query_change_extra_tags)
}
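With the defaults documented earlier (avg aggregator, `last_10m` timeframe and timeshift), the `pct_change` heredoc in this resource interpolates to a query of roughly the following shape. The metric name and tag scope below are assumptions for illustration; they are not visible in this hunk:

```hcl
# Illustrative rendered form of the pct_change query (metric name assumed):
# alert when the number of active queries changes by more than the threshold
# relative to the same window 10 minutes earlier.
query = <<EOQ
pct_change(avg(last_10m),last_10m):
  avg:elasticsearch.search.query.current{env:production} by {host}
> 100
EOQ
```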
#
@ -563,8 +563,8 @@ warning = var.fetch_change_threshold_warning
critical = var.fetch_change_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -581,7 +581,7 @@ resource "datadog_monitor" "field_data_evictions_change" {
count = var.field_data_evictions_change_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch change alert on the total number of evictions from the fielddata cache"
message = coalesce(var.field_data_evictions_change_message, var.message)
type = "query alert"
// TODO add tags to filter by node type and do not apply this monitor on non-data nodes
query = <<EOQ
@ -627,8 +627,8 @@ EOQ
critical = var.query_cache_evictions_change_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true
@ -645,7 +645,7 @@ resource "datadog_monitor" "request_cache_evictions_change" {
count = var.request_cache_evictions_change_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Elasticsearch change alert on the number of request cache evictions"
message = coalesce(var.request_cache_evictions_change_message, var.message)
type = "query alert"
// TODO add tags to filter by node type and do not apply this monitor on non-data nodes
query = <<EOQ
@ -690,8 +690,8 @@ warning = var.task_time_in_queue_change_threshold_warning
critical = var.task_time_in_queue_change_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_audit = false
locked = false
include_tags = true

@ -17,9 +17,7 @@ module "datadog-monitors-database-mongodb" {
Creates DataDog monitors with the following checks:
- MongoDB primary state
- MongoDB replication lag
- MongoDB secondary missing
- MongoDB too many servers or wrong monitoring config
## Inputs
@ -36,17 +34,17 @@ Creates DataDog monitors with the following checks:
| mongodb\_lag\_warning | Warning replication lag threshold in seconds | string | `"2"` | no |
| mongodb\_primary\_aggregator | Monitor aggregator for MongoDB primary state [available values: min, max] | string | `"max"` | no |
| mongodb\_primary\_enabled | Flag to enable MongoDB primary state monitor | string | `"true"` | no |
| mongodb\_primary\_extra\_tags | Extra tags for MongoDB primary state monitor | list(string) | `[]` | no |
| mongodb\_primary\_message | Custom message for MongoDB primary monitor | string | `""` | no |
| mongodb\_primary\_timeframe | Monitor timeframe for MongoDB wrong state for primary node [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_1m"` | no |
| mongodb\_replication\_aggregator | Monitor aggregator for MongoDB replication lag [available values: min, max, sum or avg] | string | `"avg"` | no |
| mongodb\_replication\_enabled | Flag to enable MongoDB replication lag monitor | string | `"true"` | no |
| mongodb\_replication\_extra\_tags | Extra tags for MongoDB replication lag monitor | list(string) | `[]` | no |
| mongodb\_replication\_message | Custom message for MongoDB replication monitor | string | `""` | no |
| mongodb\_replication\_timeframe | Monitor timeframe for MongoDB replication lag [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_1m"` | no |
| mongodb\_secondary\_aggregator | Monitor aggregator for MongoDB secondary state [available values: min, max] | string | `"max"` | no |
| mongodb\_secondary\_enabled | Flag to enable MongoDB secondary state monitor | string | `"true"` | no |
| mongodb\_secondary\_extra\_tags | Extra tags for MongoDB secondary state monitor | list(string) | `[]` | no |
| mongodb\_secondary\_message | Custom message for MongoDB secondary monitor | string | `""` | no |
| mongodb\_secondary\_timeframe | Monitor timeframe for MongoDB wrong state for secondary nodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| mongodb\_server\_count\_aggregator | Monitor aggregator for MongoDB server count [available values: min, max] | string | `"min"` | no |
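The MongoDB inputs above follow the same pattern; a hedged usage sketch, with the module source path and all values assumed for illustration:

```hcl
# Hypothetical instantiation; source path and values are assumptions.
module "datadog-monitors-database-mongodb" {
  source      = "claranet/monitors/datadog//database/mongodb" # assumed path
  environment = "staging"
  message     = "@slack-monitoring"

  # Warn on 2s of replication lag and tag secondary-state alerts.
  mongodb_lag_warning          = "2"
  mongodb_secondary_extra_tags = ["replicaset:rs0"]
}
```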

@ -2,7 +2,7 @@ resource "datadog_monitor" "mongodb_primary" {
count = var.mongodb_primary_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] MongoDB primary state"
message = coalesce(var.mongodb_primary_message, var.message)
type = "metric alert"
query = <<EOQ
${var.mongodb_primary_aggregator}(${var.mongodb_primary_timeframe}):
@ -54,7 +54,7 @@ resource "datadog_monitor" "mongodb_server_count" {
count = var.mongodb_server_count_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] MongoDB too many servers or wrong monitoring config"
message = coalesce(var.mongodb_server_count_message, var.message)
type = "metric alert"
query = <<EOQ
${var.mongodb_server_count_aggregator}(${var.mongodb_server_count_timeframe}):

@ -16,14 +16,10 @@ module "datadog-monitors-database-mysql" {
Creates DataDog monitors with the following checks:
- Mysql Aborted connects
- Mysql Connections limit
- Mysql Innodb buffer pool efficiency
- Mysql Innodb buffer pool utilization
- Mysql queries changed abnormally
- Mysql server does not respond
- Mysql Slow queries
- Mysql threads changed abnormally
## Inputs
@ -36,33 +32,33 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| message | Message sent when an alert is triggered | string | n/a | yes |
| mysql\_aborted\_enabled | Flag to enable MySQL aborted connects monitor | string | `"true"` | no |
| mysql\_aborted\_extra\_tags | Extra tags for MySQL aborted connects monitor | list(string) | `[]` | no |
| mysql\_aborted\_message | Custom message for MySQL aborted connects monitor | string | `""` | no |
| mysql\_aborted\_threshold\_critical | Maximum critical acceptable percent of aborted connects | string | `"10"` | no |
| mysql\_aborted\_threshold\_warning | Maximum warning acceptable percent of aborted connects | string | `"5"` | no |
| mysql\_aborted\_time\_aggregator | Monitor time aggregator for MySQL aborted connects monitor [available values: min, max or avg] | string | `"avg"` | no |
| mysql\_aborted\_timeframe | Monitor timeframe for MySQL aborted connects monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_10m"` | no |
| mysql\_availability\_enabled | Flag to enable Mysql availability monitor | string | `"true"` | no |
| mysql\_availability\_extra\_tags | Extra tags for Mysql availability monitor | list(string) | `[]` | no |
| mysql\_availability\_message | Custom message for Mysql availability monitor | string | `""` | no |
| mysql\_availability\_no\_data\_timeframe | Mysql availability monitor no data timeframe | string | `"10"` | no |
| mysql\_availability\_threshold\_warning | Mysql availability monitor (warning threshold) | string | `"3"` | no |
| mysql\_connection\_enabled | Flag to enable MySQL connection monitor | string | `"true"` | no |
| mysql\_connection\_extra\_tags | Extra tags for MySQL connection monitor | list(string) | `[]` | no |
| mysql\_connection\_message | Custom message for MySQL connection monitor | string | `""` | no |
| mysql\_connection\_threshold\_critical | Maximum critical acceptable percent of connections | string | `"80"` | no |
| mysql\_connection\_threshold\_warning | Maximum warning acceptable percent of connections | string | `"70"` | no |
| mysql\_connection\_time\_aggregator | Monitor time aggregator for MySQL connection monitor [available values: min, max or avg] | string | `"avg"` | no |
| mysql\_connection\_timeframe | Monitor timeframe for MySQL connection monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_10m"` | no |
| mysql\_pool\_efficiency\_enabled | Flag to enable MySQL innodb buffer pool efficiency monitor | string | `"true"` | no |
| mysql\_pool\_efficiency\_extra\_tags | Extra tags for MySQL innodb buffer pool efficiency monitor | list(string) | `[]` | no |
| mysql\_pool\_efficiency\_message | Custom message for MySQL innodb buffer pool efficiency monitor | string | `""` | no |
| mysql\_pool\_efficiency\_threshold\_critical | Maximum critical acceptable percent of innodb buffer pool efficiency | string | `"30"` | no |
| mysql\_pool\_efficiency\_threshold\_warning | Maximum warning acceptable percent of innodb buffer pool efficiency | string | `"20"` | no |
| mysql\_pool\_efficiency\_time\_aggregator | Monitor time aggregator for MySQL innodb buffer pool efficiency monitor [available values: min, max or avg] | string | `"min"` | no |
| mysql\_pool\_efficiency\_timeframe | Monitor timeframe for MySQL innodb buffer pool efficiency monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_1h"` | no |
| mysql\_pool\_utilization\_enabled | Flag to enable MySQL innodb buffer pool utilization monitor | string | `"true"` | no |
| mysql\_pool\_utilization\_extra\_tags | Extra tags for MySQL innodb buffer pool utilization monitor | list(string) | `[]` | no |
| mysql\_pool\_utilization\_message | Custom message for MySQL innodb buffer pool utilization monitor | string | `""` | no |
| mysql\_pool\_utilization\_threshold\_critical | Maximum critical acceptable percent of innodb buffer pool utilization | string | `"95"` | no |
| mysql\_pool\_utilization\_threshold\_warning | Maximum warning acceptable percent of innodb buffer pool utilization | string | `"80"` | no |
@ -74,7 +70,7 @@ Creates DataDog monitors with the following checks:
| mysql\_questions\_deviations | Deviations to detect the anomaly | string | `"5"` | no |
| mysql\_questions\_direction | Direction of the anomaly. It can be both, below or above. | string | `"both"` | no |
| mysql\_questions\_enabled | Flag to enable mysql queries monitor | string | `"true"` | no |
| mysql\_questions\_extra\_tags | Extra tags for MySQL queries monitor | list(string) | `[]` | no |
| mysql\_questions\_interval | Interval. | string | `"60"` | no |
| mysql\_questions\_message | Custom message for MySQL queries monitor | string | `""` | no |
| mysql\_questions\_seasonality | Seasonality of the algorithm | string | `"daily"` | no |
@ -82,7 +78,7 @@ Creates DataDog monitors with the following checks:
| mysql\_questions\_time\_aggregator | Monitor time aggregator for MySQL queries monitor [available values: min, max or avg] | string | `"avg"` | no |
| mysql\_questions\_timeframe | Monitor timeframe for MySQL queries monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_4h"` | no |
| mysql\_slow\_enabled | Flag to enable MySQL slow queries monitor | string | `"true"` | no |
| mysql\_slow\_extra\_tags | Extra tags for MySQL slow queries monitor | list(string) | `[]` | no |
| mysql\_slow\_message | Custom message for MySQL slow queries monitor | string | `""` | no |
| mysql\_slow\_threshold\_critical | Maximum critical acceptable percent of slow queries | string | `"20"` | no |
| mysql\_slow\_threshold\_warning | Maximum warning acceptable percent of slow queries | string | `"5"` | no |
@ -94,7 +90,7 @@ Creates DataDog monitors with the following checks:
| mysql\_threads\_deviations | Deviations to detect the anomaly | string | `"2"` | no |
| mysql\_threads\_direction | Direction of the anomaly. It can be both, below or above. | string | `"above"` | no |
| mysql\_threads\_enabled | Flag to enable mysql threads monitor | string | `"true"` | no |
| mysql\_threads\_extra\_tags | Extra tags for MySQL threads monitor | list(string) | `[]` | no |
| mysql\_threads\_interval | Interval. | string | `"60"` | no |
| mysql\_threads\_message | Custom message for MySQL threads monitor | string | `""` | no |
| mysql\_threads\_seasonality | Seasonality of the algorithm | string | `"daily"` | no |
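The anomaly-detection inputs above (deviations, direction, interval, seasonality) tune Datadog's `anomalies()` wrapper. A sketch of overriding them — module source and values are illustrative assumptions:

```hcl
# Hypothetical instantiation; source path and values are assumptions.
module "datadog-monitors-database-mysql" {
  source      = "claranet/monitors/datadog//database/mysql" # assumed path
  environment = "production"
  message     = "@opsgenie-dba"

  # Only alert on thread counts rising above the daily pattern.
  mysql_threads_direction   = "above"
  mysql_threads_deviations  = "2"
  mysql_threads_seasonality = "daily"

  # Individual monitors can be switched off via their *_enabled flag.
  mysql_questions_enabled = "false"
}
```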

@ -2,7 +2,7 @@ resource "datadog_monitor" "mysql_availability" {
count = var.mysql_availability_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql server does not respond"
message = coalesce(var.mysql_availability_message, var.message)
type = "service check"
query = <<EOQ
"mysql.can_connect"${module.filter-tags.service_check}.by("port","server").last(6).count_by_status()
@ -30,7 +30,7 @@ resource "datadog_monitor" "mysql_connection" {
count = var.mysql_connection_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql Connections limit {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.mysql_connection_message, var.message)
type = "query alert"
query = <<EOQ
${var.mysql_connection_time_aggregator}(${var.mysql_connection_timeframe}): (
@ -44,8 +44,8 @@ warning = var.mysql_connection_threshold_warning
critical = var.mysql_connection_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = true
@ -88,7 +88,7 @@ resource "datadog_monitor" "mysql_slow" {
count = var.mysql_slow_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql Slow queries {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.mysql_slow_message, var.message)
type = "query alert"
query = <<EOQ
${var.mysql_slow_time_aggregator}(${var.mysql_slow_timeframe}): (
@ -102,8 +102,8 @@ EOQ
critical = var.mysql_slow_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = true
@ -146,7 +146,7 @@ resource "datadog_monitor" "mysql_pool_utilization" {
count = var.mysql_pool_utilization_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Mysql Innodb buffer pool utilization {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
message = coalesce(var.mysql_pool_utilization_message, var.message)
type = "query alert"
query = <<EOQ
${var.mysql_pool_utilization_time_aggregator}(${var.mysql_pool_utilization_timeframe}):
@ -161,8 +161,8 @@ warning = var.mysql_pool_utilization_threshold_warning
critical = var.mysql_pool_utilization_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = true
@ -235,8 +235,8 @@ EOQ
critical_recovery = 0
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = true

@ -16,9 +16,8 @@ module "datadog-monitors-database-postgresql" {
Creates DataDog monitors with the following checks:
- PostgreSQL Connections
- PostgreSQL server does not respond
- PostgreSQL too many locks
## Inputs
@ -32,19 +31,19 @@ Creates DataDog monitors with the following checks:
| message | Message sent when an alert is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds for the metric evaluation | string | `"300"` | no |
| postgresql\_availability\_enabled | Flag to enable PostgreSQL availability monitor | string | `"true"` | no |
| postgresql\_availability\_extra\_tags | Extra tags for PostgreSQL availability monitor | list(string) | `[]` | no |
| postgresql\_availability\_message | Custom message for PostgreSQL availability monitor | string | `""` | no |
| postgresql\_availability\_no\_data\_timeframe | PostgreSQL availability monitor no data timeframe | string | `"10"` | no |
| postgresql\_availability\_threshold\_warning | PostgreSQL availability monitor (warning threshold) | string | `"3"` | no |
| postgresql\_connection\_enabled | Flag to enable PostgreSQL connection monitor | string | `"true"` | no |
| postgresql\_connection\_extra\_tags | Extra tags for PostgreSQL connection monitor | list(string) | `[]` | no |
| postgresql\_connection\_message | Custom message for PostgreSQL connection monitor | string | `""` | no |
| postgresql\_connection\_threshold\_critical | Maximum critical acceptable percent of connections | string | `"80"` | no |
| postgresql\_connection\_threshold\_warning | Maximum warning acceptable percent of connections | string | `"70"` | no |
| postgresql\_connection\_time\_aggregator | Monitor time aggregator for PostgreSQL connection monitor [available values: min, max or avg] | string | `"avg"` | no |
| postgresql\_connection\_timeframe | Monitor timeframe for PostgreSQL connection monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_15m"` | no |
| postgresql\_lock\_enabled | Flag to enable PostgreSQL lock monitor | string | `"true"` | no |
| postgresql\_lock\_extra\_tags | Extra tags for PostgreSQL lock monitor | list(string) | `[]` | no |
| postgresql\_lock\_message | Custom message for PostgreSQL lock monitor | string | `""` | no |
| postgresql\_lock\_threshold\_critical | Maximum critical acceptable number of locks | string | `"99"` | no |
| postgresql\_lock\_threshold\_warning | Maximum warning acceptable number of locks | string | `"70"` | no |
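A sketch of overriding the lock thresholds documented above; the module source path and the threshold values are illustrative assumptions:

```hcl
# Hypothetical instantiation; source path and values are assumptions.
module "datadog-monitors-database-postgresql" {
  source      = "claranet/monitors/datadog//database/postgresql" # assumed path
  environment = "production"
  message     = "@pagerduty-DBA"

  # Alert earlier than the defaults on lock count.
  postgresql_lock_threshold_warning  = "50"
  postgresql_lock_threshold_critical = "80"
}
```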

@ -2,7 +2,7 @@ resource "datadog_monitor" "postgresql_availability" {
count = var.postgresql_availability_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] PostgreSQL server does not respond"
message = coalesce(var.postgresql_availability_message, var.message)
type = "service check"
query = <<EOQ
"postgres.can_connect"${module.filter-tags.service_check}.by("port","server").last(6).count_by_status()
@ -30,7 +30,7 @@ resource "datadog_monitor" "postgresql_connection_too_high" {
count = var.postgresql_connection_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] PostgreSQL Connections {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.postgresql_connection_message, var.message)
type = "query alert"
query = <<EOQ
${var.postgresql_connection_time_aggregator}(${var.postgresql_connection_timeframe}):
@ -43,8 +43,8 @@ warning = var.postgresql_connection_threshold_warning
critical = var.postgresql_connection_threshold_critical
}
evaluation_delay = var.evaluation_delay
new_host_delay = var.new_host_delay
notify_no_data = false
renotify_interval = 0
require_full_window = true

@ -16,23 +16,19 @@ module "datadog-monitors-database-redis" {
Creates DataDog monitors with the following checks:
- Redis blocked clients
- Redis does not respond
- Redis evicted keys
- Redis expired keys
- Redis hitrate
- Redis keyspace seems full (no changes since ${var.keyspace_timeframe})
- Redis latency
- Redis memory fragmented
- Redis memory used
- Redis rejected connections
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| blocked\_clients\_enabled | Flag to enable Redis Blocked clients monitor | string | `"true"` | no |
| blocked\_clients\_extra\_tags | Extra tags for Redis Blocked clients monitor | list(string) | `[]` | no |
| blocked\_clients\_message | Custom message for Redis Blocked clients monitor | string | `""` | no |
| blocked\_clients\_threshold\_critical | Blocked clients rate (critical threshold) | string | `"30"` | no |
| blocked\_clients\_threshold\_warning | Blocked clients rate (warning threshold) | string | `"10"` | no |
@ -41,14 +37,14 @@ Creates DataDog monitors with the following checks:
| environment | Architecture environment | string | n/a | yes |
| evaluation\_delay | Delay in seconds for the metric evaluation | string | `"15"` | no |
| evictedkeys\_change\_enabled | Flag to enable Redis evicted keys monitor | string | `"true"` | no |
| evictedkeys\_change\_extra\_tags | Extra tags for Redis evicted keys monitor | list(string) | `[]` | no |
| evictedkeys\_change\_message | Custom message for Redis evicted keys monitor | string | `""` | no |
| evictedkeys\_change\_threshold\_critical | Evicted keys change (critical threshold) | string | `"100"` | no |
| evictedkeys\_change\_threshold\_warning | Evicted keys change (warning threshold) | string | `"20"` | no |
| evictedkeys\_change\_time\_aggregator | Monitor aggregator for Redis evicted keys [available values: min, max or avg] | string | `"avg"` | no |
| evictedkeys\_change\_timeframe | Monitor timeframe for Redis evicted keys [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| expirations\_rate\_enabled | Flag to enable Redis keys expirations monitor | string | `"true"` | no |
| expirations\_rate\_extra\_tags | Extra tags for Redis keys expirations monitor | list | `[]` | no |
| expirations\_rate\_extra\_tags | Extra tags for Redis keys expirations monitor | list(string) | `[]` | no |
| expirations\_rate\_message | Custom message for Redis keys expirations monitor | string | `""` | no |
| expirations\_rate\_threshold\_critical | Expirations percent (critical threshold) | string | `"80"` | no |
| expirations\_rate\_threshold\_warning | Expirations percent (warning threshold) | string | `"60"` | no |
@ -58,35 +54,35 @@ Creates DataDog monitors with the following checks:
| filter\_tags\_custom\_excluded | Tags excluded for custom filtering when filter_tags_use_defaults is false | string | `""` | no |
| filter\_tags\_use\_defaults | Use default filter tags convention | string | `"true"` | no |
| hitrate\_enabled | Flag to enable Redis hitrate monitor | string | `"true"` | no |
| hitrate\_extra\_tags | Extra tags for Redis hitrate monitor | list | `[]` | no |
| hitrate\_extra\_tags | Extra tags for Redis hitrate monitor | list(string) | `[]` | no |
| hitrate\_message | Custom message for Redis hitrate monitor | string | `""` | no |
| hitrate\_threshold\_critical | Hitrate limit (critical threshold) | string | `"10"` | no |
| hitrate\_threshold\_warning | Hitrate limit (warning threshold) | string | `"30"` | no |
| hitrate\_time\_aggregator | Monitor aggregator for Redis hitrate [available values: min, max or avg] | string | `"max"` | no |
| hitrate\_timeframe | Monitor timeframe for Redis hitrate [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| keyspace\_enabled | Flag to enable Redis keyspace monitor | string | `"true"` | no |
| keyspace\_extra\_tags | Extra tags for Redis keyspace monitor | list | `[]` | no |
| keyspace\_extra\_tags | Extra tags for Redis keyspace monitor | list(string) | `[]` | no |
| keyspace\_message | Custom message for Redis keyspace monitor | string | `""` | no |
| keyspace\_threshold\_critical | Keyspace not changing (critical threshold) | string | `"0"` | no |
| keyspace\_threshold\_warning | Keyspace not changing (warning threshold) | string | `"1"` | no |
| keyspace\_time\_aggregator | Monitor aggregator for Redis keyspace [available values: min, max or avg] | string | `"min"` | no |
| keyspace\_timeframe | Monitor timeframe for Redis keyspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| latency\_enabled | Flag to enable Redis latency monitor | string | `"true"` | no |
| latency\_extra\_tags | Extra tags for Redis latency monitor | list | `[]` | no |
| latency\_extra\_tags | Extra tags for Redis latency monitor | list(string) | `[]` | no |
| latency\_message | Custom message for Redis latency monitor | string | `""` | no |
| latency\_threshold\_critical | Latency limit (critical threshold) | string | `"100"` | no |
| latency\_threshold\_warning | Latency limit (warning threshold) | string | `"50"` | no |
| latency\_time\_aggregator | Monitor aggregator for Redis latency [available values: min, max or avg] | string | `"min"` | no |
| latency\_timeframe | Monitor timeframe for Redis latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| mem\_frag\_enabled | Flag to enable Redis memory RAM fragmentation monitor | string | `"true"` | no |
| mem\_frag\_extra\_tags | Extra tags for Redis memory RAM fragmentation monitor | list | `[]` | no |
| mem\_frag\_extra\_tags | Extra tags for Redis memory RAM fragmentation monitor | list(string) | `[]` | no |
| mem\_frag\_message | Custom message for Redis memory RAM fragmentation monitor | string | `""` | no |
| mem\_frag\_threshold\_critical | Memory RAM fragmentation limit (critical threshold) | string | `"150"` | no |
| mem\_frag\_threshold\_warning | Memory RAM fragmentation limit (warning threshold) | string | `"130"` | no |
| mem\_frag\_time\_aggregator | Monitor aggregator for Redis memory RAM fragmentation [available values: min, max or avg] | string | `"min"` | no |
| mem\_frag\_timeframe | Monitor timeframe for Redis memory RAM fragmentation [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `"last_5m"` | no |
| mem\_used\_enabled | Flag to enable Redis RAM memory used monitor | string | `"true"` | no |
| mem\_used\_extra\_tags | Extra tags for Redis RAM memory used monitor | list | `[]` | no |
| mem\_used\_extra\_tags | Extra tags for Redis RAM memory used monitor | list(string) | `[]` | no |
| mem\_used\_message | Custom message for Redis RAM memory used monitor | string | `""` | no |
| mem\_used\_threshold\_critical | RAM memory used limit (critical threshold) | string | `"95"` | no |
| mem\_used\_threshold\_warning | RAM memory used limit (warning threshold) | string | `"85"` | no |
@ -95,13 +91,13 @@ Creates DataDog monitors with the following checks:
| message | Message sent when a Redis monitor is triggered | string | n/a | yes |
| new\_host\_delay | Delay in seconds before monitoring new hosts | string | `"300"` | no |
| not\_responding\_enabled | Flag to enable Redis does not respond monitor | string | `"true"` | no |
| not\_responding\_extra\_tags | Extra tags for Redis does not respond monitor | list | `[]` | no |
| not\_responding\_extra\_tags | Extra tags for Redis does not respond monitor | list(string) | `[]` | no |
| not\_responding\_message | Custom message for Redis does not respond monitor | string | `""` | no |
| not\_responding\_no\_data\_timeframe | Redis does not respond monitor no data timeframe | string | `"10"` | no |
| not\_responding\_threshold\_warning | Redis does not respond monitor (warning threshold) | string | `"3"` | no |
| prefix\_slug | Prefix string to prepend between brackets to every monitor's name | string | `""` | no |
| rejected\_con\_enabled | Flag to enable Redis rejected connections errors monitor | string | `"true"` | no |
| rejected\_con\_extra\_tags | Extra tags for Redis rejected connections errors monitor | list | `[]` | no |
| rejected\_con\_extra\_tags | Extra tags for Redis rejected connections errors monitor | list(string) | `[]` | no |
| rejected\_con\_message | Custom message for Redis rejected connections errors monitor | string | `""` | no |
| rejected\_con\_threshold\_critical | Rejected connections errors limit (critical threshold) | string | `"50"` | no |
| rejected\_con\_threshold\_warning | Rejected connections errors limit (warning threshold) | string | `"10"` | no |

View File

@ -5,7 +5,7 @@ resource "datadog_monitor" "not_responding" {
count = var.not_responding_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis does not respond"
message = coalesce(var.not_responding_message, var.message)
type = "service check"
query = <<EOQ
"redis.can_connect"${module.filter-tags.service_check}.by("redis_host","redis_port").last(6).count_by_status()
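
This service check query groups results by `(redis_host, redis_port)`, takes the last six check statuses per group, and counts them by status so the monitor can alert once enough consecutive failures accumulate. Once the interpolations resolve, the rendered query would look roughly like the following (the `env:production` scope is illustrative; the real scope comes from the `filter-tags` module):

```
"redis.can_connect".over("env:production").by("redis_host","redis_port").last(6).count_by_status()
```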
@ -63,7 +63,7 @@ resource "datadog_monitor" "expirations" {
count = var.expirations_rate_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis expired keys {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.expirations_rate_message, var.message)
type = "query alert"
query = <<EOQ
${var.expirations_rate_time_aggregator}(${var.expirations_rate_timeframe}): (
@ -124,7 +124,7 @@ resource "datadog_monitor" "keyspace_full" {
count = var.keyspace_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis keyspace seems full (no changes since ${var.keyspace_timeframe})"
message = coalesce(var.keyspace_message, var.message)
type = "query alert"
query = <<EOQ
${var.keyspace_time_aggregator}(${var.keyspace_timeframe}): (
@ -185,7 +185,7 @@ resource "datadog_monitor" "memory_frag" {
count = var.mem_frag_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis memory fragmented {{#is_alert}}{{{comparator}}} {{threshold}}% ({{value}}%){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}% ({{value}}%){{/is_warning}}"
message = coalesce(var.mem_frag_message, var.message)
type = "query alert"
query = <<EOQ
${var.mem_frag_time_aggregator}(${var.mem_frag_timeframe}):
@ -245,7 +245,7 @@ resource "datadog_monitor" "latency" {
count = var.latency_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Redis latency {{#is_alert}}{{{comparator}}} {{threshold}}ms ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}}ms ({{value}}){{/is_warning}}"
message = coalesce(var.latency_message, var.message)
type = "query alert"
query = <<EOQ
change(${var.latency_time_aggregator}(${var.latency_timeframe}),${var.latency_timeframe}): (
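
With the defaults listed in the inputs table (`latency_time_aggregator = "min"`, `latency_timeframe = "last_5m"`), the `change()` wrapper compares the current window against the same window one timeframe earlier, so the monitor fires on latency growth rather than absolute latency. A sketch of the rendered query (the metric name and filter tag are assumptions):

```
change(min(last_5m),last_5m): (
  avg:redis.info.latency_ms{env:production} by {redis_host,redis_port}
) > 100
```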

View File

@ -23,7 +23,7 @@ Creates DataDog monitors with the following checks:
| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| apache\_connect\_enabled | Flag to enable Apache status monitor | string | `"true"` | no |
| apache\_connect\_extra\_tags | Extra tags for Apache process monitor | list | `[]` | no |
| apache\_connect\_extra\_tags | Extra tags for Apache process monitor | list(string) | `[]` | no |
| apache\_connect\_message | Custom message for Apache status monitor | string | `""` | no |
| apache\_connect\_threshold\_warning | Apache status monitor (warning threshold) | string | `"3"` | no |
| environment | Architecture environment | string | n/a | yes |

View File

@ -2,7 +2,7 @@ resource "datadog_monitor" "datadog_apache_process" {
count = var.apache_connect_enabled == "true" ? 1 : 0
name = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] Apache vhost status does not respond"
message = coalesce(var.apache_connect_message, var.message)
type = "service check"
query = <<EOQ
"apache.can_connect"${module.filter-tags.service_check}.by("port","server").last(6).count_by_status()

Some files were not shown because too many files have changed in this diff.