Merged in MON-160-generalize-the-timeframe-best-pr (pull request #86)
MON-160 generalize the timeframe best practice

Approved-by: Alexandre Gaillet <alexandre.gaillet@fr.clara.net>
Approved-by: Quentin Manfroi <quentin.manfroi@yahoo.fr>
Approved-by: Laurent Piroelle <laurent.piroelle@fr.clara.net>
Approved-by: Adrien Broyere <adrien.broyere@fr.clara.net>
Approved-by: Patrick Decat <patrick.decat@fr.clara.net>
commit fee7deb0d3
@@ -32,6 +32,7 @@ Inputs
 |------|-------------|:----:|:-----:|:-----:|
 | alb_no_healthy_instances_message | Custom message for ALB no healthy instances monitor | string | `` | no |
 | alb_no_healthy_instances_silenced | Groups to mute for ALB no healthy instances monitor | map | `<map>` | no |
+| alb_no_healthy_instances_timeframe | Monitor timeframe for ALB no healthy instances [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_1m` | no |
 | artificial_requests_count | Number of false requests used to mitigate false positive in case of low trafic | string | `5` | no |
 | delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | environment | Architecture environment | string | - | yes |
@@ -41,22 +42,27 @@ Inputs
 | httpcode_elb_4xx_silenced | Groups to mute for ALB httpcode 4xx monitor | map | `<map>` | no |
 | httpcode_elb_4xx_threshold_critical | loadbalancer 4xx critical threshold in percentage | string | `80` | no |
 | httpcode_elb_4xx_threshold_warning | loadbalancer 4xx warning threshold in percentage | string | `60` | no |
+| httpcode_elb_4xx_timeframe | Monitor timeframe for ALB httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | httpcode_elb_5xx_message | Custom message for ALB httpcode 5xx monitor | string | `` | no |
 | httpcode_elb_5xx_silenced | Groups to mute for ALB httpcode 5xx monitor | map | `<map>` | no |
 | httpcode_elb_5xx_threshold_critical | loadbalancer 5xxcritical threshold in percentage | string | `80` | no |
 | httpcode_elb_5xx_threshold_warning | loadbalancer 5xx warning threshold in percentage | string | `60` | no |
+| httpcode_elb_5xx_timeframe | Monitor timeframe for ALB httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | httpcode_target_4xx_message | Custom message for ALB target httpcode 4xx monitor | string | `` | no |
 | httpcode_target_4xx_silenced | Groups to mute for ALB target httpcode 4xx monitor | map | `<map>` | no |
 | httpcode_target_4xx_threshold_critical | target 4xx critical threshold in percentage | string | `80` | no |
 | httpcode_target_4xx_threshold_warning | target 4xx warning threshold in percentage | string | `60` | no |
+| httpcode_target_4xx_timeframe | Monitor timeframe for ALB target httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | httpcode_target_5xx_message | Custom message for ALB target httpcode 5xx monitor | string | `` | no |
 | httpcode_target_5xx_silenced | Groups to mute for ALB target httpcode 5xx monitor | map | `<map>` | no |
 | httpcode_target_5xx_threshold_critical | target 5xx critical threshold in percentage | string | `80` | no |
 | httpcode_target_5xx_threshold_warning | target 5xx warning threshold in percentage | string | `60` | no |
+| httpcode_target_5xx_timeframe | Monitor timeframe for ALB target httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | latency_message | Custom message for ALB latency monitor | string | `` | no |
 | latency_silenced | Groups to mute for ALB latency monitor | map | `<map>` | no |
 | latency_threshold_critical | latency critical threshold in milliseconds | string | `1000` | no |
 | latency_threshold_warning | latency warning threshold in milliseconds | string | `500` | no |
+| latency_timeframe | Monitor timeframe for ALB latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | message | Message sent when a monitor is triggered | string | - | yes |
 
 Related documentation
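As an illustration of the inputs added above, a calling configuration could override individual evaluation windows roughly as follows. This is a minimal sketch: the module name, the `source` path, and the chosen values are assumptions for the example, and the module may require other inputs that are not visible in this hunk. Only the input names and the allowed timeframe values come from the table.

```hcl
# Hypothetical module call; name, source path, and values are illustrative only.
module "alb_monitors" {
  source = "./cloud/aws/alb" # assumed path, not shown in this diff

  # Inputs marked as required in the README table.
  environment = "production"
  message     = "ALB monitor triggered" # placeholder notification text

  # Inputs introduced by this change; omitted ones keep their defaults
  # (`last_1m` for no healthy instances, `last_5m` for the others).
  alb_no_healthy_instances_timeframe = "last_5m"
  latency_timeframe                  = "last_15m"
  httpcode_elb_5xx_timeframe         = "last_10m"
}
```

Each value has to be one of the documented forms: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`.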
@@ -38,6 +38,12 @@ variable "alb_no_healthy_instances_message" {
 default = ""
 }
 
+variable "alb_no_healthy_instances_timeframe" {
+description = "Monitor timeframe for ALB no healthy instances [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_1m"
+}
+
 variable "latency_silenced" {
 description = "Groups to mute for ALB latency monitor"
 type = "map"
@@ -50,6 +56,12 @@ variable "latency_message" {
 default = ""
 }
 
+variable "latency_timeframe" {
+description = "Monitor timeframe for ALB latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "latency_threshold_critical" {
 default = 1000
 description = "latency critical threshold in milliseconds"
@@ -72,6 +84,12 @@ variable "httpcode_elb_4xx_message" {
 default = ""
 }
 
+variable "httpcode_elb_4xx_timeframe" {
+description = "Monitor timeframe for ALB httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "httpcode_elb_4xx_threshold_critical" {
 default = 80
 description = "loadbalancer 4xx critical threshold in percentage"
@@ -94,6 +112,12 @@ variable "httpcode_target_4xx_message" {
 default = ""
 }
 
+variable "httpcode_target_4xx_timeframe" {
+description = "Monitor timeframe for ALB target httpcode 4xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "httpcode_target_4xx_threshold_critical" {
 default = 80
 description = "target 4xx critical threshold in percentage"
@@ -116,6 +140,12 @@ variable "httpcode_elb_5xx_message" {
 default = ""
 }
 
+variable "httpcode_elb_5xx_timeframe" {
+description = "Monitor timeframe for ALB httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "httpcode_elb_5xx_threshold_critical" {
 default = 80
 description = "loadbalancer 5xxcritical threshold in percentage"
@@ -138,6 +168,12 @@ variable "httpcode_target_5xx_message" {
 default = ""
 }
 
+variable "httpcode_target_5xx_timeframe" {
+description = "Monitor timeframe for ALB target httpcode 5xx [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "httpcode_target_5xx_threshold_critical" {
 default = 80
 description = "target 5xx critical threshold in percentage"
@@ -14,7 +14,7 @@ resource "datadog_monitor" "ALB_no_healthy_instances" {
 message = "${coalesce(var.alb_no_healthy_instances_message, var.message)}"
 
 query = <<EOF
-min(last_1m): (
+min(${var.alb_no_healthy_instances_timeframe}): (
 min:aws.applicationelb.healthy_host_count{${data.template_file.filter.rendered}} by {region,loadbalancer}
 ) <= 0
 EOF
@@ -43,7 +43,7 @@ resource "datadog_monitor" "ALB_latency" {
 message = "${coalesce(var.latency_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.latency_timeframe}): (
 min:aws.applicationelb.target_response_time.average{${data.template_file.filter.rendered}} by {region,loadbalancer}
 ) > ${var.latency_threshold_critical}
 EOF
@@ -73,7 +73,7 @@ resource "datadog_monitor" "ALB_httpcode_elb_5xx" {
 message = "${coalesce(var.httpcode_elb_5xx_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.httpcode_elb_5xx_timeframe}): (
 default(
 min:aws.applicationelb.httpcode_elb_5xx{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() /
 (min:aws.applicationelb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() + ${var.artificial_requests_count}),
@@ -106,7 +106,7 @@ resource "datadog_monitor" "ALB_httpcode_elb_4xx" {
 message = "${coalesce(var.httpcode_elb_4xx_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.httpcode_elb_4xx_timeframe}): (
 default(
 min:aws.applicationelb.httpcode_elb_4xx{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() /
 (min:aws.applicationelb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() + ${var.artificial_requests_count}),
@@ -139,7 +139,7 @@ resource "datadog_monitor" "ALB_httpcode_target_5xx" {
 message = "${coalesce(var.httpcode_target_5xx_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.httpcode_target_5xx_timeframe}): (
 default(
 min:aws.applicationelb.httpcode_target_5xx{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() /
 (min:aws.applicationelb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() + ${var.artificial_requests_count}),
@@ -172,7 +172,7 @@ resource "datadog_monitor" "ALB_httpcode_target_4xx" {
 message = "${coalesce(var.httpcode_target_4xx_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.httpcode_target_4xx_timeframe}): (
 default(
 min:aws.applicationelb.httpcode_target_4xx{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() /
 (min:aws.applicationelb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancer}.as_count() + ${var.artificial_requests_count}),
@@ -35,14 +35,17 @@ Inputs
 | http_4xx_requests_silenced | Groups to mute for API Gateway HTTP 4xx requests monitor | map | `<map>` | no |
 | http_4xx_requests_threshold_critical | Maximum critical acceptable percent of 4xx errors | string | `30` | no |
 | http_4xx_requests_threshold_warning | Maximum warning acceptable percent of 4xx errors | string | `15` | no |
+| http_4xx_requests_timeframe | Monitor timeframe for API HTTP 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | http_5xx_requests_message | Custom message for API Gateway HTTP 5xx requests monitor | string | `` | no |
 | http_5xx_requests_silenced | Groups to mute for API Gateway HTTP 5xx requests monitor | map | `<map>` | no |
 | http_5xx_requests_threshold_critical | Maximum critical acceptable percent of 5xx errors | string | `20` | no |
 | http_5xx_requests_threshold_warning | Maximum warning acceptable percent of 5xx errors | string | `10` | no |
+| http_5xx_requests_timeframe | Monitor timeframe for API HTTP 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | latency_message | Custom message for API Gateway latency monitor | string | `` | no |
 | latency_silenced | Groups to mute for API Gateway latency monitor | map | `<map>` | no |
 | latency_threshold_critical | Alerting threshold in milliseconds | string | `800` | no |
 | latency_threshold_warning | Warning threshold in milliseconds | string | `400` | no |
+| latency_timeframe | Monitor timeframe for API latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | message | Message sent when a monitor is triggered | string | - | yes |
 
 Related documentation
@@ -33,6 +33,12 @@ variable "latency_message" {
 default = ""
 }
 
+variable "latency_timeframe" {
+description = "Monitor timeframe for API latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "latency_threshold_critical" {
 default = 800
 description = "Alerting threshold in milliseconds"
@@ -59,6 +65,12 @@ variable "http_5xx_requests_message" {
 default = ""
 }
 
+variable "http_5xx_requests_timeframe" {
+description = "Monitor timeframe for API HTTP 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "http_5xx_requests_threshold_critical" {
 default = 20
 description = "Maximum critical acceptable percent of 5xx errors"
@@ -85,6 +97,12 @@ variable "http_4xx_requests_message" {
 default = ""
 }
 
+variable "http_4xx_requests_timeframe" {
+description = "Monitor timeframe for API HTTP 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "http_4xx_requests_threshold_critical" {
 default = 30
 description = "Maximum critical acceptable percent of 4xx errors"
@@ -5,7 +5,7 @@ resource "datadog_monitor" "API_Gateway_latency" {
 message = "${coalesce(var.latency_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.latency_timeframe}): (
 min:aws.apigateway.latency{${var.filter_tags}} by {region,apiname}
 ) > ${var.latency_threshold_critical}
 EOF
@@ -36,7 +36,7 @@ resource "datadog_monitor" "API_http_5xx_errors_count" {
 message = "${coalesce(var.http_5xx_requests_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.http_5xx_requests_timeframe}): (
 default(
 min:aws.apigateway.5xxerror{${var.filter_tags}} by {region,apiname}.as_count() /
 (min:aws.apigateway.count{${var.filter_tags}} by {region,apiname}.as_count() + ${var.artificial_requests_count}),
@@ -70,7 +70,7 @@ resource "datadog_monitor" "API_http_4xx_errors_count" {
 message = "${coalesce(var.http_4xx_requests_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (
+min(${var.http_4xx_requests_timeframe}): (
 default(
 min:aws.apigateway.4xxerror{${var.filter_tags}} by {region,apiname}.as_count() /
 (min:aws.apigateway.count{${var.filter_tags}} by {region,apiname}.as_count() + ${var.artificial_requests_count}),
@@ -33,14 +33,17 @@ Inputs
 | cpu_silenced | Groups to mute for ES cluster cpu monitor | map | `<map>` | no |
 | cpu_threshold_critical | CPU usage in percent (critical threshold) | string | `90` | no |
 | cpu_threshold_warning | CPU usage in percent (warning threshold) | string | `80` | no |
+| cpu_timeframe | Monitor timeframe for ES cluster cpu [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
 | delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | diskspace_message | Custom message for ES cluster diskspace monitor | string | `` | no |
 | diskspace_silenced | Groups to mute for ES cluster diskspace monitor | map | `<map>` | no |
 | diskspace_threshold_critical | Disk free space in percent (critical threshold) | string | `10` | no |
 | diskspace_threshold_warning | Disk free space in percent (warning threshold) | string | `20` | no |
+| diskspace_timeframe | Monitor timeframe for ES cluster diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
 | environment | Architecture Environment | string | - | yes |
 | es_cluster_status_message | Custom message for ES cluster status monitor | string | `` | no |
 | es_cluster_status_silenced | Groups to mute for ES cluster status monitor | map | `<map>` | no |
+| es_cluster_status_timeframe | Monitor timeframe for ES cluster status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_30m` | no |
 | es_cluster_volume_size | ElasticSearch Domain volume size (in GB) | string | - | yes |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
@@ -38,6 +38,12 @@ variable "es_cluster_status_message" {
 default = ""
 }
 
+variable "es_cluster_status_timeframe" {
+description = "Monitor timeframe for ES cluster status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_30m"
+}
+
 variable "es_cluster_volume_size" {
 description = "ElasticSearch Domain volume size (in GB)"
 }
@@ -54,6 +60,12 @@ variable "diskspace_message" {
 default = ""
 }
 
+variable "diskspace_timeframe" {
+description = "Monitor timeframe for ES cluster diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_15m"
+}
+
 variable "diskspace_threshold_warning" {
 description = "Disk free space in percent (warning threshold)"
 default = "20"
@@ -76,6 +88,12 @@ variable "cpu_message" {
 default = ""
 }
 
+variable "cpu_timeframe" {
+description = "Monitor timeframe for ES cluster cpu [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_15m"
+}
+
 variable "cpu_threshold_warning" {
 description = "CPU usage in percent (warning threshold)"
 default = "80"
@@ -18,7 +18,7 @@ resource "datadog_monitor" "es_cluster_status" {
 type = "metric alert"
 
 query = <<EOF
-max(last_30m): (
+max(${var.es_cluster_status_timeframe}): (
 avg:aws.es.cluster_statusred{${data.template_file.filter.rendered}} by {region,name} * 2 +
 (avg:aws.es.cluster_statusyellow{${data.template_file.filter.rendered}} by {region,name} + 0.1)
 ) >= 2
@@ -52,7 +52,7 @@ resource "datadog_monitor" "es_free_space_low" {
 type = "metric alert"
 
 query = <<EOF
-avg(last_15m): (
+avg(${var.diskspace_timeframe}): (
 avg:aws.es.free_storage_space{${data.template_file.filter.rendered}} by {region,name} /
 (${var.es_cluster_volume_size}*1000) * 100
 ) < ${var.diskspace_threshold_critical}
@@ -86,7 +86,7 @@ resource "datadog_monitor" "es_cpu_90_15min" {
 type = "metric alert"
 
 query = <<EOF
-avg(last_15m): (
+avg(${var.cpu_timeframe}): (
 avg:aws.es.cpuutilization{${data.template_file.filter.rendered}} by {region,name}
 ) > ${var.cpu_threshold_critical}
 EOF
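The hunks above all follow the same two-step pattern: declare a `<monitor>_timeframe` string variable whose default is the previously hard-coded window, then interpolate it into the query in place of the literal. A minimal sketch of that pattern for a new monitor might look like the following, written in the same Terraform 0.11-style syntax as this diff. The `example_*` names, the monitor name, and the `system.cpu.user` query are hypothetical and are not part of this change.

```hcl
# Hypothetical example of the timeframe pattern; none of these names exist in the repository.
variable "message" {
  description = "Message sent when a monitor is triggered"
}

variable "example_timeframe" {
  description = "Monitor timeframe for the example monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
  type        = "string"
  default     = "last_5m" # keeps the previously hard-coded window as the default
}

variable "example_threshold_critical" {
  description = "Example critical threshold in percent"
  default     = 90
}

resource "datadog_monitor" "example" {
  name    = "Example CPU monitor"
  type    = "metric alert"
  message = "${var.message}"

  # The evaluation window is interpolated instead of being written as a literal.
  query = <<EOF
avg(${var.example_timeframe}): (
avg:system.cpu.user{*} by {host}
) > ${var.example_threshold_critical}
EOF
}
```

Keeping the old literal as the variable default means existing callers see no behavior change unless they set the new input.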
@@ -30,32 +30,37 @@ Inputs
 
 | Name | Description | Type | Default | Required |
 |------|-------------|:----:|:-----:|:-----:|
-| dd_aws_elb | # ELB | string | `disable` | no |
 | artificial_requests_count | Number of false requests used to mitigate false positive in case of low trafic | string | `5` | no |
+| delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | elb_4xx_message | Custom message for ELB 4xx errors monitor | string | `` | no |
 | elb_4xx_silenced | Groups to mute for ELB 4xx errors monitor | map | `<map>` | no |
 | elb_4xx_threshold_critical | loadbalancer 4xx critical threshold in percentage | string | `10` | no |
 | elb_4xx_threshold_warning | loadbalancer 4xx warning threshold in percentage | string | `5` | no |
+| elb_4xx_timeframe | Monitor timeframe for ELB 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | elb_5xx_message | Custom message for ELB 5xx errors monitor | string | `` | no |
 | elb_5xx_silenced | Groups to mute for ELB 5xx errors monitor | map | `<map>` | no |
 | elb_5xx_threshold_critical | loadbalancer 5xx critical threshold in percentage | string | `10` | no |
 | elb_5xx_threshold_warning | loadbalancer 5xx warning threshold in percentage | string | `5` | no |
+| elb_5xx_timeframe | Monitor timeframe for ELB 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | elb_backend_4xx_message | Custom message for ELB backend 4xx errors monitor | string | `` | no |
| elb_backend_4xx_silenced | Groups to mute for ELB backend 4xx errors monitor | map | `<map>` | no |
|
| elb_backend_4xx_silenced | Groups to mute for ELB backend 4xx errors monitor | map | `<map>` | no |
|
||||||
| elb_backend_4xx_threshold_critical | loadbalancer backend 4xx critical threshold in percentage | string | `10` | no |
|
| elb_backend_4xx_threshold_critical | loadbalancer backend 4xx critical threshold in percentage | string | `10` | no |
|
||||||
| elb_backend_4xx_threshold_warning | loadbalancer backend 4xx warning threshold in percentage | string | `5` | no |
|
| elb_backend_4xx_threshold_warning | loadbalancer backend 4xx warning threshold in percentage | string | `5` | no |
|
||||||
|
| elb_backend_4xx_timeframe | Monitor timeframe for ELB backend 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| elb_backend_5xx_message | Custom message for ELB backend 5xx errors monitor | string | `` | no |
|
| elb_backend_5xx_message | Custom message for ELB backend 5xx errors monitor | string | `` | no |
|
||||||
| elb_backend_5xx_silenced | Groups to mute for ELB backend 5xx errors monitor | map | `<map>` | no |
|
| elb_backend_5xx_silenced | Groups to mute for ELB backend 5xx errors monitor | map | `<map>` | no |
|
||||||
| elb_backend_5xx_threshold_critical | loadbalancer backend 5xx critical threshold in percentage | string | `10` | no |
|
| elb_backend_5xx_threshold_critical | loadbalancer backend 5xx critical threshold in percentage | string | `10` | no |
|
||||||
| elb_backend_5xx_threshold_warning | loadbalancer backend 5xx warning threshold in percentage | string | `5` | no |
|
| elb_backend_5xx_threshold_warning | loadbalancer backend 5xx warning threshold in percentage | string | `5` | no |
|
||||||
|
| elb_backend_5xx_timeframe | Monitor timeframe for ELB backend 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| elb_backend_latency_critical | latency critical threshold in seconds | string | `5` | no |
|
| elb_backend_latency_critical | latency critical threshold in seconds | string | `5` | no |
|
||||||
| elb_backend_latency_message | Custom message for ELB backend latency monitor | string | `` | no |
|
| elb_backend_latency_message | Custom message for ELB backend latency monitor | string | `` | no |
|
||||||
| elb_backend_latency_silenced | Groups to mute for ELB backend latency monitor | map | `<map>` | no |
|
| elb_backend_latency_silenced | Groups to mute for ELB backend latency monitor | map | `<map>` | no |
|
||||||
|
| elb_backend_latency_timeframe | Monitor timeframe for ELB backend latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| elb_backend_latency_warning | latency warning threshold in seconds | string | `1` | no |
|
| elb_backend_latency_warning | latency warning threshold in seconds | string | `1` | no |
|
||||||
| elb_no_healthy_instance_message | Custom message for ELB no healthy instance monitor | string | `` | no |
|
| elb_no_healthy_instance_message | Custom message for ELB no healthy instance monitor | string | `` | no |
|
||||||
| elb_no_healthy_instance_silenced | Groups to mute for ELB no healthy instance monitor | map | `<map>` | no |
|
| elb_no_healthy_instance_silenced | Groups to mute for ELB no healthy instance monitor | map | `<map>` | no |
|
||||||
|
| elb_no_healthy_instance_timeframe | Monitor timeframe for ELB no healthy instance [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| environment | Architecture Environment | string | - | yes |
|
| environment | Architecture Environment | string | - | yes |
|
||||||
| delay | Delay in seconds for the metric evaluation | string | `900` | no |
|
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
|
|||||||
@ -25,10 +25,6 @@ variable "filter_tags_custom" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
## ELB
|
## ELB
|
||||||
variable "dd_aws_elb" {
|
|
||||||
default = "disable"
|
|
||||||
}
|
|
||||||
|
|
||||||
variable "elb_no_healthy_instance_silenced" {
|
variable "elb_no_healthy_instance_silenced" {
|
||||||
description = "Groups to mute for ELB no healthy instance monitor"
|
description = "Groups to mute for ELB no healthy instance monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -41,6 +37,12 @@ variable "elb_no_healthy_instance_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_no_healthy_instance_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB no healthy instance [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_4xx_silenced" {
|
variable "elb_4xx_silenced" {
|
||||||
description = "Groups to mute for ELB 4xx errors monitor"
|
description = "Groups to mute for ELB 4xx errors monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -53,6 +55,12 @@ variable "elb_4xx_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_4xx_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_4xx_threshold_warning" {
|
variable "elb_4xx_threshold_warning" {
|
||||||
description = "loadbalancer 4xx warning threshold in percentage"
|
description = "loadbalancer 4xx warning threshold in percentage"
|
||||||
default = 5
|
default = 5
|
||||||
@ -75,6 +83,12 @@ variable "elb_5xx_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_5xx_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_5xx_threshold_warning" {
|
variable "elb_5xx_threshold_warning" {
|
||||||
description = "loadbalancer 5xx warning threshold in percentage"
|
description = "loadbalancer 5xx warning threshold in percentage"
|
||||||
default = 5
|
default = 5
|
||||||
@ -97,6 +111,12 @@ variable "elb_backend_4xx_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_backend_4xx_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB backend 4xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_backend_4xx_threshold_warning" {
|
variable "elb_backend_4xx_threshold_warning" {
|
||||||
description = "loadbalancer backend 4xx warning threshold in percentage"
|
description = "loadbalancer backend 4xx warning threshold in percentage"
|
||||||
default = 5
|
default = 5
|
||||||
@ -119,6 +139,12 @@ variable "elb_backend_5xx_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_backend_5xx_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB backend 5xx errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_backend_5xx_threshold_warning" {
|
variable "elb_backend_5xx_threshold_warning" {
|
||||||
description = "loadbalancer backend 5xx warning threshold in percentage"
|
description = "loadbalancer backend 5xx warning threshold in percentage"
|
||||||
default = 5
|
default = 5
|
||||||
@ -141,6 +167,12 @@ variable "elb_backend_latency_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "elb_backend_latency_timeframe" {
|
||||||
|
description = "Monitor timeframe for ELB backend latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "elb_backend_latency_warning" {
|
variable "elb_backend_latency_warning" {
|
||||||
description = "latency warning threshold in seconds"
|
description = "latency warning threshold in seconds"
|
||||||
default = 1
|
default = 1
|
||||||
|
|||||||
@ -11,7 +11,7 @@ resource "datadog_monitor" "ELB_no_healthy_instances" {
|
|||||||
message = "${coalesce(var.elb_no_healthy_instance_message, var.message)}"
|
message = "${coalesce(var.elb_no_healthy_instance_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_no_healthy_instance_timeframe}): (
|
||||||
min:aws.elb.healthy_host_count{${data.template_file.filter.rendered}} by {region,loadbalancername}
|
min:aws.elb.healthy_host_count{${data.template_file.filter.rendered}} by {region,loadbalancername}
|
||||||
) < 1
|
) < 1
|
||||||
EOF
|
EOF
|
||||||
@ -38,7 +38,7 @@ resource "datadog_monitor" "ELB_too_much_4xx" {
|
|||||||
message = "${coalesce(var.elb_4xx_message, var.message)}"
|
message = "${coalesce(var.elb_4xx_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_4xx_timeframe}): (
|
||||||
default(
|
default(
|
||||||
min:aws.elb.httpcode_elb_4xx{${data.template_file.filter.rendered}} by {region,loadbalancername}.as_count() /
|
min:aws.elb.httpcode_elb_4xx{${data.template_file.filter.rendered}} by {region,loadbalancername}.as_count() /
|
||||||
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername}.as_count() + ${var.artificial_requests_count}),
|
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername}.as_count() + ${var.artificial_requests_count}),
|
||||||
@ -73,7 +73,7 @@ resource "datadog_monitor" "ELB_too_much_5xx" {
|
|||||||
message = "${coalesce(var.elb_5xx_message, var.message)}"
|
message = "${coalesce(var.elb_5xx_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_5xx_timeframe}): (
|
||||||
default(
|
default(
|
||||||
min:aws.elb.httpcode_elb_5xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
min:aws.elb.httpcode_elb_5xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
||||||
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
||||||
@ -108,7 +108,7 @@ resource "datadog_monitor" "ELB_too_much_4xx_backend" {
|
|||||||
message = "${coalesce(var.elb_backend_4xx_message, var.message)}"
|
message = "${coalesce(var.elb_backend_4xx_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_backend_4xx_timeframe}): (
|
||||||
default(
|
default(
|
||||||
min:aws.elb.httpcode_backend_4xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
min:aws.elb.httpcode_backend_4xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
||||||
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
||||||
@ -143,7 +143,7 @@ resource "datadog_monitor" "ELB_too_much_5xx_backend" {
|
|||||||
message = "${coalesce(var.elb_backend_5xx_message, var.message)}"
|
message = "${coalesce(var.elb_backend_5xx_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_backend_5xx_timeframe}): (
|
||||||
default(
|
default(
|
||||||
min:aws.elb.httpcode_backend_5xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
min:aws.elb.httpcode_backend_5xx{${data.template_file.filter.rendered}} by {region,loadbalancername} /
|
||||||
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
(min:aws.elb.request_count{${data.template_file.filter.rendered}} by {region,loadbalancername} + ${var.artificial_requests_count}),
|
||||||
@ -178,7 +178,7 @@ resource "datadog_monitor" "ELB_backend_latency" {
|
|||||||
message = "${coalesce(var.elb_backend_latency_message, var.message)}"
|
message = "${coalesce(var.elb_backend_latency_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.elb_backend_latency_timeframe}): (
|
||||||
min:aws.elb.latency{${data.template_file.filter.rendered}} by {region,loadbalancername}
|
min:aws.elb.latency{${data.template_file.filter.rendered}} by {region,loadbalancername}
|
||||||
) > ${var.elb_backend_latency_critical}
|
) > ${var.elb_backend_latency_critical}
|
||||||
EOF
|
EOF
|
||||||
|
|||||||
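For illustration, a calling configuration could override the new ELB timeframe inputs roughly as sketched below. The module name and the `source` path are hypothetical placeholders, and the `environment` and `message` values are made up; the variable names, the two required inputs, and the allowed timeframe values are taken from the Inputs table in this diff. Terraform 0.11-style syntax is assumed, to match the surrounding code.

```hcl
# Hypothetical caller: the module name and source path are placeholders,
# not paths taken from this repository.
module "datadog_monitors_aws_elb" {
  source = "./path/to/elb/module"

  # Required inputs according to the Inputs table.
  environment = "production"
  message     = "ELB monitor triggered"

  # Optional: widen the evaluation window from the default last_5m.
  elb_4xx_timeframe             = "last_15m"
  elb_5xx_timeframe             = "last_15m"
  elb_backend_latency_timeframe = "last_10m"
}
```

Any timeframe input left unset keeps its declared default of `last_5m`.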
@ -29,12 +29,14 @@ Inputs
|
|||||||
| cpu_silenced | Groups to mute for RDS CPU usage monitor | map | `<map>` | no |
|
| cpu_silenced | Groups to mute for RDS CPU usage monitor | map | `<map>` | no |
|
||||||
| cpu_threshold_critical | CPU usage in percent (critical threshold) | string | `90` | no |
|
| cpu_threshold_critical | CPU usage in percent (critical threshold) | string | `90` | no |
|
||||||
| cpu_threshold_warning | CPU usage in percent (warning threshold) | string | `80` | no |
|
| cpu_threshold_warning | CPU usage in percent (warning threshold) | string | `80` | no |
|
||||||
|
| cpu_timeframe | Monitor timeframe for RDS CPU usage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
|
||||||
|
| delay | Delay in seconds for the metric evaluation | string | `900` | no |
|
||||||
| diskspace_message | Custom message for RDS free diskspace monitor | string | `` | no |
|
| diskspace_message | Custom message for RDS free diskspace monitor | string | `` | no |
|
||||||
| diskspace_silenced | Groups to mute for RDS free diskspace monitor | map | `<map>` | no |
|
| diskspace_silenced | Groups to mute for RDS free diskspace monitor | map | `<map>` | no |
|
||||||
| diskspace_threshold_critical | Disk free space in percent (critical threshold) | string | `10` | no |
|
| diskspace_threshold_critical | Disk free space in percent (critical threshold) | string | `10` | no |
|
||||||
| diskspace_threshold_warning | Disk free space in percent (warning threshold) | string | `20` | no |
|
| diskspace_threshold_warning | Disk free space in percent (warning threshold) | string | `20` | no |
|
||||||
|
| diskspace_timeframe | Monitor timeframe for RDS free diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
|
||||||
| environment | Architecture Environment | string | - | yes |
|
| environment | Architecture Environment | string | - | yes |
|
||||||
| delay | Delay in seconds for the metric evaluation | string | `900` | no |
|
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
|
|||||||
@ -38,6 +38,12 @@ variable "cpu_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "cpu_timeframe" {
|
||||||
|
description = "Monitor timeframe for RDS CPU usage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_15m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "cpu_threshold_warning" {
|
variable "cpu_threshold_warning" {
|
||||||
description = "CPU usage in percent (warning threshold)"
|
description = "CPU usage in percent (warning threshold)"
|
||||||
default = "80"
|
default = "80"
|
||||||
@ -60,6 +66,12 @@ variable "diskspace_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "diskspace_timeframe" {
|
||||||
|
description = "Monitor timeframe for RDS free diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_15m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "diskspace_threshold_warning" {
|
variable "diskspace_threshold_warning" {
|
||||||
description = "Disk free space in percent (warning threshold)"
|
description = "Disk free space in percent (warning threshold)"
|
||||||
default = "20"
|
default = "20"
|
||||||
|
|||||||
@ -14,7 +14,7 @@ resource "datadog_monitor" "rds_cpu_90_15min" {
|
|||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_15m): (
|
avg(${var.cpu_timeframe}): (
|
||||||
avg:aws.rds.cpuutilization{${data.template_file.filter.rendered}} by {region,name}
|
avg:aws.rds.cpuutilization{${data.template_file.filter.rendered}} by {region,name}
|
||||||
) > ${var.cpu_threshold_critical}
|
) > ${var.cpu_threshold_critical}
|
||||||
EOF
|
EOF
|
||||||
@ -46,7 +46,7 @@ resource "datadog_monitor" "rds_free_space_low" {
|
|||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_15m): (
|
avg(${var.diskspace_timeframe}): (
|
||||||
avg:aws.rds.free_storage_space{${data.template_file.filter.rendered}} by {region,name} /
|
avg:aws.rds.free_storage_space{${data.template_file.filter.rendered}} by {region,name} /
|
||||||
avg:aws.rds.total_storage_space{${data.template_file.filter.rendered}} by {region,name} * 100
|
avg:aws.rds.total_storage_space{${data.template_file.filter.rendered}} by {region,name} * 100
|
||||||
) < ${var.diskspace_threshold_critical}
|
) < ${var.diskspace_threshold_critical}
|
||||||
|
|||||||
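As a sketch of how the interpolation plays out, assume the RDS module is applied directly and its variables are set through a tfvars-style fragment like the one below (the chosen values are illustrative only). The CPU query would then start with `avg(last_30m): (` and the free diskspace query with `avg(last_1h): (`, instead of the previously hard-coded `avg(last_15m): (`.

```hcl
# Illustrative values only; both must be one of the documented timeframes
# (last_1m, last_5m, last_10m, last_15m, last_30m, last_1h, last_2h, last_4h, last_1d).
cpu_timeframe       = "last_30m"
diskspace_timeframe = "last_1h"
```

Leaving both unset preserves the earlier behavior, since each variable defaults to `last_15m`.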
@ -24,10 +24,11 @@ Inputs
|
|||||||
|
|
||||||
| Name | Description | Type | Default | Required |
|
| Name | Description | Type | Default | Required |
|
||||||
|------|-------------|:----:|:-----:|:-----:|
|
|------|-------------|:----:|:-----:|:-----:|
|
||||||
| environment | Architecture Environment | string | - | yes |
|
|
||||||
| delay | Delay in seconds for the metric evaluation | string | `900` | no |
|
| delay | Delay in seconds for the metric evaluation | string | `900` | no |
|
||||||
|
| environment | Architecture Environment | string | - | yes |
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
| vpn_status_message | Custom message for VPN status monitor | string | `` | no |
|
| vpn_status_message | Custom message for VPN status monitor | string | `` | no |
|
||||||
| vpn_status_silenced | Groups to mute for VPN status monitor | map | `<map>` | no |
|
| vpn_status_silenced | Groups to mute for VPN status monitor | map | `<map>` | no |
|
||||||
|
| vpn_status_timeframe | Monitor timeframe for VPN status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
|
|||||||
@ -35,3 +35,9 @@ variable "vpn_status_message" {
|
|||||||
type = "string"
|
type = "string"
|
||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "vpn_status_timeframe" {
|
||||||
|
description = "Monitor timeframe for VPN status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|||||||
@ -11,7 +11,7 @@ resource "datadog_monitor" "VPN_status" {
|
|||||||
message = "${coalesce(var.vpn_status_message, var.message)}"
|
message = "${coalesce(var.vpn_status_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_5m): (
|
avg(${var.vpn_status_timeframe}): (
|
||||||
avg:aws.vpn.tunnel_state{${data.template_file.filter.rendered}} by {region,name}
|
avg:aws.vpn.tunnel_state{${data.template_file.filter.rendered}} by {region,name}
|
||||||
) < 1
|
) < 1
|
||||||
EOF
|
EOF
|
||||||
|
|||||||
@ -33,6 +33,7 @@ Inputs
|
|||||||
| failed_requests_silenced | Groups to mute for API Management failed requests monitor | map | `<map>` | no |
|
| failed_requests_silenced | Groups to mute for API Management failed requests monitor | map | `<map>` | no |
|
||||||
| failed_requests_threshold_critical | Maximum acceptable percent of failed requests | string | `90` | no |
|
| failed_requests_threshold_critical | Maximum acceptable percent of failed requests | string | `90` | no |
|
||||||
| failed_requests_threshold_warning | Warning regarding acceptable percent of failed requests | string | `50` | no |
|
| failed_requests_threshold_warning | Warning regarding acceptable percent of failed requests | string | `50` | no |
|
||||||
|
| failed_requests_timeframe | Monitor timeframe for API Management failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| message | Message sent when an API Management monitor is triggered | string | - | yes |
|
| message | Message sent when an API Management monitor is triggered | string | - | yes |
|
||||||
@ -40,16 +41,20 @@ Inputs
|
|||||||
| other_requests_silenced | Groups to mute for API Management other requests monitor | map | `<map>` | no |
|
| other_requests_silenced | Groups to mute for API Management other requests monitor | map | `<map>` | no |
|
||||||
| other_requests_threshold_critical | Maximum acceptable percent of other requests | string | `90` | no |
|
| other_requests_threshold_critical | Maximum acceptable percent of other requests | string | `90` | no |
|
||||||
| other_requests_threshold_warning | Warning regarding acceptable percent of other requests | string | `50` | no |
|
| other_requests_threshold_warning | Warning regarding acceptable percent of other requests | string | `50` | no |
|
||||||
|
| other_requests_timeframe | Monitor timeframe for API Management other requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| status_message | Custom message for API Management status monitor | string | `` | no |
|
| status_message | Custom message for API Management status monitor | string | `` | no |
|
||||||
| status_silenced | Groups to mute for API Management status monitor | map | `<map>` | no |
|
| status_silenced | Groups to mute for API Management status monitor | map | `<map>` | no |
|
||||||
|
| status_timeframe | Monitor timeframe for API Management status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| successful_requests_message | Custom message for API Management successful requests monitor | string | `` | no |
|
| successful_requests_message | Custom message for API Management successful requests monitor | string | `` | no |
|
||||||
| successful_requests_silenced | Groups to mute for API Management successful requests monitor | map | `<map>` | no |
|
| successful_requests_silenced | Groups to mute for API Management successful requests monitor | map | `<map>` | no |
|
||||||
| successful_requests_threshold_critical | Minimum acceptable percent of successful requests | string | `10` | no |
|
| successful_requests_threshold_critical | Minimum acceptable percent of successful requests | string | `10` | no |
|
||||||
| successful_requests_threshold_warning | Warning regarding acceptable percent of successful requests | string | `30` | no |
|
| successful_requests_threshold_warning | Warning regarding acceptable percent of successful requests | string | `30` | no |
|
||||||
|
| successful_requests_timeframe | Monitor timeframe for API Management successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| unauthorized_requests_message | Custom message for API Management unauthorized requests monitor | string | `` | no |
|
| unauthorized_requests_message | Custom message for API Management unauthorized requests monitor | string | `` | no |
|
||||||
| unauthorized_requests_silenced | Groups to mute for API Management unauthorized requests monitor | map | `<map>` | no |
|
| unauthorized_requests_silenced | Groups to mute for API Management unauthorized requests monitor | map | `<map>` | no |
|
||||||
| unauthorized_requests_threshold_critical | Maximum acceptable percent of unauthorized requests | string | `90` | no |
|
| unauthorized_requests_threshold_critical | Maximum acceptable percent of unauthorized requests | string | `90` | no |
|
||||||
| unauthorized_requests_threshold_warning | Warning regarding acceptable percent of unauthorized requests | string | `50` | no |
|
| unauthorized_requests_threshold_warning | Warning regarding acceptable percent of unauthorized requests | string | `50` | no |
|
||||||
|
| unauthorized_requests_timeframe | Monitor timeframe for API Management unauthorized requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
|
|
||||||
Related documentation
|
Related documentation
|
||||||
---------------------
|
---------------------
|
||||||
|
|||||||
@ -37,6 +37,12 @@ variable "status_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "status_timeframe" {
|
||||||
|
description = "Monitor timeframe for API Management status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_requests_silenced" {
|
variable "failed_requests_silenced" {
|
||||||
description = "Groups to mute for API Management failed requests monitor"
|
description = "Groups to mute for API Management failed requests monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -49,6 +55,12 @@ variable "failed_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for API Management failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_requests_threshold_critical" {
|
variable "failed_requests_threshold_critical" {
|
||||||
description = "Maximum acceptable percent of failed requests"
|
description = "Maximum acceptable percent of failed requests"
|
||||||
default = 90
|
default = 90
|
||||||
@ -71,6 +83,12 @@ variable "other_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "other_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for API Management other requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "other_requests_threshold_critical" {
|
variable "other_requests_threshold_critical" {
|
||||||
description = "Maximum acceptable percent of other requests"
|
description = "Maximum acceptable percent of other requests"
|
||||||
default = 90
|
default = 90
|
||||||
@ -93,6 +111,12 @@ variable "unauthorized_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "unauthorized_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for API Management unauthorized requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "unauthorized_requests_threshold_critical" {
|
variable "unauthorized_requests_threshold_critical" {
|
||||||
description = "Maximum acceptable percent of unauthorized requests"
|
description = "Maximum acceptable percent of unauthorized requests"
|
||||||
default = 90
|
default = 90
|
||||||
@ -115,6 +139,12 @@ variable "successful_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "successful_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for API Management successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "successful_requests_threshold_critical" {
|
variable "successful_requests_threshold_critical" {
|
||||||
description = "Minimum acceptable percent of successful requests"
|
description = "Minimum acceptable percent of successful requests"
|
||||||
default = 10
|
default = 10
|
||||||
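A minimal sketch of how a caller might override the new timeframe inputs added in the variables above. The module name, the `source` path, and the `environment` and `message` values are hypothetical placeholders, not taken from this commit. Those two inputs are assumed to be required here, as they are in the sibling modules' README tables. Only the `*_timeframe` variable names and the allowed values come from the variable definitions in this diff.

```hcl
# Hypothetical caller configuration (Terraform 0.11-style syntax, matching the diff).
module "apimanagement_monitors" {
  # Placeholder path: the real module location is not shown in this commit view.
  source = "./path/to/apimanagement"

  # Assumed required inputs, as in the sibling modules' README tables.
  environment = "production"
  message     = "@hypothetical-notification-channel"

  # Widen the evaluation window for selected monitors.
  # Inputs left unset keep their "last_5m" default.
  failed_requests_timeframe       = "last_15m"
  unauthorized_requests_timeframe = "last_1h"
}
```

Both values used here are in the documented set: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`.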
|
|||||||
@ -13,7 +13,7 @@ resource "datadog_monitor" "apimgt_status" {
|
|||||||
message = "${coalesce(var.status_message, var.message)}"
|
message = "${coalesce(var.status_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- avg(last_5m):avg:azure.apimanagement_service.status{${data.template_file.filter.rendered}} by {resource_group,region,name} < 1
+ avg(${var.status_timeframe}):avg:azure.apimanagement_service.status{${data.template_file.filter.rendered}} by {resource_group,region,name} < 1
|
||||||
EOF
|
EOF
|
||||||
|
|
||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
@ -42,7 +42,7 @@ resource "datadog_monitor" "apimgt_failed_requests" {
|
|||||||
message = "${coalesce(var.failed_requests_message, var.message)}"
|
message = "${coalesce(var.failed_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.failed_requests_timeframe}): (
|
||||||
avg:azure.apimanagement_service.failed_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.apimanagement_service.failed_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
||||||
) > ${var.failed_requests_threshold_critical}
|
) > ${var.failed_requests_threshold_critical}
|
||||||
@ -74,7 +74,7 @@ resource "datadog_monitor" "apimgt_other_requests" {
|
|||||||
message = "${coalesce(var.other_requests_message, var.message)}"
|
message = "${coalesce(var.other_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.other_requests_timeframe}): (
|
||||||
avg:azure.apimanagement_service.other_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.apimanagement_service.other_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
||||||
) > ${var.other_requests_threshold_critical}
|
) > ${var.other_requests_threshold_critical}
|
||||||
@ -106,7 +106,7 @@ resource "datadog_monitor" "apimgt_unauthorized_requests" {
|
|||||||
message = "${coalesce(var.unauthorized_requests_message, var.message)}"
|
message = "${coalesce(var.unauthorized_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.unauthorized_requests_timeframe}): (
|
||||||
avg:azure.apimanagement_service.unauthorized_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.apimanagement_service.unauthorized_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
||||||
) > ${var.unauthorized_requests_threshold_critical}
|
) > ${var.unauthorized_requests_threshold_critical}
|
||||||
@ -138,7 +138,7 @@ resource "datadog_monitor" "apimgt_successful_requests" {
|
|||||||
message = "${coalesce(var.successful_requests_message, var.message)}"
|
message = "${coalesce(var.successful_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.successful_requests_timeframe}): (
|
||||||
avg:azure.apimanagement_service.successful_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.apimanagement_service.successful_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
avg:azure.apimanagement_service.total_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() * 100
|
||||||
) < ${var.successful_requests_threshold_critical}
|
) < ${var.successful_requests_threshold_critical}
|
||||||
|
|||||||
@ -36,23 +36,28 @@ Inputs
|
|||||||
| http_4xx_requests_silenced | Groups to mute for App Services 4xx requests monitor | map | `<map>` | no |
|
| http_4xx_requests_silenced | Groups to mute for App Services 4xx requests monitor | map | `<map>` | no |
|
||||||
| http_4xx_requests_threshold_critical | Maximum critical acceptable percent of 4xx errors | string | `90` | no |
|
| http_4xx_requests_threshold_critical | Maximum critical acceptable percent of 4xx errors | string | `90` | no |
|
||||||
| http_4xx_requests_threshold_warning | Warning regarding acceptable percent of 4xx errors | string | `50` | no |
|
| http_4xx_requests_threshold_warning | Warning regarding acceptable percent of 4xx errors | string | `50` | no |
|
||||||
|
| http_4xx_requests_timeframe | Monitor timeframe for App Services 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| http_5xx_requests_message | Custom message for App Services 5xx requests monitor | string | `` | no |
|
| http_5xx_requests_message | Custom message for App Services 5xx requests monitor | string | `` | no |
|
||||||
| http_5xx_requests_silenced | Groups to mute for App Services 5xx requests monitor | map | `<map>` | no |
|
| http_5xx_requests_silenced | Groups to mute for App Services 5xx requests monitor | map | `<map>` | no |
|
||||||
| http_5xx_requests_threshold_critical | Maximum critical acceptable percent of 5xx errors | string | `90` | no |
|
| http_5xx_requests_threshold_critical | Maximum critical acceptable percent of 5xx errors | string | `90` | no |
|
||||||
| http_5xx_requests_threshold_warning | Warning regarding acceptable percent of 5xx errors | string | `50` | no |
|
| http_5xx_requests_threshold_warning | Warning regarding acceptable percent of 5xx errors | string | `50` | no |
|
||||||
|
| http_5xx_requests_timeframe | Monitor timeframe for App Services 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| http_successful_requests_message | Custom message for App Services successful requests monitor | string | `` | no |
|
| http_successful_requests_message | Custom message for App Services successful requests monitor | string | `` | no |
|
||||||
| http_successful_requests_silenced | Groups to mute for App Services successful requests monitor | map | `<map>` | no |
|
| http_successful_requests_silenced | Groups to mute for App Services successful requests monitor | map | `<map>` | no |
|
||||||
| http_successful_requests_threshold_critical | Minimum critical acceptable percent of 2xx & 3xx requests | string | `10` | no |
|
| http_successful_requests_threshold_critical | Minimum critical acceptable percent of 2xx & 3xx requests | string | `10` | no |
|
||||||
| http_successful_requests_threshold_warning | Warning regarding acceptable percent of 2xx & 3xx requests | string | `30` | no |
|
| http_successful_requests_threshold_warning | Warning regarding acceptable percent of 2xx & 3xx requests | string | `30` | no |
|
||||||
|
| http_successful_requests_timeframe | Monitor timeframe for App Services successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| memory_usage_message | Custom message for App Services memory usage monitor | string | `` | no |
|
| memory_usage_message | Custom message for App Services memory usage monitor | string | `` | no |
|
||||||
| memory_usage_silenced | Groups to mute for App Services memory usage monitor | map | `<map>` | no |
|
| memory_usage_silenced | Groups to mute for App Services memory usage monitor | map | `<map>` | no |
|
||||||
| memory_usage_threshold_critical | Alerting threshold in Mib | string | `1073741824` | no |
|
| memory_usage_threshold_critical | Alerting threshold in Mib | string | `1073741824` | no |
|
||||||
| memory_usage_threshold_warning | Warning threshold in MiB | string | `536870912` | no |
|
| memory_usage_threshold_warning | Warning threshold in MiB | string | `536870912` | no |
|
||||||
|
| memory_usage_timeframe | Monitor timeframe for App Services memory usage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| message | Message sent when a monitor is triggered | string | - | yes |
|
| message | Message sent when a monitor is triggered | string | - | yes |
|
||||||
| response_time_message | Custom message for App Services response time monitor | string | `` | no |
|
| response_time_message | Custom message for App Services response time monitor | string | `` | no |
|
||||||
| response_time_silenced | Groups to mute for App Services response time monitor | map | `<map>` | no |
|
| response_time_silenced | Groups to mute for App Services response time monitor | map | `<map>` | no |
|
||||||
| response_time_threshold_critical | Alerting threshold for response time in seconds | string | `10` | no |
|
| response_time_threshold_critical | Alerting threshold for response time in seconds | string | `10` | no |
|
||||||
| response_time_threshold_warning | Warning threshold for response time in seconds | string | `5` | no |
|
| response_time_threshold_warning | Warning threshold for response time in seconds | string | `5` | no |
|
||||||
|
| response_time_timeframe | Monitor timeframe for App Services response time [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
|
|
||||||
Related documentation
|
Related documentation
|
||||||
---------------------
|
---------------------
|
||||||
|
|||||||
@ -35,6 +35,12 @@ variable "response_time_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "response_time_timeframe" {
|
||||||
|
description = "Monitor timeframe for App Services response time [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "response_time_threshold_critical" {
|
variable "response_time_threshold_critical" {
|
||||||
default = 10
|
default = 10
|
||||||
description = "Alerting threshold for response time in seconds"
|
description = "Alerting threshold for response time in seconds"
|
||||||
@ -57,6 +63,12 @@ variable "memory_usage_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "memory_usage_timeframe" {
|
||||||
|
description = "Monitor timeframe for App Services memory usage [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "memory_usage_threshold_critical" {
|
variable "memory_usage_threshold_critical" {
|
||||||
default = 1073741824 # 1Gb
|
default = 1073741824 # 1Gb
|
||||||
description = "Alerting threshold in Mib"
|
description = "Alerting threshold in Mib"
|
||||||
@ -79,6 +91,12 @@ variable "http_4xx_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "http_4xx_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for App Services 4xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "http_4xx_requests_threshold_critical" {
|
variable "http_4xx_requests_threshold_critical" {
|
||||||
default = 90
|
default = 90
|
||||||
description = "Maximum critical acceptable percent of 4xx errors"
|
description = "Maximum critical acceptable percent of 4xx errors"
|
||||||
@ -101,6 +119,12 @@ variable "http_5xx_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "http_5xx_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for App Services 5xx requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "http_5xx_requests_threshold_critical" {
|
variable "http_5xx_requests_threshold_critical" {
|
||||||
default = 90
|
default = 90
|
||||||
description = "Maximum critical acceptable percent of 5xx errors"
|
description = "Maximum critical acceptable percent of 5xx errors"
|
||||||
@ -123,6 +147,12 @@ variable "http_successful_requests_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "http_successful_requests_timeframe" {
|
||||||
|
description = "Monitor timeframe for App Services successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "http_successful_requests_threshold_critical" {
|
variable "http_successful_requests_threshold_critical" {
|
||||||
default = 10
|
default = 10
|
||||||
description = "Minimum critical acceptable percent of 2xx & 3xx requests"
|
description = "Minimum critical acceptable percent of 2xx & 3xx requests"
|
||||||
|
|||||||
@ -44,7 +44,7 @@ resource "datadog_monitor" "appservices_memory_usage_count" {
|
|||||||
message = "${coalesce(var.memory_usage_message, var.message)}"
|
message = "${coalesce(var.memory_usage_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- avg(last_5m): (
+ avg(${var.memory_usage_timeframe}): (
|
||||||
avg:azure.app_services.memory_working_set{${data.template_file.filter.rendered}} by {resource_group,region,name}
|
avg:azure.app_services.memory_working_set{${data.template_file.filter.rendered}} by {resource_group,region,name}
|
||||||
) > ${var.memory_usage_threshold_critical}
|
) > ${var.memory_usage_threshold_critical}
|
||||||
EOF
|
EOF
|
||||||
@ -75,7 +75,7 @@ resource "datadog_monitor" "appservices_http_5xx_errors_count" {
|
|||||||
message = "${coalesce(var.http_5xx_requests_message, var.message)}"
|
message = "${coalesce(var.http_5xx_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.http_5xx_requests_timeframe}): (
|
||||||
avg:azure.app_services.http5xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.app_services.http5xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
||||||
) * 100 > ${var.http_5xx_requests_threshold_critical}
|
) * 100 > ${var.http_5xx_requests_threshold_critical}
|
||||||
@ -107,7 +107,7 @@ resource "datadog_monitor" "appservices_http_4xx_errors_count" {
|
|||||||
message = "${coalesce(var.http_4xx_requests_message, var.message)}"
|
message = "${coalesce(var.http_4xx_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.http_4xx_requests_timeframe}): (
|
||||||
avg:azure.app_services.http4xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.app_services.http4xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
||||||
) * 100 > ${var.http_4xx_requests_threshold_critical}
|
) * 100 > ${var.http_4xx_requests_threshold_critical}
|
||||||
@ -139,7 +139,7 @@ resource "datadog_monitor" "appservices_http_success_status_rate" {
|
|||||||
message = "${coalesce(var.http_successful_requests_message, var.message)}"
|
message = "${coalesce(var.http_successful_requests_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.http_successful_requests_timeframe}): (
|
||||||
(avg:azure.app_services.http2xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() +
|
(avg:azure.app_services.http2xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.app_services.http3xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()) /
|
avg:azure.app_services.http3xx{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()) /
|
||||||
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
avg:azure.app_services.requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
|
||||||
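To illustrate what the interpolation in the monitor queries above produces, here is a small sketch using a `locals` block. The names `example_timeframe` and `example_query` are hypothetical stand-ins (the first for `var.memory_usage_timeframe`). The `{*}` filter and the threshold are placeholder values rather than the module's rendered filter template.

```hcl
# Illustration only: not part of the module in this commit.
locals {
  # Stand-in for var.memory_usage_timeframe; any documented value works.
  example_timeframe = "last_15m"

  # Evaluates to:
  # avg(last_15m): ( avg:azure.app_services.memory_working_set{*} by {resource_group,region,name} ) > 1073741824
  example_query = "avg(${local.example_timeframe}): ( avg:azure.app_services.memory_working_set{*} by {resource_group,region,name} ) > 1073741824"
}
```

Only the `${...}` sequence is interpolated. The Datadog braces for the filter and the `by` grouping pass through unchanged.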
|
|||||||
@ -32,15 +32,18 @@ Inputs
|
|||||||
| errors_rate_silenced | Groups to mute for Event Hub errors monitor | map | `<map>` | no |
|
| errors_rate_silenced | Groups to mute for Event Hub errors monitor | map | `<map>` | no |
|
||||||
| errors_rate_thresold_critical | Errors ratio (percentage) to trigger the critical alert | string | `90` | no |
|
| errors_rate_thresold_critical | Errors ratio (percentage) to trigger the critical alert | string | `90` | no |
|
||||||
| errors_rate_thresold_warning | Errors ratio (percentage) to trigger a warning alert | string | `50` | no |
|
| errors_rate_thresold_warning | Errors ratio (percentage) to trigger a warning alert | string | `50` | no |
|
||||||
|
| errors_rate_timeframe | Monitor timeframe for Event Hub errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_requests_rate_message | Custom message for Event Hub failed requests monitor | string | `` | no |
|
| failed_requests_rate_message | Custom message for Event Hub failed requests monitor | string | `` | no |
|
||||||
| failed_requests_rate_silenced | Groups to mute for Event Hub failed requests monitor | map | `<map>` | no |
|
| failed_requests_rate_silenced | Groups to mute for Event Hub failed requests monitor | map | `<map>` | no |
|
||||||
| failed_requests_rate_thresold_critical | Failed requests ratio (percentage) to trigger the critical alert | string | `90` | no |
|
| failed_requests_rate_thresold_critical | Failed requests ratio (percentage) to trigger the critical alert | string | `90` | no |
|
||||||
| failed_requests_rate_thresold_warning | Failed requests ratio (percentage) to trigger a warning alert | string | `50` | no |
|
| failed_requests_rate_thresold_warning | Failed requests ratio (percentage) to trigger a warning alert | string | `50` | no |
|
||||||
|
| failed_requests_rate_timeframe | Monitor timeframe for Event Hub failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
| status_message | Custom message for Event Hub status monitor | string | `` | no |
|
| status_message | Custom message for Event Hub status monitor | string | `` | no |
|
||||||
| status_silenced | Groups to mute for Event Hub status monitor | map | `<map>` | no |
|
| status_silenced | Groups to mute for Event Hub status monitor | map | `<map>` | no |
|
||||||
|
| status_timeframe | Monitor timeframe for Event Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
|
|
||||||
Related documentation
|
Related documentation
|
||||||
---------------------
|
---------------------
|
||||||
|
|||||||
@ -37,6 +37,12 @@ variable "status_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "status_timeframe" {
|
||||||
|
description = "Monitor timeframe for Event Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_requests_rate_silenced" {
|
variable "failed_requests_rate_silenced" {
|
||||||
description = "Groups to mute for Event Hub failed requests monitor"
|
description = "Groups to mute for Event Hub failed requests monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -49,6 +55,12 @@ variable "failed_requests_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_requests_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for Event Hub failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_requests_rate_thresold_critical" {
|
variable "failed_requests_rate_thresold_critical" {
|
||||||
description = "Failed requests ratio (percentage) to trigger the critical alert"
|
description = "Failed requests ratio (percentage) to trigger the critical alert"
|
||||||
default = 90
|
default = 90
|
||||||
@ -71,6 +83,12 @@ variable "errors_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "errors_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for Event Hub errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "errors_rate_thresold_critical" {
|
variable "errors_rate_thresold_critical" {
|
||||||
description = "Errors ratio (percentage) to trigger the critical alert"
|
description = "Errors ratio (percentage) to trigger the critical alert"
|
||||||
default = 90
|
default = 90
|
||||||
|
|||||||
@ -11,7 +11,7 @@ resource "datadog_monitor" "eventhub_status" {
|
|||||||
message = "${coalesce(var.status_message, var.message)}"
|
message = "${coalesce(var.status_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- avg(last_5m): avg:azure.eventhub_namespaces.status{${data.template_file.filter.rendered}} by {resource_group,region,name} != 1
+ avg(${var.status_timeframe}): avg:azure.eventhub_namespaces.status{${data.template_file.filter.rendered}} by {resource_group,region,name} != 1
|
||||||
EOF
|
EOF
|
||||||
|
|
||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
@ -36,7 +36,7 @@ resource "datadog_monitor" "eventhub_failed_requests" {
|
|||||||
message = "${coalesce(var.failed_requests_rate_message, var.message)}"
|
message = "${coalesce(var.failed_requests_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.failed_requests_rate_timeframe}): (
|
||||||
default(
|
default(
|
||||||
avg:azure.eventhub_namespaces.failed_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
avg:azure.eventhub_namespaces.failed_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
|
||||||
avg:azure.eventhub_namespaces.incoming_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count(),
|
avg:azure.eventhub_namespaces.incoming_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count(),
|
||||||
@ -71,7 +71,7 @@ resource "datadog_monitor" "eventhub_errors" {
|
|||||||
message = "${coalesce(var.errors_rate_message, var.message)}"
|
message = "${coalesce(var.errors_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
- sum(last_5m): (
+ sum(${var.errors_rate_timeframe}): (
|
||||||
default(
|
default(
|
||||||
(
|
(
|
||||||
avg:azure.eventhub_namespaces.internal_server_errors{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() +
|
avg:azure.eventhub_namespaces.internal_server_errors{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() +
|
||||||
|
|||||||
@ -43,55 +43,69 @@ Inputs
|
|||||||
| dropped_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Dropped limit (critical threshold) | string | `90` | no |
|
| dropped_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Dropped limit (critical threshold) | string | `90` | no |
|
||||||
| dropped_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Dropped limit (warning threshold) | string | `50` | no |
|
| dropped_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Dropped limit (warning threshold) | string | `50` | no |
|
||||||
| dropped_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub dropped d2c telemetry monitor | map | `<map>` | no |
|
| dropped_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub dropped d2c telemetry monitor | map | `<map>` | no |
|
||||||
|
| dropped_d2c_telemetry_egress_timeframe | Monitor timeframe for IoT Hub dropped d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| environment | Architecture Environment | string | - | yes |
|
| environment | Architecture Environment | string | - | yes |
|
||||||
| failed_c2d_methods_rate_message | Custom message for IoT Hub failed c2d method monitor | string | `` | no |
|
| failed_c2d_methods_rate_message | Custom message for IoT Hub failed c2d method monitor | string | `` | no |
|
||||||
| failed_c2d_methods_rate_silenced | Groups to mute for IoT Hub failed c2d methods monitor | map | `<map>` | no |
|
| failed_c2d_methods_rate_silenced | Groups to mute for IoT Hub failed c2d methods monitor | map | `<map>` | no |
|
||||||
| failed_c2d_methods_rate_threshold_critical | C2D Methods Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_c2d_methods_rate_threshold_critical | C2D Methods Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_c2d_methods_rate_threshold_warning | C2D Methods Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_c2d_methods_rate_threshold_warning | C2D Methods Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_c2d_methods_rate_timeframe | Monitor timeframe for IoT Hub failed c2d method [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_c2d_twin_read_rate_message | Custom message for IoT Hub failed c2d twin read monitor | string | `` | no |
|
| failed_c2d_twin_read_rate_message | Custom message for IoT Hub failed c2d twin read monitor | string | `` | no |
|
||||||
| failed_c2d_twin_read_rate_silenced | Groups to mute for IoT Hub failed c2d twin read monitor | map | `<map>` | no |
|
| failed_c2d_twin_read_rate_silenced | Groups to mute for IoT Hub failed c2d twin read monitor | map | `<map>` | no |
|
||||||
| failed_c2d_twin_read_rate_threshold_critical | C2D Twin Read Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_c2d_twin_read_rate_threshold_critical | C2D Twin Read Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_c2d_twin_read_rate_threshold_warning | C2D Twin Read Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_c2d_twin_read_rate_threshold_warning | C2D Twin Read Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_c2d_twin_read_rate_timeframe | Monitor timeframe for IoT Hub failed c2d twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_c2d_twin_update_rate_message | Custom message for IoT Hub failed c2d twin update monitor | string | `` | no |
|
| failed_c2d_twin_update_rate_message | Custom message for IoT Hub failed c2d twin update monitor | string | `` | no |
|
||||||
| failed_c2d_twin_update_rate_silenced | Groups to mute for IoT Hub failed c2d twin update monitor | map | `<map>` | no |
|
| failed_c2d_twin_update_rate_silenced | Groups to mute for IoT Hub failed c2d twin update monitor | map | `<map>` | no |
|
||||||
| failed_c2d_twin_update_rate_threshold_critical | C2D Twin Update Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_c2d_twin_update_rate_threshold_critical | C2D Twin Update Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_c2d_twin_update_rate_threshold_warning | C2D Twin Update Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_c2d_twin_update_rate_threshold_warning | C2D Twin Update Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_c2d_twin_update_rate_timeframe | Monitor timeframe for IoT Hub failed c2d twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_d2c_twin_read_rate_message | Custom message for IoT Hub failed d2c twin read monitor | string | `` | no |
|
| failed_d2c_twin_read_rate_message | Custom message for IoT Hub failed d2c twin read monitor | string | `` | no |
|
||||||
| failed_d2c_twin_read_rate_silenced | Groups to mute for IoT Hub failed d2c twin read monitor | map | `<map>` | no |
|
| failed_d2c_twin_read_rate_silenced | Groups to mute for IoT Hub failed d2c twin read monitor | map | `<map>` | no |
|
||||||
| failed_d2c_twin_read_rate_threshold_critical | D2C Twin Read Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_d2c_twin_read_rate_threshold_critical | D2C Twin Read Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_d2c_twin_read_rate_threshold_warning | D2C Twin Read Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_d2c_twin_read_rate_threshold_warning | D2C Twin Read Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_d2c_twin_read_rate_timeframe | Monitor timeframe for IoT Hub failed d2c twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_d2c_twin_update_rate_message | Custom message for IoT Hub failed d2c twin update monitor | string | `` | no |
|
| failed_d2c_twin_update_rate_message | Custom message for IoT Hub failed d2c twin update monitor | string | `` | no |
|
||||||
| failed_d2c_twin_update_rate_silenced | Groups to mute for IoT Hub failed d2c twin update monitor | map | `<map>` | no |
|
| failed_d2c_twin_update_rate_silenced | Groups to mute for IoT Hub failed d2c twin update monitor | map | `<map>` | no |
|
||||||
| failed_d2c_twin_update_rate_threshold_critical | D2C Twin Update Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_d2c_twin_update_rate_threshold_critical | D2C Twin Update Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_d2c_twin_update_rate_threshold_warning | D2C Twin Update Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_d2c_twin_update_rate_threshold_warning | D2C Twin Update Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_d2c_twin_update_rate_timeframe | Monitor timeframe for IoT Hub failed d2c twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_jobs_rate_message | Custom message for IoT Hub failed jobs monitor | string | `` | no |
|
| failed_jobs_rate_message | Custom message for IoT Hub failed jobs monitor | string | `` | no |
|
||||||
| failed_jobs_rate_silenced | Groups to mute for IoT Hub failed jobs monitor | map | `<map>` | no |
|
| failed_jobs_rate_silenced | Groups to mute for IoT Hub failed jobs monitor | map | `<map>` | no |
|
||||||
| failed_jobs_rate_threshold_critical | Jobs Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_jobs_rate_threshold_critical | Jobs Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_jobs_rate_threshold_warning | Jobs Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_jobs_rate_threshold_warning | Jobs Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_jobs_rate_timeframe | Monitor timeframe for IoT Hub failed jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_listjobs_rate_message | Custom message for IoT Hub failed list jobs monitor | string | `` | no |
|
| failed_listjobs_rate_message | Custom message for IoT Hub failed list jobs monitor | string | `` | no |
|
||||||
| failed_listjobs_rate_silenced | Groups to mute for IoT Hub failed list jobs monitor | map | `<map>` | no |
|
| failed_listjobs_rate_silenced | Groups to mute for IoT Hub failed list jobs monitor | map | `<map>` | no |
|
||||||
| failed_listjobs_rate_threshold_critical | ListJobs Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_listjobs_rate_threshold_critical | ListJobs Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_listjobs_rate_threshold_warning | ListJobs Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_listjobs_rate_threshold_warning | ListJobs Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_listjobs_rate_timeframe | Monitor timeframe for IoT Hub failed list jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| failed_queryjobs_rate_message | Custom message for IoT Hub failed query jobs monitor | string | `` | no |
|
| failed_queryjobs_rate_message | Custom message for IoT Hub failed query jobs monitor | string | `` | no |
|
||||||
| failed_queryjobs_rate_silenced | Groups to mute for IoT Hub failed query jobs monitor | map | `<map>` | no |
|
| failed_queryjobs_rate_silenced | Groups to mute for IoT Hub failed query jobs monitor | map | `<map>` | no |
|
||||||
| failed_queryjobs_rate_threshold_critical | QueryJobs Failed rate limit (critical threshold) | string | `90` | no |
|
| failed_queryjobs_rate_threshold_critical | QueryJobs Failed rate limit (critical threshold) | string | `90` | no |
|
||||||
| failed_queryjobs_rate_threshold_warning | QueryJobs Failed rate limit (warning threshold) | string | `50` | no |
|
| failed_queryjobs_rate_threshold_warning | QueryJobs Failed rate limit (warning threshold) | string | `50` | no |
|
||||||
|
| failed_queryjobs_rate_timeframe | Monitor timeframe for IoT Hub failed query jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| filter_tags | Tags used for filtering | string | `*` | no |
|
| filter_tags | Tags used for filtering | string | `*` | no |
|
||||||
| invalid_d2c_telemetry_egress_message | Custom message for IoT Hub invalid d2c telemetry monitor | string | `` | no |
|
| invalid_d2c_telemetry_egress_message | Custom message for IoT Hub invalid d2c telemetry monitor | string | `` | no |
|
||||||
| invalid_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Invalid limit (critical threshold) | string | `90` | no |
|
| invalid_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Invalid limit (critical threshold) | string | `90` | no |
|
||||||
| invalid_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Invalid limit (warning threshold) | string | `50` | no |
|
| invalid_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Invalid limit (warning threshold) | string | `50` | no |
|
||||||
| invalid_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub invalid d2c telemetry monitor | map | `<map>` | no |
|
| invalid_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub invalid d2c telemetry monitor | map | `<map>` | no |
|
||||||
|
| invalid_d2c_telemetry_egress_timeframe | Monitor timeframe for IoT Hub invalid d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
| orphaned_d2c_telemetry_egress_message | Custom message for IoT Hub orphaned d2c telemetry monitor | string | `` | no |
|
| orphaned_d2c_telemetry_egress_message | Custom message for IoT Hub orphaned d2c telemetry monitor | string | `` | no |
|
||||||
| orphaned_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Orphaned limit (critical threshold) | string | `90` | no |
|
| orphaned_d2c_telemetry_egress_rate_threshold_critical | D2C Telemetry Orphaned limit (critical threshold) | string | `90` | no |
|
||||||
| orphaned_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Orphaned limit (warning threshold) | string | `50` | no |
|
| orphaned_d2c_telemetry_egress_rate_threshold_warning | D2C Telemetry Orphaned limit (warning threshold) | string | `50` | no |
|
||||||
| orphaned_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub orphaned d2c telemetry monitor | map | `<map>` | no |
|
| orphaned_d2c_telemetry_egress_silenced | Groups to mute for IoT Hub orphaned d2c telemetry monitor | map | `<map>` | no |
|
||||||
|
| orphaned_d2c_telemetry_egress_timeframe | Monitor timeframe for IoT Hub orphaned d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| status_message | Custom message for IoT Hub status monitor | string | `` | no |
|
| status_message | Custom message for IoT Hub status monitor | string | `` | no |
|
||||||
| status_silenced | Groups to mute for IoT Hub status monitor | map | `<map>` | no |
|
| status_silenced | Groups to mute for IoT Hub status monitor | map | `<map>` | no |
|
||||||
|
| status_timeframe | Monitor timeframe for IoT Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| too_many_d2c_telemetry_ingress_nosent_message | Custom message for IoT Hub unsent d2c telemetry monitor | string | `` | no |
|
| too_many_d2c_telemetry_ingress_nosent_message | Custom message for IoT Hub unsent d2c telemetry monitor | string | `` | no |
|
||||||
| too_many_d2c_telemetry_ingress_nosent_silenced | Groups to mute for IoT Hub unsent d2c telemetry monitor | map | `<map>` | no |
|
| too_many_d2c_telemetry_ingress_nosent_silenced | Groups to mute for IoT Hub unsent d2c telemetry monitor | map | `<map>` | no |
|
||||||
|
| too_many_d2c_telemetry_ingress_nosent_timeframe | Monitor timeframe for IoT Hub unsent d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| total_devices_message | Custom message for IoT Hub total devices monitor | string | `` | no |
|
| total_devices_message | Custom message for IoT Hub total devices monitor | string | `` | no |
|
||||||
| total_devices_silenced | Groups to mute for IoT Hub total devices monitor | map | `<map>` | no |
|
| total_devices_silenced | Groups to mute for IoT Hub total devices monitor | map | `<map>` | no |
|
||||||
|
| total_devices_timeframe | Monitor timeframe for IoT Hub total devices [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
|
|
||||||
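As a minimal sketch, the new `*_timeframe` inputs listed above might be overridden when calling this module as follows. The module name `iothub_monitors` and the `source` path are hypothetical placeholders, the `environment` input is assumed to be required here as it is in the other modules, and the values are illustrative only:

```hcl
# Hypothetical module call: the name and source path are placeholders,
# not taken from the repository.
module "iothub_monitors" {
  source = "./path/to/iothubs/module" # assumption: adjust to the real module location

  environment = "production"        # assumed required input
  message     = "@pagerduty-oncall" # illustrative notification handle

  # Widen the evaluation window for two monitors; both default to last_5m.
  failed_jobs_rate_timeframe = "last_15m"
  status_timeframe           = "last_10m"
}
```

Inputs left unset keep their `last_5m` default. Only the values named in the descriptions (`last_#m` with 1, 5, 10, 15, or 30, `last_#h` with 1, 2, or 4, or `last_1d`) are documented as available.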
Related documentation
|
Related documentation
|
||||||
---------------------
|
---------------------
|
||||||
|
|||||||
@ -32,6 +32,12 @@ variable "status_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "status_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "total_devices_silenced" {
|
variable "total_devices_silenced" {
|
||||||
description = "Groups to mute for IoT Hub total devices monitor"
|
description = "Groups to mute for IoT Hub total devices monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -44,6 +50,12 @@ variable "total_devices_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "total_devices_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub total devices [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "too_many_d2c_telemetry_ingress_nosent_silenced" {
|
variable "too_many_d2c_telemetry_ingress_nosent_silenced" {
|
||||||
description = "Groups to mute for IoT Hub unsent d2c telemetry monitor"
|
description = "Groups to mute for IoT Hub unsent d2c telemetry monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -56,6 +68,12 @@ variable "too_many_d2c_telemetry_ingress_nosent_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "too_many_d2c_telemetry_ingress_nosent_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub unsent d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_jobs_rate_silenced" {
|
variable "failed_jobs_rate_silenced" {
|
||||||
description = "Groups to mute for IoT Hub failed jobs monitor"
|
description = "Groups to mute for IoT Hub failed jobs monitor"
|
||||||
type = "map"
|
type = "map"
|
||||||
@ -68,6 +86,12 @@ variable "failed_jobs_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_jobs_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_jobs_rate_threshold_warning" {
|
variable "failed_jobs_rate_threshold_warning" {
|
||||||
description = "Jobs Failed rate limit (warning threshold)"
|
description = "Jobs Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -90,6 +114,12 @@ variable "failed_listjobs_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_listjobs_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed list jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_listjobs_rate_threshold_warning" {
|
variable "failed_listjobs_rate_threshold_warning" {
|
||||||
description = "ListJobs Failed rate limit (warning threshold)"
|
description = "ListJobs Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -112,6 +142,12 @@ variable "failed_queryjobs_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_queryjobs_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed query jobs [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_queryjobs_rate_threshold_warning" {
|
variable "failed_queryjobs_rate_threshold_warning" {
|
||||||
description = "QueryJobs Failed rate limit (warning threshold)"
|
description = "QueryJobs Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -134,6 +170,12 @@ variable "failed_c2d_methods_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_c2d_methods_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed c2d method [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_c2d_methods_rate_threshold_warning" {
|
variable "failed_c2d_methods_rate_threshold_warning" {
|
||||||
description = "C2D Methods Failed rate limit (warning threshold)"
|
description = "C2D Methods Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -156,6 +198,12 @@ variable "failed_c2d_twin_read_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_c2d_twin_read_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed c2d twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_c2d_twin_read_rate_threshold_warning" {
|
variable "failed_c2d_twin_read_rate_threshold_warning" {
|
||||||
description = "C2D Twin Read Failed rate limit (warning threshold)"
|
description = "C2D Twin Read Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -178,6 +226,12 @@ variable "failed_c2d_twin_update_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_c2d_twin_update_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed c2d twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_c2d_twin_update_rate_threshold_warning" {
|
variable "failed_c2d_twin_update_rate_threshold_warning" {
|
||||||
description = "C2D Twin Update Failed rate limit (warning threshold)"
|
description = "C2D Twin Update Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -200,6 +254,12 @@ variable "failed_d2c_twin_read_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_d2c_twin_read_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed d2c twin read [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_d2c_twin_read_rate_threshold_warning" {
|
variable "failed_d2c_twin_read_rate_threshold_warning" {
|
||||||
description = "D2C Twin Read Failed rate limit (warning threshold)"
|
description = "D2C Twin Read Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -222,6 +282,12 @@ variable "failed_d2c_twin_update_rate_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "failed_d2c_twin_update_rate_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub failed d2c twin update [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "failed_d2c_twin_update_rate_threshold_warning" {
|
variable "failed_d2c_twin_update_rate_threshold_warning" {
|
||||||
description = "D2C Twin Update Failed rate limit (warning threshold)"
|
description = "D2C Twin Update Failed rate limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -244,6 +310,12 @@ variable "dropped_d2c_telemetry_egress_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "dropped_d2c_telemetry_egress_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub dropped d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "dropped_d2c_telemetry_egress_rate_threshold_warning" {
|
variable "dropped_d2c_telemetry_egress_rate_threshold_warning" {
|
||||||
description = "D2C Telemetry Dropped limit (warning threshold)"
|
description = "D2C Telemetry Dropped limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -266,6 +338,12 @@ variable "orphaned_d2c_telemetry_egress_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "orphaned_d2c_telemetry_egress_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub orphaned d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "orphaned_d2c_telemetry_egress_rate_threshold_warning" {
|
variable "orphaned_d2c_telemetry_egress_rate_threshold_warning" {
|
||||||
description = "D2C Telemetry Orphaned limit (warning threshold)"
|
description = "D2C Telemetry Orphaned limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
@ -288,6 +366,12 @@ variable "invalid_d2c_telemetry_egress_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "invalid_d2c_telemetry_egress_timeframe" {
|
||||||
|
description = "Monitor timeframe for IoT Hub invalid d2c telemetry [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "invalid_d2c_telemetry_egress_rate_threshold_warning" {
|
variable "invalid_d2c_telemetry_egress_rate_threshold_warning" {
|
||||||
description = "D2C Telemetry Invalid limit (warning threshold)"
|
description = "D2C Telemetry Invalid limit (warning threshold)"
|
||||||
default = 50
|
default = 50
|
||||||
|
|||||||
@ -3,7 +3,7 @@ resource "datadog_monitor" "too_many_jobs_failed" {
|
|||||||
message = "${coalesce(var.failed_jobs_rate_message, var.message)}"
|
message = "${coalesce(var.failed_jobs_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_jobs_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.jobs.failed{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.jobs.failed{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.jobs.failed{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.jobs.failed{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.jobs.completed{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.jobs.completed{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -37,7 +37,7 @@ resource "datadog_monitor" "too_many_list_jobs_failed" {
|
|||||||
message = "${coalesce(var.failed_listjobs_rate_message, var.message)}"
|
message = "${coalesce(var.failed_listjobs_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_listjobs_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.jobs.list_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() /
|
avg:azure.devices_iothubs.jobs.list_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.jobs.list_jobs.success{${var.filter_tags}} by {resource_group,name}.as_count() +
|
( avg:azure.devices_iothubs.jobs.list_jobs.success{${var.filter_tags}} by {resource_group,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.jobs.list_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() )
|
avg:azure.devices_iothubs.jobs.list_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() )
|
||||||
@ -71,7 +71,7 @@ resource "datadog_monitor" "too_many_query_jobs_failed" {
|
|||||||
message = "${coalesce(var.failed_queryjobs_rate_message, var.message)}"
|
message = "${coalesce(var.failed_queryjobs_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_queryjobs_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.jobs.query_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() /
|
avg:azure.devices_iothubs.jobs.query_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.jobs.query_jobs.success{${var.filter_tags}} by {resource_group,name}.as_count() +
|
( avg:azure.devices_iothubs.jobs.query_jobs.success{${var.filter_tags}} by {resource_group,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.jobs.query_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() )
|
avg:azure.devices_iothubs.jobs.query_jobs.failure{${var.filter_tags}} by {resource_group,name}.as_count() )
|
||||||
@ -105,7 +105,7 @@ resource "datadog_monitor" "status" {
|
|||||||
message = "${coalesce(var.status_message, var.message)}"
|
message = "${coalesce(var.status_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_5m):avg:azure.devices_iothubs.status{${var.filter_tags}} by {resource_group,region,name} < 1
|
avg(${var.status_timeframe}):avg:azure.devices_iothubs.status{${var.filter_tags}} by {resource_group,region,name} < 1
|
||||||
EOF
|
EOF
|
||||||
|
|
||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
@ -130,7 +130,7 @@ resource "datadog_monitor" "total_devices" {
|
|||||||
message = "${coalesce(var.total_devices_message, var.message)}"
|
message = "${coalesce(var.total_devices_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_5m):avg:azure.devices_iothubs.devices.total_devices{${var.filter_tags}} by {resource_group,region,name} == 0
|
avg(${var.total_devices_timeframe}):avg:azure.devices_iothubs.devices.total_devices{${var.filter_tags}} by {resource_group,region,name} == 0
|
||||||
EOF
|
EOF
|
||||||
|
|
||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
@ -155,7 +155,7 @@ resource "datadog_monitor" "too_many_c2d_methods_failed" {
|
|||||||
message = "${coalesce(var.failed_c2d_methods_rate_message, var.message)}"
|
message = "${coalesce(var.failed_c2d_methods_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_c2d_methods_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.c2d.methods.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.c2d.methods.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.c2d.methods.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.c2d.methods.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.c2d.methods.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.c2d.methods.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -189,7 +189,7 @@ resource "datadog_monitor" "too_many_c2d_twin_read_failed" {
|
|||||||
message = "${coalesce(var.failed_c2d_twin_read_rate_message, var.message)}"
|
message = "${coalesce(var.failed_c2d_twin_read_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_c2d_twin_read_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.c2d.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.c2d.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.c2d.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.c2d.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.c2d.twin.read.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.c2d.twin.read.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -223,7 +223,7 @@ resource "datadog_monitor" "too_many_c2d_twin_update_failed" {
|
|||||||
message = "${coalesce(var.failed_c2d_twin_update_rate_message, var.message)}"
|
message = "${coalesce(var.failed_c2d_twin_update_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_c2d_twin_update_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.c2d.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.c2d.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.c2d.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.c2d.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.c2d.twin.update.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.c2d.twin.update.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -257,7 +257,7 @@ resource "datadog_monitor" "too_many_d2c_twin_read_failed" {
|
|||||||
message = "${coalesce(var.failed_d2c_twin_read_rate_message, var.message)}"
|
message = "${coalesce(var.failed_d2c_twin_read_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_d2c_twin_read_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.d2c.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.d2c.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.d2c.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.d2c.twin.read.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.d2c.twin.read.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.d2c.twin.read.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -291,7 +291,7 @@ resource "datadog_monitor" "too_many_d2c_twin_update_failed" {
|
|||||||
message = "${coalesce(var.failed_d2c_twin_update_rate_message, var.message)}"
|
message = "${coalesce(var.failed_d2c_twin_update_rate_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m):(
|
sum(${var.failed_d2c_twin_update_rate_timeframe}):(
|
||||||
avg:azure.devices_iothubs.d2c.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.d2c.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
( avg:azure.devices_iothubs.d2c.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
( avg:azure.devices_iothubs.d2c.twin.update.failure{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.d2c.twin.update.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
avg:azure.devices_iothubs.d2c.twin.update.success{${var.filter_tags}} by {resource_group,region,name}.as_count() )
|
||||||
@ -325,7 +325,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_egress_dropped" {
|
|||||||
message = "${coalesce(var.dropped_d2c_telemetry_egress_message, var.message)}"
|
message = "${coalesce(var.dropped_d2c_telemetry_egress_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
sum(last_5m): (
|
sum(${var.dropped_d2c_telemetry_egress_timeframe}): (
|
||||||
avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() /
|
||||||
(avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
(avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
avg:azure.devices_iothubs.d2c.telemetry.egress.orphaned{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
avg:azure.devices_iothubs.d2c.telemetry.egress.orphaned{${var.filter_tags}} by {resource_group,region,name}.as_count() +
|
||||||
@@ -362,7 +362,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_egress_orphaned" {
 message = "${coalesce(var.orphaned_d2c_telemetry_egress_message, var.message)}"
 
 query = <<EOF
-sum(last_5m): (
+sum(${var.orphaned_d2c_telemetry_egress_timeframe}): (
 avg:azure.devices_iothubs.d2c.telemetry.egress.orphaned{${var.filter_tags}} by {resource_group,region,name}.as_count() /
 (avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() +
 avg:azure.devices_iothubs.d2c.telemetry.egress.orphaned{${var.filter_tags}} by {resource_group,region,name}.as_count() +
@@ -399,7 +399,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_egress_invalid" {
 message = "${coalesce(var.invalid_d2c_telemetry_egress_message, var.message)}"
 
 query = <<EOF
-sum(last_5m): (
+sum(${var.invalid_d2c_telemetry_egress_timeframe}): (
 avg:azure.devices_iothubs.d2c.telemetry.egress.invalid{${var.filter_tags}} by {resource_group,region,name}.as_count() /
 (avg:azure.devices_iothubs.d2c.telemetry.egress.dropped{${var.filter_tags}} by {resource_group,region,name}.as_count() +
 avg:azure.devices_iothubs.d2c.telemetry.egress.orphaned{${var.filter_tags}} by {resource_group,region,name}.as_count() +
@@ -436,7 +436,7 @@ resource "datadog_monitor" "too_many_d2c_telemetry_ingress_nosent" {
 message = "${coalesce(var.too_many_d2c_telemetry_ingress_nosent_message, var.message)}"
 
 query = <<EOF
-sum(last_5m): (
+sum(${var.too_many_d2c_telemetry_ingress_nosent_timeframe}): (
 avg:azure.devices_iothubs.d2c.telemetry.ingress.all_protocol{${var.filter_tags}} by {resource_group,region,name}.as_count() -
 avg:azure.devices_iothubs.d2c.telemetry.ingress.success{${var.filter_tags}} by {resource_group,region,name}.as_count()
 ) > 0

@@ -33,6 +33,7 @@ Inputs
 | evictedkeys_limit_silenced | Groups to mute for Redis evicted keys monitor | map | `<map>` | no |
 | evictedkeys_limit_threshold_critical | Evicted keys limit (critical threshold) | string | `100` | no |
 | evictedkeys_limit_threshold_warning | Evicted keys limit (warning threshold) | string | `0` | no |
+| evictedkeys_limit_timeframe | Monitor timeframe for Redis evicted keys [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
 | message | Message sent when a Redis monitor is triggered | string | - | yes |
@@ -40,12 +41,15 @@ Inputs
 | percent_processor_time_silenced | Groups to mute for Redis processor monitor | map | `<map>` | no |
 | percent_processor_time_threshold_critical | Processor time percent (critical threshold) | string | `80` | no |
 | percent_processor_time_threshold_warning | Processor time percent (warning threshold) | string | `60` | no |
+| percent_processor_time_timeframe | Monitor timeframe for Redis processor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | server_load_rate_message | Custom message for Redis server load monitor | string | `` | no |
 | server_load_rate_silenced | Groups to mute for Redis server load monitor | map | `<map>` | no |
 | server_load_rate_threshold_critical | Server CPU load rate (critical threshold) | string | `90` | no |
 | server_load_rate_threshold_warning | Server CPU load rate (warning threshold) | string | `70` | no |
+| server_load_rate_timeframe | Monitor timeframe for Redis server load [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | status_message | Custom message for Redis status monitor | string | `` | no |
 | status_silenced | Groups to mute for Redis status monitor | map | `<map>` | no |
+| status_timeframe | Monitor timeframe for Redis status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 
 Related documentation
 ---------------------

@@ -37,6 +37,12 @@ variable "status_message" {
 default = ""
 }
 
+variable "status_timeframe" {
+description = "Monitor timeframe for Redis status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "evictedkeys_limit_silenced" {
 description = "Groups to mute for Redis evicted keys monitor"
 type = "map"
@@ -49,6 +55,12 @@ variable "evictedkeys_limit_message" {
 default = ""
 }
 
+variable "evictedkeys_limit_timeframe" {
+description = "Monitor timeframe for Redis evicted keys [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "evictedkeys_limit_threshold_warning" {
 description = "Evicted keys limit (warning threshold)"
 default = 0
@@ -71,6 +83,12 @@ variable "percent_processor_time_message" {
 default = ""
 }
 
+variable "percent_processor_time_timeframe" {
+description = "Monitor timeframe for Redis processor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "percent_processor_time_threshold_critical" {
 description = "Processor time percent (critical threshold)"
 default = 80
@@ -93,6 +111,12 @@ variable "server_load_rate_message" {
 default = ""
 }
 
+variable "server_load_rate_timeframe" {
+description = "Monitor timeframe for Redis server load [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "server_load_rate_threshold_critical" {
 description = "Server CPU load rate (critical threshold)"
 default = 90

@@ -11,7 +11,7 @@ resource "datadog_monitor" "status" {
 message = "${coalesce(var.status_message, var.message)}"
 
 query = <<EOF
-avg(last_5m):avg:azure.cache_redis.status{${data.template_file.filter.rendered}} by {resource_group,region,name} != 1
+avg(${var.status_timeframe}):avg:azure.cache_redis.status{${data.template_file.filter.rendered}} by {resource_group,region,name} != 1
 EOF
 
 type = "metric alert"
@@ -36,7 +36,7 @@ resource "datadog_monitor" "evictedkeys" {
 message = "${coalesce(var.evictedkeys_limit_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.evictedkeys_limit_timeframe}): (
 avg:azure.cache_redis.evictedkeys{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.evictedkeys_limit_threshold_critical}
 EOF
@@ -68,7 +68,7 @@ resource "datadog_monitor" "percent_processor_time" {
 message = "${coalesce(var.percent_processor_time_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.percent_processor_time_timeframe}): (
 avg:azure.cache_redis.percent_processor_time{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.percent_processor_time_threshold_critical}
 EOF
@@ -100,7 +100,7 @@ resource "datadog_monitor" "server_load" {
 message = "${coalesce(var.server_load_rate_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.server_load_rate_timeframe}): (
 avg:azure.cache_redis.server_load{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.server_load_rate_threshold_critical}
 EOF

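For context, a caller could override the new Redis timeframe inputs roughly as follows. This is a minimal sketch, assuming Terraform 0.11-style syntax to match the diff; the module label, the `source` path, the notification handle, and the chosen values are all hypothetical, and any other required inputs of the module are omitted.

```hcl
# Hypothetical module call: the label and source path are assumptions, not taken from the commit.
module "datadog_monitors_azure_redis" {
  source = "./path/to/azure/redis" # hypothetical path

  # Required input listed in the Redis README hunk; other required inputs are omitted here.
  message = "@hypothetical-notification-handle"

  # New inputs from this commit; each defaults to "last_5m" when left unset.
  status_timeframe            = "last_10m"
  evictedkeys_limit_timeframe = "last_15m"
}
```

The variable blocks in the diff declare only a string type and a default, so a value outside the documented list (`last_#m` with 1, 5, 10, 15, or 30; `last_#h` with 1, 2, or 4; `last_1d`) would presumably surface only when Datadog rejects the rendered query.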
@@ -31,18 +31,22 @@ Inputs
 | cpu_silenced | Groups to mute for SQL CPU monitor | map | `<map>` | no |
 | cpu_threshold_critical | CPU usage in percent (critical threshold) | string | `90` | no |
 | cpu_threshold_warning | CPU usage in percent (warning threshold) | string | `80` | no |
+| cpu_timeframe | Monitor timeframe for SQL CPU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
 | deadlock_message | Custom message for SQL Deadlock monitor | string | `` | no |
 | deadlock_silenced | Groups to mute for SQL Deadlock monitor | map | `<map>` | no |
 | deadlock_threshold_critical | Amount of Deadlocks (critical threshold) | string | `1` | no |
+| deadlock_timeframe | Monitor timeframe for SQL Deadlock [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | diskspace_message | Custom message for SQL disk space monitor | string | `` | no |
 | diskspace_silenced | Groups to mute for SQL disk space monitor | map | `<map>` | no |
 | diskspace_threshold_critical | Disk space used in percent (critical threshold) | string | `90` | no |
 | diskspace_threshold_warning | Disk space used in percent (warning threshold) | string | `80` | no |
+| diskspace_timeframe | Monitor timeframe for SQL disk space [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
 | dtu_message | Custom message for SQL DTU monitor | string | `` | no |
 | dtu_silenced | Groups to mute for SQL DTU monitor | map | `<map>` | no |
 | dtu_threshold_critical | Amount of DTU used (critical threshold) | string | `90` | no |
 | dtu_threshold_warning | Amount of DTU used (warning threshold) | string | `85` | no |
+| dtu_timeframe | Monitor timeframe for SQL DTU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
 | environment | Architecture Environment | string | - | yes |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |

@@ -37,6 +37,12 @@ variable "cpu_message" {
 default = ""
 }
 
+variable "cpu_timeframe" {
+description = "Monitor timeframe for SQL CPU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_15m"
+}
+
 variable "cpu_threshold_warning" {
 description = "CPU usage in percent (warning threshold)"
 default = "80"
@@ -59,6 +65,12 @@ variable "diskspace_message" {
 default = ""
 }
 
+variable "diskspace_timeframe" {
+description = "Monitor timeframe for SQL disk space [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_15m"
+}
+
 variable "diskspace_threshold_warning" {
 description = "Disk space used in percent (warning threshold)"
 default = "80"
@@ -81,6 +93,12 @@ variable "dtu_message" {
 default = ""
 }
 
+variable "dtu_timeframe" {
+description = "Monitor timeframe for SQL DTU [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_15m"
+}
+
 variable "dtu_threshold_warning" {
 description = "Amount of DTU used (warning threshold)"
 default = "85"
@@ -103,6 +121,12 @@ variable "deadlock_message" {
 default = ""
 }
 
+variable "deadlock_timeframe" {
+description = "Monitor timeframe for SQL Deadlock [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "deadlock_threshold_critical" {
 description = "Amount of Deadlocks (critical threshold)"
 default = "1"

@@ -11,7 +11,7 @@ resource "datadog_monitor" "sql-database_cpu_90_15min" {
 message = "${coalesce(var.cpu_message, var.message)}"
 
 query = <<EOF
-avg(last_15m): (
+avg(${var.cpu_timeframe}): (
 avg:azure.sql_servers_databases.cpu_percent{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.cpu_threshold_critical}
 EOF
@@ -44,7 +44,7 @@ resource "datadog_monitor" "sql-database_free_space_low" {
 type = "metric alert"
 
 query = <<EOF
-avg(last_15m): (
+avg(${var.diskspace_timeframe}): (
 avg:azure.sql_servers_databases.storage_percent{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.diskspace_threshold_critical}
 EOF
@@ -76,7 +76,7 @@ resource "datadog_monitor" "sql-database_dtu_consumption_high" {
 type = "metric alert"
 
 query = <<EOF
-avg(last_15m): (
+avg(${var.dtu_timeframe}): (
 azure.sql_servers_databases.dtu_consumption_percent{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.dtu_threshold_critical}
 EOF
@@ -108,7 +108,7 @@ resource "datadog_monitor" "sql-database_deadlocks_count" {
 type = "metric alert"
 
 query = <<EOF
-sum(last_5m): (
+sum(${var.deadlock_timeframe}): (
 avg:azure.sql_servers_databases.deadlock{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
 ) > ${var.deadlock_threshold_critical}
 EOF

@@ -36,14 +36,17 @@ Inputs
 | authorization_error_requests_silenced | Groups to mute for Storage authorization errors monitor | map | `<map>` | no |
 | authorization_error_requests_threshold_critical | Maximum acceptable percent of authorization error requests for a storage | string | `90` | no |
 | authorization_error_requests_threshold_warning | Warning regarding acceptable percent of authorization error requests for a storage | string | `50` | no |
+| authorization_error_requests_timeframe | Monitor timeframe for Storage authorization errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | availability_message | Custom message for Storage availability monitor | string | `` | no |
 | availability_silenced | Groups to mute for Storage availability monitor | map | `<map>` | no |
 | availability_threshold_critical | Minimum acceptable percent of availability for a storage | string | `50` | no |
 | availability_threshold_warning | Warning regarding acceptable percent of availability for a storage | string | `90` | no |
+| availability_timeframe | Monitor timeframe for Storage availability [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | client_other_error_requests_message | Custom message for Storage other errors monitor | string | `` | no |
 | client_other_error_requests_silenced | Groups to mute for Storage other errors monitor | map | `<map>` | no |
 | client_other_error_requests_threshold_critical | Maximum acceptable percent of client other error requests for a storage | string | `90` | no |
 | client_other_error_requests_threshold_warning | Warning regarding acceptable percent of client other error requests for a storage | string | `50` | no |
+| client_other_error_requests_timeframe | Monitor timeframe for Storage other errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | environment | Architecture environment | string | - | yes |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
@@ -52,27 +55,33 @@ Inputs
 | latency_silenced | Groups to mute for Storage latency monitor | map | `<map>` | no |
 | latency_threshold_critical | Maximum acceptable end to end latency (ms) for a storage | string | `2000` | no |
 | latency_threshold_warning | Warning regarding acceptable end to end latency (ms) for a storage | string | `1000` | no |
+| latency_timeframe | Monitor timeframe for Storage latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | message | Message sent when a Redis monitor is triggered | string | - | yes |
 | network_error_requests_message | Custom message for Storage network errors monitor | string | `` | no |
 | network_error_requests_silenced | Groups to mute for Storage network errors monitor | map | `<map>` | no |
 | network_error_requests_threshold_critical | Maximum acceptable percent of network error requests for a storage | string | `90` | no |
 | network_error_requests_threshold_warning | Warning regarding acceptable percent of network error requests for a storage | string | `50` | no |
+| network_error_requests_timeframe | Monitor timeframe for Storage network errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | server_other_error_requests_message | Custom message for Storage server other errors monitor | string | `` | no |
 | server_other_error_requests_silenced | Groups to mute for Storage server other errors monitor | map | `<map>` | no |
 | server_other_error_requests_threshold_critical | Maximum acceptable percent of server other error requests for a storage | string | `90` | no |
 | server_other_error_requests_threshold_warning | Warning regarding acceptable percent of server other error requests for a storage | string | `50` | no |
+| server_other_error_requests_timeframe | Monitor timeframe for Storage server other errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | successful_requests_message | Custom message for Storage sucessful requests monitor | string | `` | no |
 | successful_requests_silenced | Groups to mute for Storage sucessful requests monitor | map | `<map>` | no |
 | successful_requests_threshold_critical | Minimum acceptable percent of successful requests for a storage | string | `10` | no |
 | successful_requests_threshold_warning | Warning regarding acceptable percent of successful requests for a storage | string | `30` | no |
+| successful_requests_timeframe | Monitor timeframe for Storage sucessful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | throttling_error_requests_message | Custom message for Storage throttling error monitor | string | `` | no |
 | throttling_error_requests_silenced | Groups to mute for Storage throttling error monitor | map | `<map>` | no |
 | throttling_error_requests_threshold_critical | Maximum acceptable percent of throttling error requests for a storage | string | `90` | no |
 | throttling_error_requests_threshold_warning | Warning regarding acceptable percent of throttling error requests for a storage | string | `50` | no |
+| throttling_error_requests_timeframe | Monitor timeframe for Storage throttling errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | timeout_error_requests_message | Custom message for Storage timeout monitor | string | `` | no |
 | timeout_error_requests_silenced | Groups to mute for Storage timeout monitor | map | `<map>` | no |
 | timeout_error_requests_threshold_critical | Maximum acceptable percent of timeout error requests for a storage | string | `90` | no |
 | timeout_error_requests_threshold_warning | Warning regarding acceptable percent of timeout error requests for a storage | string | `50` | no |
+| timeout_error_requests_timeframe | Monitor timeframe for Storage timeout [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 
 Related documentation
 ---------------------

@ -37,6 +37,12 @@ variable "availability_message" {
 default = ""
 }
 
+variable "availability_timeframe" {
+description = "Monitor timeframe for Storage availability [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "availability_threshold_critical" {
 description = "Minimum acceptable percent of availability for a storage"
 default = 50
@ -59,6 +65,12 @@ variable "successful_requests_message" {
 default = ""
 }
 
+variable "successful_requests_timeframe" {
+description = "Monitor timeframe for Storage successful requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "successful_requests_threshold_critical" {
 description = "Minimum acceptable percent of successful requests for a storage"
 default = 10
@ -81,6 +93,12 @@ variable "latency_message" {
 default = ""
 }
 
+variable "latency_timeframe" {
+description = "Monitor timeframe for Storage latency [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "latency_threshold_critical" {
 description = "Maximum acceptable end to end latency (ms) for a storage"
 default = 2000
@ -103,6 +121,12 @@ variable "timeout_error_requests_message" {
 default = ""
 }
 
+variable "timeout_error_requests_timeframe" {
+description = "Monitor timeframe for Storage timeout [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "timeout_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of timeout error requests for a storage"
 default = 90
@ -125,6 +149,12 @@ variable "network_error_requests_message" {
 default = ""
 }
 
+variable "network_error_requests_timeframe" {
+description = "Monitor timeframe for Storage network errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "network_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of network error requests for a storage"
 default = 90
@ -147,6 +177,12 @@ variable "throttling_error_requests_message" {
 default = ""
 }
 
+variable "throttling_error_requests_timeframe" {
+description = "Monitor timeframe for Storage throttling errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "throttling_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of throttling error requests for a storage"
 default = 90
@ -169,6 +205,12 @@ variable "server_other_error_requests_message" {
 default = ""
 }
 
+variable "server_other_error_requests_timeframe" {
+description = "Monitor timeframe for Storage server other errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "server_other_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of server other error requests for a storage"
 default = 90
@ -191,6 +233,12 @@ variable "client_other_error_requests_message" {
 default = ""
 }
 
+variable "client_other_error_requests_timeframe" {
+description = "Monitor timeframe for Storage other errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "client_other_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of client other error requests for a storage"
 default = 90
@ -213,6 +261,12 @@ variable "authorization_error_requests_message" {
 default = ""
 }
 
+variable "authorization_error_requests_timeframe" {
+description = "Monitor timeframe for Storage authorization errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "authorization_error_requests_threshold_critical" {
 description = "Maximum acceptable percent of authorization error requests for a storage"
 default = 90
@ -11,7 +11,7 @@ resource "datadog_monitor" "availability" {
 message = "${coalesce(var.availability_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.availability_timeframe}): (default(
 avg:azure.storage.availability{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 100)) < ${var.availability_threshold_critical}
 EOF
@ -42,7 +42,7 @@ resource "datadog_monitor" "successful_requests" {
 message = "${coalesce(var.successful_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.successful_requests_timeframe}): (default(
 avg:azure.storage.percent_success{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 100)) < ${var.successful_requests_threshold_critical}
 EOF
@ -73,7 +73,7 @@ resource "datadog_monitor" "latency" {
 message = "${coalesce(var.latency_message, var.message)}"
 
 query = <<EOF
-min(last_5m): (default(
+min(${var.latency_timeframe}): (default(
 avg:azure.storage.average_e2_e_latency{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.latency_threshold_critical}
 EOF
@ -104,7 +104,7 @@ resource "datadog_monitor" "timeout_error_requests" {
 message = "${coalesce(var.timeout_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.timeout_error_requests_timeframe}): (default(
 avg:azure.storage.percent_timeout_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.timeout_error_requests_threshold_critical}
 EOF
@ -135,7 +135,7 @@ resource "datadog_monitor" "network_error_requests" {
 message = "${coalesce(var.network_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.network_error_requests_timeframe}): (default(
 avg:azure.storage.percent_network_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.network_error_requests_threshold_critical}
 EOF
@ -166,7 +166,7 @@ resource "datadog_monitor" "throttling_error_requests" {
 message = "${coalesce(var.throttling_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.throttling_error_requests_timeframe}): (default(
 avg:azure.storage.percent_throttling_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.throttling_error_requests_threshold_critical}
 EOF
@ -197,7 +197,7 @@ resource "datadog_monitor" "server_other_error_requests" {
 message = "${coalesce(var.server_other_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.server_other_error_requests_timeframe}): (default(
 avg:azure.storage.percent_server_other_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.server_other_error_requests_threshold_critical}
 EOF
@ -228,7 +228,7 @@ resource "datadog_monitor" "client_other_error_requests" {
 message = "${coalesce(var.client_other_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.client_other_error_requests_timeframe}): (default(
 avg:azure.storage.percent_client_other_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.client_other_error_requests_threshold_critical}
 EOF
@ -259,7 +259,7 @@ resource "datadog_monitor" "authorization_error_requests" {
 message = "${coalesce(var.authorization_error_requests_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (default(
+avg(${var.authorization_error_requests_timeframe}): (default(
 avg:azure.storage.percent_authorization_error{${data.template_file.filter.rendered},transaction_type:all} by {resource_group,storage_type,name},
 0)) > ${var.authorization_error_requests_threshold_critical}
 EOF
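The Storage hunks above replace the hard-coded `last_5m` window with one `*_timeframe` input per monitor. As a minimal sketch of how a caller might use that, the module block below overrides two of the windows and leaves the rest at their `last_5m` default. It is written in the same Terraform 0.11 syntax as the diff. The module name, the `source` path, and the root variables `environment` and `oncall_message` are hypothetical, and it assumes `environment` and `message` are the only required inputs; only the `*_timeframe` names come from the diff.

```hcl
# Illustrative only: module name, source path and root variables are assumptions.
module "datadog_monitors_azure_storage" {
  source = "./path/to/azure/storage" # hypothetical path, not shown in this diff

  environment = "${var.environment}"    # hypothetical root variable
  message     = "${var.oncall_message}" # hypothetical root variable

  # Evaluate availability over 15 minutes and latency over 10 minutes.
  # Every other monitor keeps the default "last_5m" window.
  availability_timeframe = "last_15m"
  latency_timeframe      = "last_10m"
}
```

With these values the availability query would start with `avg(last_15m): (default(` and the latency query with `min(last_10m): (default(`, since the variable is interpolated directly into the query string.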
@ -22,12 +22,14 @@ Inputs
 | conversion_errors_silenced | Groups to mute for Stream Analytics conversion errors monitor | map | `<map>` | no |
 | conversion_errors_threshold_critical | Conversion errors limit (critical threshold) | string | `10` | no |
 | conversion_errors_threshold_warning | Conversion errors limit (warning threshold) | string | `0` | no |
+| conversion_errors_timeframe | Monitor timeframe for Stream Analytics conversion errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | delay | Delay in seconds for the metric evaluation | string | `900` | no |
 | environment | Architecture environment | string | - | yes |
 | failed_function_requests_message | Custom message for Stream Analytics failed requests monitor | string | `` | no |
 | failed_function_requests_silenced | Groups to mute for Stream Analytics failed requests monitor | map | `<map>` | no |
 | failed_function_requests_threshold_critical | Failed Function Request rate limit (critical threshold) | string | `10` | no |
 | failed_function_requests_threshold_warning | Failed Function Request rate limit (warning threshold) | string | `0` | no |
+| failed_function_requests_timeframe | Monitor timeframe for Stream Analytics failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
 | message | Message sent when a Redis monitor is triggered | string | - | yes |
@ -35,12 +37,15 @@ Inputs
 | runtime_errors_silenced | Groups to mute for Stream Analytics runtime errors monitor | map | `<map>` | no |
 | runtime_errors_threshold_critical | Runtime errors limit (critical threshold) | string | `10` | no |
 | runtime_errors_threshold_warning | Runtime errors limit (warning threshold) | string | `0` | no |
+| runtime_errors_timeframe | Monitor timeframe for Stream Analytics runtime errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | status_message | Custom message for Stream Analytics status monitor | string | `` | no |
 | status_silenced | Groups to mute for Stream Analytics status monitor | map | `<map>` | no |
+| status_timeframe | Monitor timeframe for Stream Analytics status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 | su_utilization_message | Custom message for Stream Analytics utilization monitor | string | `` | no |
 | su_utilization_silenced | Groups to mute for Stream Analytics utilization monitor | map | `<map>` | no |
 | su_utilization_threshold_critical | Streaming Unit utilization rate limit (critical threshold) | string | `80` | no |
 | su_utilization_threshold_warning | Streaming Unit utilization rate limit (warning threshold) | string | `60` | no |
+| su_utilization_timeframe | Monitor timeframe for Stream Analytics utilization [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
 
 Related documentation
 ---------------------
@ -37,6 +37,12 @@ variable "status_message" {
 default = ""
 }
 
+variable "status_timeframe" {
+description = "Monitor timeframe for Stream Analytics status [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "su_utilization_silenced" {
 description = "Groups to mute for Stream Analytics utilization monitor"
 type = "map"
@ -49,6 +55,12 @@ variable "su_utilization_message" {
 default = ""
 }
 
+variable "su_utilization_timeframe" {
+description = "Monitor timeframe for Stream Analytics utilization [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "su_utilization_threshold_warning" {
 description = "Streaming Unit utilization rate limit (warning threshold)"
 default = 60
@ -71,6 +83,12 @@ variable "failed_function_requests_message" {
 default = ""
 }
 
+variable "failed_function_requests_timeframe" {
+description = "Monitor timeframe for Stream Analytics failed requests [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "failed_function_requests_threshold_warning" {
 description = "Failed Function Request rate limit (warning threshold)"
 default = 0
@ -93,6 +111,12 @@ variable "conversion_errors_message" {
 default = ""
 }
 
+variable "conversion_errors_timeframe" {
+description = "Monitor timeframe for Stream Analytics conversion errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "conversion_errors_threshold_warning" {
 description = "Conversion errors limit (warning threshold)"
 default = 0
@ -115,6 +139,12 @@ variable "runtime_errors_message" {
 default = ""
 }
 
+variable "runtime_errors_timeframe" {
+description = "Monitor timeframe for Stream Analytics runtime errors [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
+
 variable "runtime_errors_threshold_warning" {
 description = "Runtime errors limit (warning threshold)"
 default = 0
@ -11,7 +11,7 @@ resource "datadog_monitor" "status" {
 message = "${coalesce(var.status_message, var.message)}"
 
 query = <<EOF
-avg(last_5m):avg:azure.streamanalytics_streamingjobs.status{${data.template_file.filter.rendered}} by {resource_group,region,name} < 1
+avg(${var.status_timeframe}):avg:azure.streamanalytics_streamingjobs.status{${data.template_file.filter.rendered}} by {resource_group,region,name} < 1
 EOF
 
 type = "metric alert"
@ -36,7 +36,7 @@ resource "datadog_monitor" "su_utilization" {
 message = "${coalesce(var.su_utilization_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.su_utilization_timeframe}): (
 avg:azure.streamanalytics_streamingjobs.resource_utilization{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.su_utilization_threshold_critical}
 EOF
@ -68,7 +68,7 @@ resource "datadog_monitor" "failed_function_requests" {
 message = "${coalesce(var.failed_function_requests_message, var.message)}"
 
 query = <<EOF
-sum(last_5m): (
+sum(${var.failed_function_requests_timeframe}): (
 avg:azure.streamanalytics_streamingjobs.aml_callout_failed_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count() /
 avg:azure.streamanalytics_streamingjobs.aml_callout_requests{${data.template_file.filter.rendered}} by {resource_group,region,name}.as_count()
 ) * 100 > ${var.failed_function_requests_threshold_critical}
@ -101,7 +101,7 @@ resource "datadog_monitor" "conversion_errors" {
 message = "${coalesce(var.conversion_errors_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.conversion_errors_timeframe}): (
 avg:azure.streamanalytics_streamingjobs.conversion_errors{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.conversion_errors_threshold_critical}
 EOF
@ -133,7 +133,7 @@ resource "datadog_monitor" "runtime_errors" {
 message = "${coalesce(var.runtime_errors_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.runtime_errors_timeframe}): (
 avg:azure.streamanalytics_streamingjobs.errors{${data.template_file.filter.rendered}} by {resource_group,region,name}
 ) > ${var.runtime_errors_threshold_critical}
 EOF
@ -74,10 +74,11 @@ Inputs
 
 | Name | Description | Type | Default | Required |
 |------|-------------|:----:|:-----:|:-----:|
-| environment | Architecture Environment | string | - | yes |
 | delay | Delay in seconds for the metric evaluation | string | `15` | no |
+| environment | Architecture Environment | string | - | yes |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
 | message | Message sent when an alert is triggered | string | - | yes |
 | mongodb_replicaset_message | Custom message for Mongodb replicaset monitor | string | `` | no |
 | mongodb_replicaset_silenced | Groups to mute for Mongodb replicaset monitor | map | `<map>` | no |
+| mongodb_replicaset_timeframe | Monitor timeframe for Mongodb replicaset [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
@ -35,3 +35,9 @@ variable "mongodb_replicaset_message" {
 type = "string"
 default = ""
 }
+
+variable "mongodb_replicaset_timeframe" {
+description = "Monitor timeframe for Mongodb replicaset [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
+type = "string"
+default = "last_5m"
+}
@ -11,7 +11,7 @@ resource "datadog_monitor" "mongodb_replicaset_state" {
 message = "${coalesce(var.mongodb_replicaset_message, var.message)}"
 
 query = <<EOF
-avg(last_5m): (
+avg(${var.mongodb_replicaset_timeframe}): (
 avg:mongodb.replset.health{${data.template_file.filter.rendered}} by {region,replset_name}
 ) < 1
 EOF
@ -25,8 +25,8 @@ Inputs
 
 | Name | Description | Type | Default | Required |
 |------|-------------|:----:|:-----:|:-----:|
-| environment | Architecture Environment | string | - | yes |
 | delay | Delay in seconds for the metric evaluation | string | `15` | no |
+| environment | Architecture Environment | string | - | yes |
 | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
 | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
 | message | Message sent when an alert is triggered | string | - | yes |
@ -34,5 +34,6 @@ Inputs
|
|||||||
| php_fpm_busy_silenced | Groups to mute for PHP FPM busy worker monitor | map | `<map>` | no |
|
| php_fpm_busy_silenced | Groups to mute for PHP FPM busy worker monitor | map | `<map>` | no |
|
||||||
| php_fpm_busy_threshold_critical | php fpm busy critical threshold | string | `0.9` | no |
|
| php_fpm_busy_threshold_critical | php fpm busy critical threshold | string | `0.9` | no |
|
||||||
| php_fpm_busy_threshold_warning | php fpm busy warning threshold | string | `0.8` | no |
|
| php_fpm_busy_threshold_warning | php fpm busy warning threshold | string | `0.8` | no |
|
||||||
|
| php_fpm_busy_timeframe | Monitor timeframe for PHP FPM busy worker [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_10m` | no |
|
||||||
| php_fpm_connect_message | Custom message for PHP FPM process monitor | string | `` | no |
|
| php_fpm_connect_message | Custom message for PHP FPM process monitor | string | `` | no |
|
||||||
| php_fpm_connect_silenced | Groups to mute for PHP FPM process monitor | map | `<map>` | no |
|
| php_fpm_connect_silenced | Groups to mute for PHP FPM process monitor | map | `<map>` | no |
|
||||||
|
|||||||
@ -38,6 +38,12 @@ variable "php_fpm_busy_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "php_fpm_busy_timeframe" {
|
||||||
|
description = "Monitor timeframe for PHP FPM busy worker [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_10m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "php_fpm_busy_threshold_warning" {
|
variable "php_fpm_busy_threshold_warning" {
|
||||||
description = "php fpm busy warning threshold"
|
description = "php fpm busy warning threshold"
|
||||||
default = 0.8
|
default = 0.8
|
||||||
|
|||||||
@ -13,7 +13,7 @@ resource "datadog_monitor" "datadog_php_fpm_connect_idle" {
|
|||||||
type = "metric alert"
|
type = "metric alert"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
avg(last_10m): (
|
avg(${var.php_fpm_busy_timeframe}): (
|
||||||
avg:php_fpm.processes.active{${data.template_file.filter.rendered}} by {region, host} /
|
avg:php_fpm.processes.active{${data.template_file.filter.rendered}} by {region, host} /
|
||||||
( avg:php_fpm.processes.idle{${data.template_file.filter.rendered}} by {region, host} +
|
( avg:php_fpm.processes.idle{${data.template_file.filter.rendered}} by {region, host} +
|
||||||
avg:php_fpm.processes.active{${data.template_file.filter.rendered}} by {region, host} )
|
avg:php_fpm.processes.active{${data.template_file.filter.rendered}} by {region, host} )
|
||||||
|
|||||||
@ -32,26 +32,29 @@ Inputs
|
|||||||
| cpu_high_silenced | Groups to mute for CPU high monitor | map | `<map>` | no |
|
| cpu_high_silenced | Groups to mute for CPU high monitor | map | `<map>` | no |
|
||||||
| cpu_high_threshold_critical | CPU high critical threshold | string | `95` | no |
|
| cpu_high_threshold_critical | CPU high critical threshold | string | `95` | no |
|
||||||
| cpu_high_threshold_warning | CPU high warning threshold | string | `80` | no |
|
| cpu_high_threshold_warning | CPU high warning threshold | string | `80` | no |
|
||||||
| cpu_high_timeframe | CPU high timeframe | string | `last_5m` | no |
|
| cpu_high_timeframe | Monitor timeframe for CPU high [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| cpu_load_message | Custom message for CPU load ratio monitor | string | `` | no |
|
| cpu_load_message | Custom message for CPU load ratio monitor | string | `` | no |
|
||||||
| cpu_load_silenced | Groups to mute for CPU load ratio monitor | map | `<map>` | no |
|
| cpu_load_silenced | Groups to mute for CPU load ratio monitor | map | `<map>` | no |
|
||||||
| cpu_load_threshold_critical | CPU load ratio critical threshold | string | `4` | no |
|
| cpu_load_threshold_critical | CPU load ratio critical threshold | string | `4` | no |
|
||||||
| cpu_load_threshold_warning | CPU load ratio warning threshold | string | `3` | no |
|
| cpu_load_threshold_warning | CPU load ratio warning threshold | string | `3` | no |
|
||||||
| cpu_load_timeframe | CPU load timeframe | string | `last_5m` | no |
|
| cpu_load_timeframe | Monitor timeframe for CPU load ratio [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| environment | Architecture Environment | string | - | yes |
|
|
||||||
| delay | Delay in seconds for the metric evaluation | string | `15` | no |
|
| delay | Delay in seconds for the metric evaluation | string | `15` | no |
|
||||||
|
| environment | Architecture Environment | string | - | yes |
|
||||||
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
|
||||||
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
|
||||||
| free_disk_inodes_message | Custom message for Free disk inodes monitor | string | `` | no |
|
| free_disk_inodes_message | Custom message for Free disk inodes monitor | string | `` | no |
|
||||||
| free_disk_inodes_silenced | Groups to mute for Free disk inodes monitor | map | `<map>` | no |
|
| free_disk_inodes_silenced | Groups to mute for Free disk inodes monitor | map | `<map>` | no |
|
||||||
| free_disk_inodes_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
| free_disk_inodes_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
||||||
| free_disk_inodes_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
| free_disk_inodes_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
||||||
|
| free_disk_inodes_timeframe | Monitor timeframe for Free disk inodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| free_disk_space_message | Custom message for Free diskspace monitor | string | `` | no |
|
| free_disk_space_message | Custom message for Free diskspace monitor | string | `` | no |
|
||||||
| free_disk_space_silenced | Groups to mute for Free diskspace monitor | map | `<map>` | no |
|
| free_disk_space_silenced | Groups to mute for Free diskspace monitor | map | `<map>` | no |
|
||||||
| free_disk_space_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
| free_disk_space_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
||||||
| free_disk_space_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
| free_disk_space_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
||||||
|
| free_disk_space_timeframe | Monitor timeframe for Free diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
|
||||||
| free_memory_message | Custom message for Free memory monitor | string | - | yes |
|
| free_memory_message | Custom message for Free memory monitor | string | - | yes |
|
||||||
| free_memory_silenced | Groups to mute for Free memory monitor | map | `<map>` | no |
|
| free_memory_silenced | Groups to mute for Free memory monitor | map | `<map>` | no |
|
||||||
| free_memory_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
| free_memory_threshold_critical | Free disk space critical threshold | string | `5` | no |
|
||||||
| free_memory_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
| free_memory_threshold_warning | Free disk space warning threshold | string | `10` | no |
|
||||||
|
| free_memory_timeframe | Monitor timeframe for Free memory [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_1m` | no |
|
||||||
| message | Message sent when an alert is triggered | string | - | yes |
|
| message | Message sent when an alert is triggered | string | - | yes |
|
||||||
|
|||||||
@ -39,7 +39,8 @@ variable "cpu_high_message" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
variable "cpu_high_timeframe" {
|
variable "cpu_high_timeframe" {
|
||||||
description = "CPU high timeframe"
|
description = "Monitor timeframe for CPU high [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
default = "last_5m"
|
default = "last_5m"
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -66,7 +67,8 @@ variable "cpu_load_message" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
variable "cpu_load_timeframe" {
|
variable "cpu_load_timeframe" {
|
||||||
description = "CPU load ratio timeframe"
|
description = "Monitor timeframe for CPU load ratio [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
default = "last_5m"
|
default = "last_5m"
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -92,6 +94,12 @@ variable "free_disk_space_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "free_disk_space_timeframe" {
|
||||||
|
description = "Monitor timeframe for Free diskspace [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "free_disk_space_threshold_warning" {
|
variable "free_disk_space_threshold_warning" {
|
||||||
description = "Free disk space warning threshold"
|
description = "Free disk space warning threshold"
|
||||||
default = 10
|
default = 10
|
||||||
@ -114,6 +122,12 @@ variable "free_disk_inodes_message" {
|
|||||||
default = ""
|
default = ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "free_disk_inodes_timeframe" {
|
||||||
|
description = "Monitor timeframe for Free disk inodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_5m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "free_disk_inodes_threshold_warning" {
|
variable "free_disk_inodes_threshold_warning" {
|
||||||
description = "Free disk space warning threshold"
|
description = "Free disk space warning threshold"
|
||||||
default = 10
|
default = 10
|
||||||
@ -135,6 +149,12 @@ variable "free_memory_message" {
|
|||||||
type = "string"
|
type = "string"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "free_memory_timeframe" {
|
||||||
|
description = "Monitor timeframe for Free memory [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
|
||||||
|
type = "string"
|
||||||
|
default = "last_1m"
|
||||||
|
}
|
||||||
|
|
||||||
variable "free_memory_threshold_warning" {
|
variable "free_memory_threshold_warning" {
|
||||||
description = "Free disk space warning threshold"
|
description = "Free disk space warning threshold"
|
||||||
default = 10
|
default = 10
|
||||||
|
|||||||
@ -74,7 +74,7 @@ resource "datadog_monitor" "datadog_free_disk_space_too_low" {
|
|||||||
message = "${coalesce(var.free_disk_space_message, var.message)}"
|
message = "${coalesce(var.free_disk_space_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.free_disk_space_timeframe}): (
|
||||||
avg:system.disk.free{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} /
|
avg:system.disk.free{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} /
|
||||||
avg:system.disk.total{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} * 100
|
avg:system.disk.total{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} * 100
|
||||||
) < ${var.free_disk_space_threshold_critical}
|
) < ${var.free_disk_space_threshold_critical}
|
||||||
@ -106,7 +106,7 @@ resource "datadog_monitor" "datadog_free_disk_space_inodes_too_low" {
|
|||||||
message = "${coalesce(var.free_disk_inodes_message, var.message)}"
|
message = "${coalesce(var.free_disk_inodes_message, var.message)}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_5m): (
|
min(${var.free_disk_inodes_timeframe}): (
|
||||||
avg:system.fs.inodes.free{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} /
|
avg:system.fs.inodes.free{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} /
|
||||||
avg:system.fs.inodes.total{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} * 100
|
avg:system.fs.inodes.total{${data.template_file.filter.rendered},dd_disk:enabled} by {region,host,device} * 100
|
||||||
) < ${var.free_disk_inodes_threshold_critical}
|
) < ${var.free_disk_inodes_threshold_critical}
|
||||||
@ -138,7 +138,7 @@ resource "datadog_monitor" "datadog_free_memory" {
|
|||||||
message = "${var.free_memory_message}"
|
message = "${var.free_memory_message}"
|
||||||
|
|
||||||
query = <<EOF
|
query = <<EOF
|
||||||
min(last_1m): (
|
min(${var.free_memory_timeframe}): (
|
||||||
avg:system.mem.free{${data.template_file.filter.rendered}} by {region,host} /
|
avg:system.mem.free{${data.template_file.filter.rendered}} by {region,host} /
|
||||||
avg:system.mem.total{${data.template_file.filter.rendered}} by {region,host} * 100
|
avg:system.mem.total{${data.template_file.filter.rendered}} by {region,host} * 100
|
||||||
) < ${var.free_memory_threshold_critical}
|
) < ${var.free_memory_threshold_critical}
|
||||||
|
|||||||
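As a rough illustration of what the hunks above enable, a caller could override the new timeframe inputs instead of relying on the previously hard-coded windows. This is only a sketch: the module label, the `source` path, and all example values are assumptions and do not come from this commit. The input names are the ones listed in the Inputs table above.

```hcl
# Hypothetical caller of the system monitors module.
# The module label and the source path are placeholders, not taken from this commit.
module "system_monitors" {
  source = "./path/to/system/monitors" # assumed location

  # Required inputs according to the Inputs table (example values only)
  environment         = "production"
  message             = "example notification message"
  free_memory_message = "example free memory notification message"

  # Inputs introduced by this change. Allowed values per the descriptions:
  # last_#m (1, 5, 10, 15, or 30), last_#h (1, 2, or 4), or last_1d.
  free_memory_timeframe     = "last_5m"  # default is last_1m
  free_disk_space_timeframe = "last_15m" # default is last_5m
}
```

Inputs left unset keep their defaults, so the rendered queries stay identical to the ones that were hard-coded before (for example `min(last_5m)` for free disk inodes).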