2019-02-26 18:19:28 +01:00

4.9 KiB

CAAS KUBERNETES INGRESS VTS DataDog monitors

How to use this module

module "datadog-monitors-caas-kubernetes-ingress-vts" {
  source = "git::ssh://git@git.fr.clara.net/claranet/cloudnative/projects/datadog/terraform/monitors.git//caas/kubernetes/ingress/vts?ref={revision}"

  environment = "${var.environment}"
  message     = "${module.datadog-message-alerting.alerting-message}"
}

Purpose

Creates DataDog monitors with the following checks:

  • Nginx Ingress 4xx errors
  • Nginx Ingress 5xx errors

Inputs

Name Description Type Default Required
artificial_requests_count Number of false requests used to mitigate false positive in case of low trafic string "5" no
environment Architecture Environment string n/a yes
evaluation_delay Delay in seconds for the metric evaluation string "15" no
filter_tags_custom Tags used for custom filtering when filter_tags_use_defaults is false string "*" no
filter_tags_custom_excluded Tags excluded for custom filtering when filter_tags_use_defaults is false string "" no
filter_tags_use_defaults Use default filter tags convention string "true" no
ingress_4xx_enabled Flag to enable Ingress 4xx errors monitor string "true" no
ingress_4xx_extra_tags Extra tags for Ingress 4xx errors monitor list [] no
ingress_4xx_message Message sent when an alert is triggered string "" no
ingress_4xx_silenced Groups to mute for Ingress 4xx errors monitor map {} no
ingress_4xx_threshold_critical 4xx critical threshold in percentage string "40" no
ingress_4xx_threshold_warning 4xx warning threshold in percentage string "20" no
ingress_4xx_time_aggregator Monitor aggregator for Ingress 4xx errors [available values: min, max or avg] string "min" no
ingress_4xx_timeframe Monitor timeframe for Ingress 4xx errors [available values: last_#m (1, 5, 10, 15, or 30), last_#h (1, 2, or 4), or last_1d] string "last_5m" no
ingress_5xx_enabled Flag to enable Ingress 5xx errors monitor string "true" no
ingress_5xx_extra_tags Extra tags for Ingress 5xx errors monitor list [] no
ingress_5xx_message Message sent when an alert is triggered string "" no
ingress_5xx_silenced Groups to mute for Ingress 5xx errors monitor map {} no
ingress_5xx_threshold_critical 5xx critical threshold in percentage string "20" no
ingress_5xx_threshold_warning 5xx warning threshold in percentage string "10" no
ingress_5xx_time_aggregator Monitor aggregator for Ingress 5xx errors [available values: min, max or avg] string "min" no
ingress_5xx_timeframe Monitor timeframe for Ingress 5xx errors [available values: last_#m (1, 5, 10, 15, or 30), last_#h (1, 2, or 4), or last_1d] string "last_5m" no
message Message sent when an alert is triggered string n/a yes
new_host_delay Delay in seconds before monitor new resource string "300" no

Outputs

Name Description
nginx_ingress_too_many_4xx_id id for monitor nginx_ingress_too_many_4xx
nginx_ingress_too_many_5xx_id id for monitor nginx_ingress_too_many_5xx

DataDog blog: https://www.datadoghq.com/blog/monitor-prometheus-metrics 1d38e3a384

Nginx Ingress Controller setup

This configuration and monitors only work for ingress controller version :

  • >= 0.10 because ingress is beta before that and metrics naming convention not stable
  • <= 0.15 because ingress does not use VTS metrics since 0.16 Enable the following flags in the Nginx Ingress Controller chart controller.stats.enabled=true,controller.metrics.enabled=true and the following Datadog agent configuration for each ingress controller:
datadog:
  confd:
    prometheus.yaml: |-
      #nginx_upstream_responses_total{ingress_class,namespace,server,status_code:{1xx,2xx,3xx,4xx,5xx},upstream}
      #nginx_upstream_requests_total{ingress_class,namespace,server,upstream}
      init_config:
      instances:
        # The prometheus endpoint to query from
        - prometheus_url: http://nginx-ingress-controller-metrics:9913/metrics
          # This is NOT the ingress namespace, it is the prefix that will be used for the custom metrics
          namespace: nginx-ingress
          # Filter on the following metrics only
          metrics:
            - "nginx_upstream_requests_total"
            - "nginx_upstream_responses_total"
          # Adapt the tags to the current convention and verify that the monitor will match
          tags:
              - dd_monitoring:enabled
              - dd_ingress:enabled
              - dd_ingress_class:nginx
              - env:ENV