CAAS KUBERNETES VELERO DataDog monitors

How to use this module

module "datadog-monitors-caas-kubernetes-velero" {
  source = "claranet/monitors/datadog//caas/kubernetes/velero"
  version = "{revision}"

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message
}

Purpose

Creates DataDog monitors with the following checks:

Velero backup deletion failure
Velero backup failure
Velero backup partial failure
Velero scheduled backup missing
Velero volume snapshot failure

Inputs

Name	Description	Type	Default	Required
environment	Architecture environment	string	n/a	yes
evaluation_delay	Delay in seconds for the metric evaluation	string	`"15"`	no
filter_tags_custom	Tags used for custom filtering when filter_tags_use_defaults is false	string	`"*"`	no
filter_tags_custom_excluded	Tags excluded for custom filtering when filter_tags_use_defaults is false	string	`""`	no
filter_tags_use_defaults	Use default filter tags convention	string	`"true"`	no
message	Message sent when a monitor is triggered	string	n/a	yes
new_host_delay	Delay in seconds before monitor new resource	string	`"300"`	no
notify_no_data	Will raise no data alert if set to true	string	`"true"`	no
prefix_slug	Prefix string to prepend between brackets on every monitors names	string	`""`	no
velero_backup_deletion_failure_enabled	Flag to enable Velero backup deletion failure monitor	string	`"true"`	no
velero_backup_deletion_failure_extra_tags	Extra tags for Velero backup deletion failure monitor	list(string)	`[]`	no
velero_backup_deletion_failure_monitor_message	Custom message for Velero backup deletion failure monitor	string	`""`	no
velero_backup_deletion_failure_monitor_timeframe	Monitor timeframe for Velero backup deletion failure monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]	string	`"last_1d"`	no
velero_backup_failure_enabled	Flag to enable Velero backup failure monitor	string	`"true"`	no
velero_backup_failure_extra_tags	Extra tags for Velero backup failure monitor	list(string)	`[]`	no
velero_backup_failure_monitor_message	Custom message for Velero backup failure monitor	string	`""`	no
velero_backup_failure_monitor_timeframe	Monitor timeframe for Velero backup failure monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]	string	`"last_1d"`	no
velero_backup_partial_failure_enabled	Flag to enable Velero backup partial failure monitor	string	`"true"`	no
velero_backup_partial_failure_extra_tags	Extra tags for Velero backup partial failure monitor	list(string)	`[]`	no
velero_backup_partial_failure_monitor_message	Custom message for Velero backup partial failure monitor	string	`""`	no
velero_backup_partial_failure_monitor_timeframe	Monitor timeframe for Velero backup partial failure monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]	string	`"last_1d"`	no
velero_scheduled_backup_missing_enabled	Flag to enable Velero scheduled backup missing monitor	string	`"true"`	no
velero_scheduled_backup_missing_extra_tags	Extra tags for Velero scheduled backup missing monitor	list(string)	`[]`	no
velero_scheduled_backup_missing_monitor_message	Custom message for Velero scheduled backup missing monitor	string	`""`	no
velero_scheduled_backup_missing_monitor_no_data_timeframe	No data timeframe in minutes	string	`"1440"`	no
velero_scheduled_backup_missing_monitor_timeframe	Monitor timeframe for Velero scheduled backup missing monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]	string	`"last_1d"`	no
velero_volume_snapshot_failure_enabled	Flag to enable Velero volume snapshot failure monitor	string	`"true"`	no
velero_volume_snapshot_failure_extra_tags	Extra tags for Velero volume snapshot failure monitor	list(string)	`[]`	no
velero_volume_snapshot_failure_monitor_message	Custom message for Velero volume snapshot failure monitor	string	`""`	no
velero_volume_snapshot_failure_monitor_timeframe	Monitor timeframe for Velero volume snapshot failure monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]	string	`"last_1d"`	no

Outputs

Name	Description
velero_backup_deletion_failure_id	id for monitor velero_backup_deletion_failure
velero_backup_failure_id	id for monitor velero_backup_failure
velero_backup_partial_failure_id	id for monitor velero_backup_partial_failure
velero_scheduled_backup_missing_id	id for monitor velero_scheduled_backup_missing
velero_volume_snapshot_failure_id	id for monitor velero_volume_snapshot_failure

Documentation for Datadog prometheus intergration: https://docs.datadoghq.com/integrations/prometheus/ Documentation for Datadog OpenMetrics integration: https://docs.datadoghq.com/integrations/openmetrics/ Documentation for Datadog autodiscovery: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/

How to configure Datadog agent for these monitors ?

You can configure Datadog agent by autodiscovery pod annotations or by configuration file.

Configuration by autodiscovery pod annotations

Add these annotations to Velero pods:

podAnnotations: {
  "ad.datadoghq.com/velero.check_names": '["openmetrics"]',
  "ad.datadoghq.com/velero.init_configs": '[{}]',
  "ad.datadoghq.com/velero.instances": '[{"prometheus_url": "http://%%host%%:8085/metrics", "namespace": "velero", "metrics": ["velero*"]}]'
}

Configuration by configuration file

Example of openmetrics.d/conf.yaml:

init_config:

instances:

    ## @param prometheus_url - string - required
    ## The URL where your application metrics are exposed by Prometheus.
    #
  - prometheus_url: http://velero.velero.svc.cluster.local:8085/metrics

    ## @param namespace - string - required
    ## The namespace to be prepended to all metrics.
    #
    namespace: "velero"

    ## @param metrics - list of strings - required
    ## List of metrics to be fetched from the prometheus endpoint, if there's a
    ## value it'll be renamed. This list should contain at least one metric
    #
    metrics:
      - velero*

How to monitor multiple schedule witch have different frequencies ?

If you have multiple Velero schedules with different frequencies, you must duplicate this module and disable common monitors in the others instance of module.

module "datadog-monitors-caas-kubernetes-velero" {
  source = "claranet/monitors/datadog//caas/kubernetes/velero"
  version = "{revision}"

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message

  velero_backup_failure_enabled          = false
  velero_backup_partial_failure_enabled  = false
  velero_backup_deletion_failure_enabled = false
  velero_volume_snapshot_failure_enabled = false
}

7.3 KiB Raw Blame History