Merged in MON-199-mongodb-add-monitors-for-primary (pull request #93)

MON-199 Advanced monitors for Mongodb

Approved-by: Christophe GENINET <christophe.geninet@fr.clara.net>
Approved-by: Boris Rousseau <boris.rousseau@morea.fr>
Approved-by: Alexandre Gaillet <alexandre.gaillet@fr.clara.net>
Approved-by: Laurent Piroelle <laurent.piroelle@fr.clara.net>
Approved-by: Jérôme Respaut <shr3ps@gmail.com>
Christophe GENINET 2018-08-09 15:29:07 +00:00 committed by Quentin Manfroi
commit acd3d43923
4 changed files with 242 additions and 72 deletions

@ -16,7 +16,10 @@ module "datadog-monitors-databases-mongodb" {
Creates DataDog monitors with the following checks: Creates DataDog monitors with the following checks:
- Member down in the replica set - MongoDB primary state
- MongoDB secondary missing
- MongoDB too many servers or wrong monitoring config
- MongoDB replication lag
## Inputs ## Inputs
@ -27,63 +30,36 @@ Creates DataDog monitors with the following checks:
| filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no | | filter_tags_custom | Tags used for custom filtering when filter_tags_use_defaults is false | string | `*` | no |
| filter_tags_use_defaults | Use default filter tags convention | string | `true` | no | | filter_tags_use_defaults | Use default filter tags convention | string | `true` | no |
| message | Message sent when an alert is triggered | string | - | yes | | message | Message sent when an alert is triggered | string | - | yes |
| mongodb_replicaset_message | Custom message for Mongodb replicaset monitor | string | `` | no | | mongodb_desired_servers_count | Number of servers that should be instantiated for this cluster | string | `3` | no |
| mongodb_replicaset_silenced | Groups to mute for Mongodb replicaset monitor | map | `<map>` | no | | mongodb_lag_critical | Critical replication lag in seconds | string | `5` | no |
| mongodb_replicaset_time_aggregator | Monitor aggregator for Mongodb replicaset [available values: min, max or avg] | string | `max` | no | | mongodb_lag_warning | Warning replication lag in seconds | string | `2` | no |
| mongodb_replicaset_timeframe | Monitor timeframe for Mongodb replicaset [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no | | mongodb_primary_aggregator | Monitor aggregator for MongoDB primary state [available values: min, max] | string | `max` | no |
| mongodb_primary_message | Custom message for MongoDB primary monitor | string | `` | no |
| mongodb_primary_silenced | Groups to mute for MongoDB primary state monitor | map | `<map>` | no |
| mongodb_primary_timeframe | Monitor timeframe for MongoDB wrong state for primary node [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_1m` | no |
| mongodb_replication_aggregator | Monitor aggregator for MongoDB replication lag [available values: min, max, sum or avg] | string | `avg` | no |
| mongodb_replication_message | Custom message for MongoDB replication monitor | string | `` | no |
| mongodb_replication_silenced | Groups to mute for MongoDB replication lag monitor | map | `<map>` | no |
| mongodb_replication_timeframe | Monitor timeframe for MongoDB replication lag [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_1m` | no |
| mongodb_secondary_aggregator | Monitor aggregator for MongoDB secondary state [available values: min, max] | string | `max` | no |
| mongodb_secondary_message | Custom message for MongoDB secondary monitor | string | `` | no |
| mongodb_secondary_silenced | Groups to mute for MongoDB secondary state monitor | map | `<map>` | no |
| mongodb_secondary_timeframe | Monitor timeframe for MongoDB wrong state for secondary nodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_5m` | no |
| mongodb_server_count_aggregator | Monitor aggregator for MongoDB server count [available values: min, max] | string | `min` | no |
| mongodb_server_count_message | Custom message for MongoDB server count | string | `` | no |
| mongodb_server_count_silenced | Groups to mute for MongoDB server count monitor | map | `<map>` | no |
| mongodb_server_count_timeframe | Monitor timeframe for MongoDB wrong server count [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`] | string | `last_15m` | no |
## Outputs ## Outputs
| Name | Description | | Name | Description |
|------|-------------| |------|-------------|
| mongodb_replicaset_state_id | id for monitor mongodb_replicaset_state | | mongodb_primary_id | id for monitor mongodb_primary |
| mongodb_replication_id | id for monitor mongodb_replication |
| mongodb_secondary_id | id for monitor mongodb_secondary |
| mongodb_server_count_id | id for monitor mongodb_server_count |
## Related documentation ## Related documentation
DataDog documentation: [https://docs.datadoghq.com/integrations/mongo/](https://docs.datadoghq.com/integrations/mongo/) DataDog documentation: [https://docs.datadoghq.com/integrations/mongo/](https://docs.datadoghq.com/integrations/mongo/)
MongoDB documentation: [https://docs.mongodb.com/manual/administration/monitoring/](https://docs.mongodb.com/manual/administration/monitoring/)
## Custom settings
### Prepare your ReplicaSet
Add a user to your ReplicaSet (on the primary instance)
```
use admin
db.auth("admin", "admin-password") // Optional: skip this if no admin password is set
db.createUser({"user":"datadog", "pwd": "{{PASSWORD}}", "roles" : [ {role: 'read', db: 'admin' }, {role: 'clusterMonitor', db: 'admin'}, {role: 'read', db: 'local' }]})
```
### Configure your Datadog agent
Add this file as conf.d/mongo.yaml, with one entry per instance of the ReplicaSet:
```
init_config:

instances:
  - server: mongodb://datadog:password@[MONGO_URI]
    tags:
      - mytag1
      - mytag2
  - server: mongodb://datadog:password@[MONGO_URI]
    tags:
      - mytag1
      - mytag2
```
### Monitor ReplicaSet Health
Name: [environment] Replica Set health for {{ replset_name }}
This monitor checks the health of your ReplicaSet.
Metric values:
1: The ReplicaSet is OK
0: The ReplicaSet is KO
This monitor triggers a separate alert for each ReplicaSet.

@ -24,26 +24,113 @@ variable "filter_tags_custom" {
default = "*" default = "*"
} }
variable "mongodb_replicaset_silenced" { variable "mongodb_desired_servers_count" {
description = "Groups to mute for Mongodb replicaset monitor" description = "Number of servers that should be instantiated for this cluster"
default = 3
}
variable "mongodb_primary_timeframe" {
description = "Monitor timeframe for MongoDB wrong state for primary node [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
type = "string"
default = "last_1m"
}
variable "mongodb_secondary_timeframe" {
description = "Monitor timeframe for MongoDB wrong state for secondary nodes [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
type = "string"
default = "last_5m"
}
variable "mongodb_server_count_timeframe" {
description = "Monitor timeframe for MongoDB wrong server count [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
type = "string"
default = "last_15m"
}
variable "mongodb_replication_timeframe" {
description = "Monitor timeframe for MongoDB replication lag [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
type = "string"
default = "last_1m"
}
variable "mongodb_lag_warning" {
description = "Warning replication lag in seconds"
default = 2
}
variable "mongodb_lag_critical" {
description = "Critical replication lag in seconds"
default = 5
}
variable "mongodb_primary_silenced" {
description = "Groups to mute for MongoDB primary state monitor"
type = "map" type = "map"
default = {} default = {}
} }
variable "mongodb_replicaset_message" { variable "mongodb_secondary_silenced" {
description = "Custom message for Mongodb replicaset monitor" description = "Groups to mute for MongoDB secondary state monitor"
type = "map"
default = {}
}
variable "mongodb_server_count_silenced" {
description = "Groups to mute for MongoDB server count monitor"
type = "map"
default = {}
}
variable "mongodb_replication_silenced" {
description = "Groups to mute for MongoDB replication lag monitor"
type = "map"
default = {}
}
variable "mongodb_primary_message" {
description = "Custom message for MongoDB primary monitor"
type = "string" type = "string"
default = "" default = ""
} }
variable "mongodb_replicaset_time_aggregator" { variable "mongodb_secondary_message" {
description = "Monitor aggregator for Mongodb replicaset [available values: min, max or avg]" description = "Custom message for MongoDB secondary monitor"
type = "string"
default = ""
}
variable "mongodb_server_count_message" {
description = "Custom message for MongoDB server count"
type = "string"
default = ""
}
variable "mongodb_replication_message" {
description = "Custom message for MongoDB replication monitor"
type = "string"
default = ""
}
variable "mongodb_primary_aggregator" {
description = "Monitor aggregator for MongoDB primary state [available values: min, max]"
type = "string" type = "string"
default = "max" default = "max"
} }
variable "mongodb_replicaset_timeframe" { variable "mongodb_secondary_aggregator" {
description = "Monitor timeframe for Mongodb replicaset [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]" description = "Monitor aggregator for MongoDB secondary state [available values: min, max]"
type = "string" type = "string"
default = "last_5m" default = "max"
}
variable "mongodb_server_count_aggregator" {
description = "Monitor aggregator for MongoDB server count [available values: min, max]"
type = "string"
default = "min"
}
variable "mongodb_replication_aggregator" {
description = "Monitor aggregator for MongoDB replication lag [available values: min, max, sum or avg]"
type = "string"
default = "avg"
} }

@ -2,18 +2,17 @@ data "template_file" "filter" {
template = "$${filter}" template = "$${filter}"
vars { vars {
filter = "${var.filter_tags_use_defaults == "true" ? format("dd_monitoring:enabled,dd_monitoring_mongodb:enabled,env:%s", var.environment) : "${var.filter_tags_custom}"}" filter = "${var.filter_tags_use_defaults == "true" ? format("dd_monitoring:enabled,dd_mongodb:enabled,env:%s", var.environment) : "${var.filter_tags_custom}"}"
} }
} }
resource "datadog_monitor" "mongodb_replicaset_state" { resource "datadog_monitor" "mongodb_primary" {
name = "[${var.environment}] Member down in the replica set" name = "[${var.environment}] MongoDB primary state"
message = "${coalesce(var.mongodb_replicaset_message, var.message)}" message = "${coalesce(var.mongodb_primary_message, var.message)}"
query = <<EOF query = <<EOF
${var.mongodb_replicaset_time_aggregator}(${var.mongodb_replicaset_timeframe}): ( ${var.mongodb_primary_aggregator}(${var.mongodb_primary_timeframe}):
avg:mongodb.replset.health{${data.template_file.filter.rendered}} by {region,replset_name} min:mongodb.replset.state{${data.template_file.filter.rendered}} by {replset_name} >= 2
) < 1
EOF EOF
type = "metric alert" type = "metric alert"
@ -27,7 +26,100 @@ resource "datadog_monitor" "mongodb_replicaset_state" {
include_tags = true include_tags = true
require_full_window = true require_full_window = true
silenced = "${var.mongodb_replicaset_silenced}" silenced = "${var.mongodb_primary_silenced}"
tags = ["env:${var.environment}", "resource:mongodb"]
}
resource "datadog_monitor" "mongodb_secondary" {
name = "[${var.environment}] MongoDB secondary missing"
message = "${coalesce(var.mongodb_secondary_message, var.message)}"
query = <<EOF
${var.mongodb_secondary_aggregator}(${var.mongodb_secondary_timeframe}):
${var.mongodb_desired_servers_count} -
sum:mongodb.replset.health{${data.template_file.filter.rendered}} by {replset_name}
> 1
EOF
thresholds {
critical = 1
warning = 0
}
type = "metric alert"
notify_no_data = false
renotify_interval = 0
evaluation_delay = "${var.delay}"
new_host_delay = "${var.delay}"
notify_audit = false
timeout_h = 0
include_tags = true
require_full_window = true
silenced = "${var.mongodb_secondary_silenced}"
tags = ["env:${var.environment}", "resource:mongodb"]
}
resource "datadog_monitor" "mongodb_server_count" {
name = "[${var.environment}] MongoDB too many servers or wrong monitoring config"
message = "${coalesce(var.mongodb_server_count_message, var.message)}"
query = <<EOF
${var.mongodb_server_count_aggregator}(${var.mongodb_server_count_timeframe}):
sum:mongodb.replset.health{${data.template_file.filter.rendered}} by {replset_name}
> 99
EOF
thresholds {
critical = 99
warning = "${var.mongodb_desired_servers_count}"
}
type = "metric alert"
notify_no_data = false
renotify_interval = 0
evaluation_delay = "${var.delay}"
new_host_delay = "${var.delay}"
notify_audit = false
timeout_h = 0
include_tags = true
require_full_window = true
silenced = "${var.mongodb_server_count_silenced}"
tags = ["env:${var.environment}", "resource:mongodb"]
}
resource "datadog_monitor" "mongodb_replication" {
name = "[${var.environment}] MongoDB replication lag"
message = "${coalesce(var.mongodb_replication_message, var.message)}"
query = <<EOF
${var.mongodb_replication_aggregator}(${var.mongodb_replication_timeframe}):
avg:mongodb.replset.replicationlag{${data.template_file.filter.rendered},replset_state:secondary} by {server} > ${var.mongodb_lag_critical}
EOF
thresholds {
critical = "${var.mongodb_lag_critical}"
warning = "${var.mongodb_lag_warning}"
}
type = "metric alert"
notify_no_data = false
renotify_interval = 0
evaluation_delay = "${var.delay}"
new_host_delay = "${var.delay}"
notify_audit = false
timeout_h = 0
include_tags = true
require_full_window = true
silenced = "${var.mongodb_replication_silenced}"
tags = ["env:${var.environment}", "resource:mongodb"] tags = ["env:${var.environment}", "resource:mongodb"]
} }

@ -1,4 +1,19 @@
output "mongodb_replicaset_state_id" { output "mongodb_primary_id" {
description = "id for monitor mongodb_replicaset_state" description = "id for monitor mongodb_primary"
value = "${datadog_monitor.mongodb_replicaset_state.id}" value = "${datadog_monitor.mongodb_primary.id}"
}
output "mongodb_secondary_id" {
description = "id for monitor mongodb_secondary"
value = "${datadog_monitor.mongodb_secondary.id}"
}
output "mongodb_server_count_id" {
description = "id for monitor mongodb_server_count"
value = "${datadog_monitor.mongodb_server_count.id}"
}
output "mongodb_replication_id" {
description = "id for monitor mongodb_replication"
value = "${datadog_monitor.mongodb_replication.id}"
} }
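
For reference, a caller of this module might wire up the new inputs and outputs roughly as follows. This is only a sketch in the same Terraform 0.11 syntax as the diff: the `source` path, the example values, and the `mongodb_primary_monitor_id` output name are assumptions that do not come from this commit, and the module may require other inputs (for example `delay`) that are not shown in this change.

```hcl
# Hypothetical caller of the module; not part of this commit.
module "datadog-monitors-databases-mongodb" {
  # Placeholder path (assumption): point this at wherever the module lives.
  source = "../path/to/databases/mongodb"

  environment = "production"         # example value
  message     = "@example-oncall"    # example notification handle

  # Inputs introduced by this change, overriding the defaults (3, 2, 5).
  mongodb_desired_servers_count = 5
  mongodb_lag_warning           = 3
  mongodb_lag_critical          = 10
}

# Re-export one of the new monitor ids; the output name here is illustrative.
output "mongodb_primary_monitor_id" {
  value = "${module.datadog-monitors-databases-mongodb.mongodb_primary_id}"
}
```

With `mongodb_desired_servers_count = 5`, the secondary monitor's query becomes `5 - sum:mongodb.replset.health{...}`, so it warns when one member is missing and turns critical when more than one is missing.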