• Configuration
    • Configuration file
      • Example

    Configuration


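    Alertmanager is configured through command-line flags and a configuration file. The command-line flags set immutable system parameters, while the configuration file defines inhibition rules, notification routing, and notification receivers.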

    The visual editor can help with building routing trees.

    To see all available command-line flags, run alertmanager -h.

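    Alertmanager can reload its configuration file at runtime. If the new configuration contains errors, the changes are not applied and an error is logged to the terminal. A reload is triggered by sending a SIGHUP signal to the process or by sending an HTTP POST request to the /-/reload endpoint.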
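    The two reload triggers can be sketched in Python as follows. This is a minimal sketch, not part of Alertmanager: the function names are hypothetical, and the base URL and process ID are whatever applies to your deployment.

```python
import os
import signal
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the bodiless HTTP POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_via_http(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def reload_via_signal(pid: int) -> None:
    """Send SIGHUP to the Alertmanager process (POSIX systems only)."""
    os.kill(pid, signal.SIGHUP)
```

    For example, reload_via_http("http://localhost:9093") would request a reload from an instance assumed to listen on localhost port 9093.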

    Configuration file

    Use the -config.file flag to specify which configuration file to load:

    ./alertmanager -config.file=simple.yml

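    The configuration file is written in YAML. Brackets indicate that a parameter is optional. For non-list parameters, the value is set to the specified default.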

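    • <duration>: a duration matching the regular expression [0-9]+(ms|[smhdwy])
    • <labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*
    • <filepath>: a valid path in the current working directory
    • <boolean>: a boolean that is either false or true
    • <string>: a regular string
    • <tmpl_string>: a string that is template-expanded before use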
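    The two regular expressions in the list above can be checked mechanically. The following Python sketch applies them as whole-string matches; the helper names are hypothetical, and treating the patterns as anchored at both ends is an assumption.

```python
import re

# Patterns copied from the placeholder definitions.
DURATION_RE = re.compile(r"[0-9]+(ms|[smhdwy])")
LABELNAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_duration(value: str) -> bool:
    """Return True if the whole string is a duration such as 30s or 5m."""
    return DURATION_RE.fullmatch(value) is not None


def is_labelname(value: str) -> bool:
    """Return True if the whole string is a valid label name."""
    return LABELNAME_RE.fullmatch(value) is not None
```

    Under these assumptions, values such as 5m, 30s, and 100ms pass, while "5 minutes" and a label name starting with a digit do not.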

    The other placeholders are specified separately. A valid example file can be found here.

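    The global configuration specifies parameters that are valid in all other configuration contexts. They also serve as defaults for other configuration sections.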

    global:
      # ResolveTimeout is the time after which an alert is declared resolved
      # if it has not been updated.
      [ resolve_timeout: <duration> | default = 5m ]
      # The default SMTP From header field.
      [ smtp_from: <tmpl_string> ]
      # The default SMTP smarthost used for sending emails.
      [ smtp_smarthost: <string> ]
      # SMTP authentication information.
      [ smtp_auth_username: <string> ]
      [ smtp_auth_password: <string> ]
      [ smtp_auth_secret: <string> ]
      # The default SMTP TLS requirement.
      [ smtp_require_tls: <bool> | default = true ]
      # The API URL to use for Slack notifications.
      [ slack_api_url: <string> ]
      [ pagerduty_url: <string> | default = "https://events.pagerduty.com/generic/2010-04-15/create_event.json" ]
      [ opsgenie_api_host: <string> | default = "https://api.opsgenie.com/" ]
      [ hipchat_url: <string> | default = "https://api.hipchat.com/" ]
      [ hipchat_auth_token: <string> ]

    # Files from which custom notification template definitions are read.
    # The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
    templates:
      [ - <filepath> ... ]

    # The root node of the routing tree.
    route: <route>

    # A list of notification receivers.
    receivers:
      - <receiver> ...

    # A list of inhibition rules.
    inhibit_rules:
      [ - <inhibit_rule> ... ]

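    A route block defines a node in the routing tree and its children. Optional configuration parameters that are not set are inherited from the parent node.

    Every alert enters the routing tree at the configured top-level node, which must match all alerts. The alert then traverses the child nodes. If continue is set to false, traversal stops after the first matching child. If continue is set to true, the alert continues matching against subsequent sibling nodes. If an alert does not match any child of a node, it is handled according to the configuration parameters of the current node.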

    [ receiver: <string> ]
    [ group_by: '[' <labelname>, ... ']' ]
    # Whether an alert should continue matching subsequent sibling nodes.
    [ continue: <boolean> | default = false ]
    # A set of equality matchers an alert has to fulfill to match the node.
    match:
      [ <labelname>: <labelvalue>, ... ]
    # A set of regex-matchers an alert has to fulfill to match the node.
    match_re:
      [ <labelname>: <regex>, ... ]
    # How long to initially wait to send a notification for a group
    # of alerts. Allows to wait for an inhibiting alert to arrive or collect
    # more initial alerts for the same group. (Usually ~0s to few minutes.)
    [ group_wait: <duration> ]
    # How long to wait before sending a notification about new alerts that
    # are added to a group of alerts for which an initial notification
    # has already been sent. (Usually ~5min or more.)
    [ group_interval: <duration> ]
    # How long to wait before sending a notification again if it has already
    # been sent successfully for an alert. (Usually ~3h or more).
    [ repeat_interval: <duration> ]
    # Zero or more child routes.
    routes:
      [ - <route> ... ]
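    The group_wait, group_interval, and repeat_interval timers above apply to groups of alerts, and group_by decides which alerts share a group. A minimal Python sketch of that bucketing follows; group_alerts is a hypothetical helper, and treating a missing label as an empty value is an assumption.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by their values for the group_by label names."""
    groups = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

    With group_by set to [cluster, alertname], two alerts that share both label values land in the same group regardless of their other labels.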

    Example

    # The root route with all parameters, which are inherited by the child
    # routes if they are not overwritten.
    route:
      receiver: 'default-receiver'
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: [cluster, alertname]
      # All alerts that do not match the following child routes
      # will remain at the root node and be dispatched to 'default-receiver'.
      routes:
      # All alerts with service=mysql or service=cassandra
      # are dispatched to the database pager.
      - receiver: 'database-pager'
        group_wait: 10s
        match_re:
          service: mysql|cassandra
      # All alerts with the team=frontend label match this sub-route.
      # They are grouped by product and environment rather than cluster
      # and alertname.
      - receiver: 'frontend-pager'
        group_by: [product, environment]
        match:
          team: frontend
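    The traversal rules described earlier can be sketched in Python against this example tree. This is an illustration of the documented behaviour, not Alertmanager's implementation: the Route class and find_routes are hypothetical names, regex matchers are assumed to match the whole label value, and inheritance of unset parameters is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)
    match_re: dict = field(default_factory=dict)
    continue_: bool = False  # "continue" is a Python keyword
    routes: list = field(default_factory=list)

    def matches(self, labels: dict) -> bool:
        """Check all equality and regex matchers against an alert's labels."""
        equal_ok = all(labels.get(k, "") == v for k, v in self.match.items())
        regex_ok = all(
            re.fullmatch(p, labels.get(k, "")) for k, p in self.match_re.items()
        )
        return equal_ok and regex_ok


def find_routes(route: Route, labels: dict) -> list:
    """Return the routes that end up handling an alert with these labels."""
    if not route.matches(labels):
        return []
    found = []
    for child in route.routes:
        child_found = find_routes(child, labels)
        found.extend(child_found)
        # Stop at the first matching child unless it asks to continue.
        if child_found and not child.continue_:
            break
    # No child matched: the current node handles the alert itself.
    return found or [route]


root = Route(
    receiver="default-receiver",
    routes=[
        Route(receiver="database-pager", match_re={"service": "mysql|cassandra"}),
        Route(receiver="frontend-pager", match={"team": "frontend"}),
    ],
)
```

    An alert labelled service=mysql resolves to database-pager, one labelled team=frontend to frontend-pager, and an alert matching neither child stays with default-receiver.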

    An inhibition rule mutes an alert matching one set of matchers while an alert matching another set of matchers exists. Both alerts must have a set of equal labels.

    # Matchers that have to be fulfilled in the alerts to be muted.
    target_match:
      [ <labelname>: <labelvalue>, ... ]
    target_match_re:
      [ <labelname>: <regex>, ... ]
    # Matchers for which one or more alerts have to exist for the
    # inhibition to take effect.
    source_match:
      [ <labelname>: <labelvalue>, ... ]
    source_match_re:
      [ <labelname>: <regex>, ... ]
    # Labels that must have an equal value in the source and target
    # alert for the inhibition to take effect.
    [ equal: '[' <labelname>, ... ']' ]
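    How the source matchers, target matchers, and equal labels interact can be sketched in Python. This is a simplified illustration, not Alertmanager's implementation: is_inhibited is a hypothetical helper, the rule is a plain dict using the field names above, regexes are assumed to match whole values, and the caller is assumed to pass only other currently firing alerts as candidates.

```python
import re


def _matches(labels: dict, equal: dict, regex: dict) -> bool:
    """Check equality and regex matchers against one alert's labels."""
    return all(labels.get(k, "") == v for k, v in equal.items()) and all(
        re.fullmatch(p, labels.get(k, "")) for k, p in regex.items()
    )


def is_inhibited(target: dict, firing_alerts: list, rule: dict) -> bool:
    """Return True if some firing source alert mutes the target alert."""
    if not _matches(
        target, rule.get("target_match", {}), rule.get("target_match_re", {})
    ):
        return False
    for source in firing_alerts:
        if not _matches(
            source, rule.get("source_match", {}), rule.get("source_match_re", {})
        ):
            continue
        # Every label listed in "equal" must agree between source and target.
        if all(source.get(n, "") == target.get(n, "") for n in rule.get("equal", [])):
            return True
    return False
```

    For instance, a rule with source_match severity=critical, target_match severity=warning, and equal set to [alertname, cluster] mutes a warning only while a critical alert with the same alertname and cluster is firing.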

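    A receiver is a named configuration of one or more notification integrations.

    Some receivers that were available in Alertmanager v0.0.4 are not yet implemented. Contributions that add them to the new implementation are welcome.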

    # The unique name of the receiver.
    name: <string>
    # Configurations for several notification integrations.
    email_configs:
      [ - <email_config>, ... ]
    hipchat_configs:
      [ - <hipchat_config>, ... ]
    pagerduty_configs:
      [ - <pagerduty_config>, ... ]
    pushover_configs:
      [ - <pushover_config>, ... ]
    slack_configs:
      [ - <slack_config>, ... ]
    opsgenie_configs:
      [ - <opsgenie_config>, ... ]
    webhook_configs:
      [ - <webhook_config>, ... ]

    Send notifications via email:

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]
    # The email address to send notifications to.
    to: <tmpl_string>
    # The sender address.
    [ from: <tmpl_string> | default = global.smtp_from ]
    # The SMTP host through which emails are sent.
    [ smarthost: <string> | default = global.smtp_smarthost ]
    # SMTP authentication information.
    [ auth_username: <string> ]
    [ auth_password: <string> ]
    [ auth_secret: <string> ]
    [ auth_identity: <string> ]
    [ require_tls: <bool> | default = global.smtp_require_tls ]
    # The HTML body of the email notification.
    [ html: <tmpl_string> | default = '{{ template "email.default.html" . }}' ]
    # Further email header key/value pairs. Overrides any headers
    # previously set by the notification implementation.
    [ headers: { <string>: <tmpl_string>, ... } ]
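    Several settings in the block above default to a value from the global section (default = global.…). The fallback can be sketched in Python with plain dicts standing in for the parsed configuration; resolve_email_config is a hypothetical helper and covers only the defaults documented above.

```python
# Email settings whose documented default is a value from the global section.
GLOBAL_FALLBACKS = {
    "from": "smtp_from",
    "smarthost": "smtp_smarthost",
    "require_tls": "smtp_require_tls",
}


def resolve_email_config(global_cfg: dict, email_cfg: dict) -> dict:
    """Fill unset email settings from their documented defaults."""
    resolved = dict(email_cfg)
    for key, global_key in GLOBAL_FALLBACKS.items():
        # A receiver-level value always wins over the global one.
        if key not in resolved and global_key in global_cfg:
            resolved[key] = global_cfg[global_key]
    resolved.setdefault("send_resolved", False)  # documented default
    return resolved
```

    A receiver that sets only to therefore sends from the global smtp_from address through the global smtp_smarthost, while a receiver that sets its own smarthost keeps it.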

    Send notifications via HipChat:

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]
    # The HipChat Room ID.
    room_id: <tmpl_string>
    # The auth token.
    [ auth_token: <string> | default = global.hipchat_auth_token ]
    # The URL to send API requests to.
    [ url: <string> | default = global.hipchat_url ]
    # See https://www.hipchat.com/docs/apiv2/method/send_room_notification
    # A label to be shown in addition to the sender's name.
    [ from: <tmpl_string> | default = '{{ template "hipchat.default.from" . }}' ]
    # The message body.
    [ message: <tmpl_string> | default = '{{ template "hipchat.default.message" . }}' ]
    # Whether this message should trigger a user notification.
    [ notify: <boolean> | default = false ]
    # Determines how the message is treated by the alertmanager and rendered inside HipChat. Valid values are 'text' and 'html'.
    [ message_format: <string> | default = 'text' ]
    # Background color for message.
    [ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}red{{ else }}green{{ end }}' ]

    Send notifications via the PagerDuty API:

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]
    # The PagerDuty service key.
    service_key: <tmpl_string>
    # The URL to send API requests to
    [ url: <string> | default = global.pagerduty_url ]
    # The client identification of the Alertmanager.
    [ client: <tmpl_string> | default = '{{ template "pagerduty.default.client" . }}' ]
    # A backlink to the sender of the notification.
    [ client_url: <tmpl_string> | default = '{{ template "pagerduty.default.clientURL" . }}' ]
    # A description of the incident.
    [ description: <tmpl_string> | default = '{{ template "pagerduty.default.description" .}}' ]
    # A set of arbitrary key/value pairs that provide further detail
    # about the incident.
    [ details: { <string>: <tmpl_string>, ... } | default = {
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
    } ]

    Send notifications via the Pushover API:

    # The recipient user’s user key.
    user_key: <string>
    # Your registered application’s API token, see https://pushover.net/apps
    token: <string>
    # Notification title.
    [ title: <tmpl_string> | default = '{{ template "pushover.default.title" . }}' ]
    # Notification message.
    [ message: <tmpl_string> | default = '{{ template "pushover.default.message" . }}' ]
    # A supplementary URL shown alongside the message.
    [ url: <tmpl_string> | default = '{{ template "pushover.default.url" . }}' ]
    # Priority, see https://pushover.net/api#priority
    [ priority: <tmpl_string> | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ]
    # How often the Pushover servers will send the same notification to the user.
    # Must be at least 30 seconds.
    [ retry: <duration> | default = 1m ]
    # How long your notification will continue to be retried for, unless the user
    # acknowledges the notification.
    [ expire: <duration> | default = 1h ]
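    The 30-second minimum on retry can be checked before a configuration is deployed. The Python sketch below converts a duration string to milliseconds and applies that rule. The helper names are hypothetical, and the unit lengths (d = 24h, w = 7d, y = 365d) are assumptions, since the duration format above only defines the unit letters.

```python
import re

# Milliseconds per unit; the lengths of d, w, and y are assumptions.
UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,
}
DURATION_RE = re.compile(r"([0-9]+)(ms|[smhdwy])")


def duration_ms(value: str) -> int:
    """Convert a duration string such as 1m or 30s to milliseconds."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid duration: {value!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


def retry_is_valid(retry: str = "1m") -> bool:
    """Return True if retry meets the documented 30-second minimum."""
    return duration_ms(retry) >= duration_ms("30s")
```

    The default of 1m passes this check, and a value such as 10s does not.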

    Send notifications via Slack webhooks:

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]
    # The Slack webhook URL.
    [ api_url: <string> | default = global.slack_api_url ]
    # The channel or user to send notifications to.
    channel: <tmpl_string>
    # API request data as defined by the Slack webhook API.
    [ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ]
    [ username: <tmpl_string> | default = '{{ template "slack.default.username" . }}' ]
    [ title: <tmpl_string> | default = '{{ template "slack.default.title" . }}' ]
    [ title_link: <tmpl_string> | default = '{{ template "slack.default.titlelink" . }}' ]
    [ icon_emoji: <tmpl_string> ]
    [ icon_url: <tmpl_string> ]
    [ pretext: <tmpl_string> | default = '{{ template "slack.default.pretext" . }}' ]
    [ text: <tmpl_string> | default = '{{ template "slack.default.text" . }}' ]
    [ fallback: <tmpl_string> | default = '{{ template "slack.default.fallback" . }}' ]

    Send notifications via the OpsGenie API:

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]
    # The API key to use when talking to the OpsGenie API.
    api_key: <string>
    # The host to send OpsGenie API requests to.
    [ api_host: <string> | default = global.opsgenie_api_host ]
    # A description of the incident.
    [ description: <tmpl_string> | default = '{{ template "opsgenie.default.description" . }}' ]
    # A backlink to the sender of the notification.
    [ source: <tmpl_string> | default = '{{ template "opsgenie.default.source" . }}' ]
    # A set of arbitrary key/value pairs that provide further detail
    # about the incident.
    [ details: { <string>: <tmpl_string>, ... } ]
    # Comma-separated list of teams responsible for notifications.
    [ teams: <tmpl_string> ]
    # Comma-separated list of tags attached to the notifications.
    [ tags: <tmpl_string> ]

    The webhook receiver allows configuring a generic receiver.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]
    # The endpoint to send HTTP POST requests to.
    url: <string>

    Alertmanager sends HTTP POST requests with a JSON payload in the following format to the configured endpoint:

    {
      "version": "3",
      "groupKey": <number>,  // key identifying the group of alerts (e.g. to deduplicate)
      "status": "<resolved|firing>",
      "receiver": <string>,
      "groupLabels": <object>,
      "commonLabels": <object>,
      "commonAnnotations": <object>,
      "externalURL": <string>,  // backlink to the Alertmanager.
      "alerts": [
        {
          "labels": <object>,
          "annotations": <object>,
          "startsAt": "<rfc3339>",
          "endsAt": "<rfc3339>"
        },
        ...
      ]
    }
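    On the receiving side, a webhook endpoint only needs to parse this JSON body. A minimal Python sketch of that step follows. summarize_payload is a hypothetical helper, the HTTP server around it is omitted, and the alertname label it reads is an assumption about the labels your alerts carry.

```python
import json


def summarize_payload(body: str) -> list:
    """Turn one webhook request body into one summary line per alert."""
    payload = json.loads(body)
    summaries = []
    for alert in payload.get("alerts", []):
        # "alertname" is an assumed label; any label could be used here.
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        summaries.append(
            f"{payload['status']}: {name} via {payload['receiver']} "
            f"since {alert['startsAt']}"
        )
    return summaries
```

    A real endpoint would call this from its POST handler and then forward or store the resulting lines.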