• Managing Prometheus with the Operator
    • Creating a Prometheus instance
    • Managing scrape configuration with ServiceMonitor
    • Associating Prometheus with ServiceMonitors
    • Using a custom ServiceAccount

    Managing Prometheus with the Operator

    Creating a Prometheus instance

    Once the Prometheus Operator is installed in the cluster, deploying a Prometheus server becomes a matter of declaring a Prometheus resource. As shown below, we create a Prometheus instance in the monitoring namespace:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: inst
      namespace: monitoring
    spec:
      resources:
        requests:
          memory: 400Mi

    Save the above to a file named prometheus-inst.yaml and create it with kubectl:

    $ kubectl create -f prometheus-inst.yaml
    prometheus.monitoring.coreos.com/inst created

    Now, listing the StatefulSet resources in the monitoring namespace shows the Prometheus instance that the Prometheus Operator created automatically via a StatefulSet:

    $ kubectl -n monitoring get statefulsets
    NAME              DESIRED   CURRENT   AGE
    prometheus-inst   1         1         1m

    Check the Pod instances:

    $ kubectl -n monitoring get pods
    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-inst-0                      3/3     Running   1          1m
    prometheus-operator-6db8dbb7dd-2hz55   1/1     Running   0          45m

    Access the Prometheus instance via port-forward:

    $ kubectl -n monitoring port-forward statefulsets/prometheus-inst 9090:9090

    Opening http://localhost:9090 locally brings up the Prometheus instance created by the Prometheus Operator. Inspecting its configuration shows that, so far, the Operator has generated only a basic configuration for the instance:

    [Figure 1: Managing Prometheus with the Operator]
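    Besides the UI, the generated configuration can also be read from the Secret the Operator maintains for the instance. This is a sketch: the Secret name follows the prometheus-<name> convention, and depending on the Operator version the data key may be an uncompressed prometheus.yaml instead of the gzipped key assumed here.

```shell
# Dump the Operator-generated configuration; adjust the key to
# prometheus.yaml (and drop gunzip) on older Operator versions
kubectl -n monitoring get secret prometheus-inst \
  -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip
```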

    Managing scrape configuration with ServiceMonitor

    Changing the scrape configuration is one of the most common operational tasks around Prometheus. To automate configuration management, the Prometheus Operator uses the custom resource type ServiceMonitor to describe monitoring targets.

    First, deploy a sample application in the cluster. Save the following to example-app.yaml and create it with the kubectl command-line tool:

    kind: Service
    apiVersion: v1
    metadata:
      name: example-app
      labels:
        app: example-app
    spec:
      selector:
        app: example-app
      ports:
      - name: web
        port: 8080
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: example-app
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: example-app
        spec:
          containers:
          - name: example-app
            image: fabxc/instrumented_app
            ports:
            - name: web
              containerPort: 8080

    The Deployment creates three Pod replicas of the sample application, and the Service exposes it:

    $ kubectl get pods
    NAME                        READY   STATUS    RESTARTS   AGE
    example-app-94c8bc8-l27vx   2/2     Running   0          1m
    example-app-94c8bc8-lcsrm   2/2     Running   0          1m
    example-app-94c8bc8-n6wp5   2/2     Running   0          1m

    Again use port-forward to reach one of the Pods locally:

    $ kubectl port-forward deployments/example-app 8080:8080

    Visiting http://localhost:8080/metrics, the sample application returns sample data such as:

    # TYPE codelab_api_http_requests_in_progress gauge
    codelab_api_http_requests_in_progress 3
    # HELP codelab_api_request_duration_seconds A histogram of the API HTTP request durations in seconds.
    # TYPE codelab_api_request_duration_seconds histogram
    codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0001"} 0

    With native Prometheus configuration, scraping an application deployed on Kubernetes means defining a dedicated job in the Prometheus configuration file and wiring up service discovery with kubernetes_sd. With the Prometheus Operator, you instead declare a ServiceMonitor object, as shown below:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app
      namespace: monitoring
      labels:
        team: frontend
    spec:
      namespaceSelector:
        matchNames:
        - default
      selector:
        matchLabels:
          app: example-app
      endpoints:
      - port: web

    The labels under selector choose the Service objects to monitor, and endpoints specifies the port named web. By default, a ServiceMonitor and the objects it monitors must live in the same namespace. In this example Prometheus is deployed in the monitoring namespace, so to match the example-app objects in the default namespace, we use namespaceSelector to let the ServiceMonitor select targets across namespaces. Save the above to example-app-service-monitor.yaml and create it with kubectl:

    $ kubectl create -f example-app-service-monitor.yaml
    servicemonitor.monitoring.coreos.com/example-app created

    To let the ServiceMonitor match labeled objects in any namespace, define it as follows:

    spec:
      namespaceSelector:
        any: true

    If the monitored target requires BasicAuth authentication, define basicAuth in the endpoints configuration of the ServiceMonitor, as shown below:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app
      namespace: monitoring
      labels:
        team: frontend
    spec:
      namespaceSelector:
        matchNames:
        - default
      selector:
        matchLabels:
          app: example-app
      endpoints:
      - basicAuth:
          password:
            name: basic-auth
            key: password
          username:
            name: basic-auth
            key: user
        port: web

    Here basicAuth references a Secret object named basic-auth; the credentials must be stored in that Secret by hand:

    apiVersion: v1
    kind: Secret
    metadata:
      name: basic-auth
    data:
      password: dG9vcg== # base64-encoded password
      user: YWRtaW4= # base64-encoded username
    type: Opaque
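    Rather than base64-encoding the values by hand, the same Secret can be created imperatively; admin and toor are the decoded values of the data fields above, and the Secret is created here in the monitoring namespace, where the ServiceMonitor that references it lives.

```shell
# "admin" and "toor" are the plaintext behind YWRtaW4= and dG9vcg==
kubectl -n monitoring create secret generic basic-auth \
  --from-literal=user=admin \
  --from-literal=password=toor
```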

    Associating Prometheus with ServiceMonitors

    The association between Prometheus and ServiceMonitors is defined with serviceMonitorSelector: a label selector in the Prometheus resource picks out the ServiceMonitor objects this instance should use. Update the Prometheus definition in prometheus-inst.yaml as follows:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: inst
      namespace: monitoring
    spec:
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      resources:
        requests:
          memory: 400Mi

    Apply the Prometheus change to the cluster:

    $ kubectl -n monitoring apply -f prometheus-inst.yaml

    Inspecting the Prometheus configuration now, we find that it automatically contains a job named monitoring/example-app/0:

    global:
      scrape_interval: 30s
      scrape_timeout: 10s
      evaluation_interval: 30s
      external_labels:
        prometheus: monitoring/inst
        prometheus_replica: prometheus-inst-0
    alerting:
      alert_relabel_configs:
      - separator: ;
        regex: prometheus_replica
        replacement: $1
        action: labeldrop
    rule_files:
    - /etc/prometheus/rules/prometheus-inst-rulefiles-0/*.yaml
    scrape_configs:
    - job_name: monitoring/example-app/0
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - default
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        separator: ;
        regex: example-app
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: web
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: web
        action: replace

    However, careful readers may notice that although the job configuration exists, the Prometheus Targets page lists no targets at all. The logs of the Prometheus Pod reveal why:

    level=error ts=2018-12-15T12:52:48.452108433Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list endpoints in the namespace \"default\""

    Using a custom ServiceAccount

    The Prometheus instance was created with the default ServiceAccount of the monitoring namespace, which has no permission to read any resources in the default namespace.
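    Which ServiceAccount the instance runs under can be confirmed from the generated StatefulSet; as a quick check (the field may be empty or read "default" when no custom account is configured):

```shell
kubectl -n monitoring get statefulset prometheus-inst \
  -o jsonpath='{.spec.template.spec.serviceAccountName}'
```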

    To fix this, create a ServiceAccount named prometheus in the monitoring namespace and grant it the required cluster-wide read permissions:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: monitoring

    Save the above to prometheus-rbac.yaml and create the resources with kubectl:

    $ kubectl -n monitoring create -f prometheus-rbac.yaml
    serviceaccount/prometheus created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
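    Before touching the Prometheus resource, whether the new account actually has the access that was previously denied can be checked with kubectl's impersonation support:

```shell
# Expected to answer "yes" once the ClusterRoleBinding is in place
kubectl auth can-i list endpoints -n default \
  --as=system:serviceaccount:monitoring:prometheus
```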

    With the ServiceAccount created, edit prometheus-inst.yaml and add serviceAccountName as shown below:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: inst
      namespace: monitoring
    spec:
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      resources:
        requests:
          memory: 400Mi

    Apply the Prometheus change to the cluster:

    $ kubectl -n monitoring apply -f prometheus-inst.yaml
    prometheus.monitoring.coreos.com/inst configured

    Once the Prometheus Operator has rolled out the configuration change, the Prometheus UI shows that the instance is now scraping the sample application's metrics normally.
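    The same can be verified from the command line through Prometheus's HTTP API, with the port-forward from earlier still running; /api/v1/targets is part of Prometheus's stable API:

```shell
# List active scrape targets; the example-app endpoints should report health "up"
curl -s http://localhost:9090/api/v1/targets
```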