1. Monitoring a Kubernetes Cluster with Prometheus and Grafana
The overall collection scheme:

- prometheus-node-exporter collects host-level performance metrics and exposes them on a /metrics endpoint for Prometheus to scrape.
- kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, and kube-proxy each expose their own /metrics endpoint, providing cluster-related metrics from every node.
- cAdvisor collects container- and Pod-level performance metrics, exposed on a /metrics endpoint for Prometheus to scrape.
- blackbox-exporter collects application network-probing data (HTTP, TCP, ICMP, etc.), exposed on a /metrics endpoint for Prometheus to scrape.
- kube-state-metrics collects state metrics for Kubernetes resource objects, exposed on a /metrics endpoint for Prometheus to scrape.
- Applications expose their own in-process metrics (the application implements the endpoint itself and adds the agreed-upon annotations; Prometheus discovers and scrapes it based on those annotations).
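The annotation-driven scraping in the last item relies on the convention used by the `kubernetes-pods` job in the Prometheus configuration later in this guide. A pod that exposes its own metrics would carry annotations like the following (the annotation keys match that scrape config's relabel rules; the port and path values here are illustrative):

```yaml
# Illustrative pod template metadata; the annotation keys correspond to the
# relabel rules of the `kubernetes-pods` scrape job in prometheus.yml.
metadata:
  annotations:
    prometheus.io/scrape: "true"   # keep this pod as a scrape target
    prometheus.io/port: "8080"     # rewrites __address__ to <pod-ip>:8080
    prometheus.io/path: "/metrics" # rewrites __metrics_path__
```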
1.1. Deploying kube-state-metrics

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of objects (see the examples in the metrics section below). It is not focused on the health of individual Kubernetes components, but on the health of the various objects inside the cluster, such as Deployments, Nodes, and Pods.
1.1.1. Download the source package

Pick the release that matches your cluster; my Kubernetes version here is 1.22, so I chose v2.3.0.
| kube-state-metrics | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23 |
|---|---|---|---|---|---|
| v1.9.8 | - | - | - | - | - |
| v2.1.1 | ✓ | ✓ | ✓ | -/✓ | -/✓ |
| v2.2.4 | ✓ | ✓ | ✓ | ✓ | ✓ |
| v2.3.0 | ✓ | ✓ | ✓ | ✓ | ✓ |
| master | ✓ | ✓ | ✓ | ✓ | ✓ |

✓  Fully supported version range.

\-  The Kubernetes cluster has features the client libraries cannot use (additional API objects, deprecated APIs, etc.).
1.1.1.1. Download and unpack

```shell
wget https://github.com/kubernetes/kube-state-metrics/archive/refs/tags/v2.3.0.zip
unzip v2.3.0.zip
cd kube-state-metrics-2.3.0/examples/standard
```
1.1.1.2. Inspect the YAML files

```shell
[root@harbor k8s-yaml]# ll
total 20
-rw-r--r-- 1 root root  418 Dec  9 15:24 cluster-role-binding.yaml
-rw-r--r-- 1 root root 1665 Dec  9 15:24 cluster-role.yaml
-rw-r--r-- 1 root root 1222 Dec  9 15:24 deployment.yaml
-rw-r--r-- 1 root root  234 Dec  9 15:24 service-account.yaml
-rw-r--r-- 1 root root  447 Dec  9 15:24 service.yaml
```
1.1.1.3. Prepare the image

The upstream image requires access beyond the firewall to download, so I pulled it and pushed a copy to Docker Hub; you can substitute that copy directly.
```shell
# Pull the image
[root@app1 ~]# docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
v2.3.0: Pulling from kube-state-metrics/kube-state-metrics
e8614d09b7be: Pull complete
53ccb90bafd7: Pull complete
Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
Status: Downloaded newer image for k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0

# Verify
[root@app1 ~]# docker images|grep kube-state-metrics
k8s.gcr.io/kube-state-metrics/kube-state-metrics   v2.3.0   df2bb3f0d0cd   2 weeks ago   38.7MB

# Tag
[root@app1 ~]# docker tag df2bb3f0d0cd heyuze/kube-state-metrics:v2.3.0

# Push to Docker Hub
[root@app1 ~]# docker push heyuze/kube-state-metrics:v2.3.0
The push refers to repository [docker.io/heyuze/kube-state-metrics]
cb4962d0d70b: Pushed
6d75f23be3dd: Pushed
v2.3.0: digest: sha256:d964b5107fb31e9020db0d3e738ba4e1fc83a242638ee7e0ae78939baaedbe59 size: 739
```
In deployment.yaml, replace the image with `heyuze/kube-state-metrics:v2.3.0`.
1.1.2. Resource manifests

deployment

vim deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
    spec:
      containers:
      - image: heyuze/kube-state-metrics:v2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
```
ClusterRoleBinding
vim cluster-role-binding.yaml
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
```
ClusterRole
vim cluster-role.yaml
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
```
service-account
vim service-account.yaml
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
```
Service
vim service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
```
1.1.3. Apply the resource manifests

On the master node:

```shell
[root@k8s-master ~]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created
```

Check that everything started:

```shell
[root@k8s-master ~]# kubectl get pods,svc -n kube-system|grep kube-state-metrics
kube-state-metrics-7f8f6fc7fd-qxw8z   1/1   Running   0   87s
service/kube-state-metrics   ClusterIP   None   <none>   8080/TCP,8081/TCP   87s
```

Check that it responds:

```shell
[root@k8s-master ~]# curl localhost:8080/healthz
ok
```
1.2. Deploying node-exporter

1.2.1. Prepare the node-exporter image

node-exporter official Docker Hub page · node-exporter official GitHub page
Pull the image:

```shell
[root@harbor ~]# docker pull prom/node-exporter:v1.3.1
v1.3.1: Pulling from prom/node-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
b5db1e299295: Pull complete
Digest: sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
Status: Downloaded newer image for prom/node-exporter:v1.3.1
docker.io/prom/node-exporter:v1.3.1
```

Verify the pulled image:

```shell
[root@harbor ~]# docker images
REPOSITORY           TAG      IMAGE ID       CREATED       SIZE
prom/node-exporter   v1.3.1   1dbe0e931976   2 weeks ago   20.9MB
```

Tag it:

```shell
[root@harbor ~]# docker tag 1dbe0e931976 heyuze/node-exporter:v1.3.1
```

Verify the tag:

```shell
[root@harbor ~]# docker images
REPOSITORY             TAG      IMAGE ID       CREATED       SIZE
heyuze/node-exporter   v1.3.1   1dbe0e931976   2 weeks ago   20.9MB
prom/node-exporter     v1.3.1   1dbe0e931976   2 weeks ago   20.9MB
```

Push to the registry:

```shell
[root@harbor ~]# docker push heyuze/node-exporter:v1.3.1
The push refers to repository [docker.io/heyuze/node-exporter]
5f6d9bc8e23d: Mounted from prom/node-exporter
8d42cad20cac: Mounted from prom/node-exporter
36b45d63da70: Mounted from prom/node-exporter
v1.3.1: digest: sha256:d5b2a2e2bb07a4a5a7c4bd9e54641cab63e1d2627622dbde17efc04849d3d30d size: 948
```
1.2.2. Prepare the resource manifest

vim /data/k8s-yaml/node-exporter/node-exporter-ds.yaml
```yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    daemon: "node-exporter"
    grafanak8sapp: "true"
spec:
  selector:
    matchLabels:
      daemon: "node-exporter"
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      volumes:
      - name: proc
        hostPath:
          path: /proc
          type: ""
      - name: sys
        hostPath:
          path: /sys
          type: ""
      imagePullSecrets:
      - name: registry-pull-secret
      containers:
      - name: node-exporter
        image: heyuze/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        args:
        - --path.procfs=/host_proc
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /host_proc
      hostNetwork: true
```
1.2.3. Apply the resource manifest

```shell
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/node-exporter/node-exporter-ds.yaml
daemonset.apps/node-exporter created
```

Check that the pods started:

```shell
[root@k8s-master1 ~]# kubectl get pod -n kube-system|grep node-exporter
node-exporter-rh7fx   1/1   Running   0   40s
node-exporter-vgnzt   1/1   Running   0   40s
```

Check health:

```shell
[root@k8s-node1 ~]# curl localhost:9100/metrics
```

As long as node data comes back, the exporter is working.
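The /metrics payload is plain Prometheus text format, one `name{labels} value` sample per line, so ordinary shell tools are enough for a spot check. A minimal sketch (the helper name and the sample lines in the usage comment are illustrative, not part of node-exporter itself):

```shell
# metric_value <name>: read Prometheus exposition text on stdin and print the
# value of the first sample whose line starts with <name> (prefix match),
# skipping # HELP / # TYPE comment lines.
metric_value() {
  awk -v m="$1" '$0 !~ /^#/ && index($0, m) == 1 { print $NF; exit }'
}

# Example:
#   curl -s localhost:9100/metrics | metric_value node_load1
```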
1.3. Deploying cAdvisor

cAdvisor performs real-time monitoring and performance data collection for the resources and containers on a Node, covering CPU usage, memory usage, network throughput, and filesystem usage. cAdvisor is integrated into the kubelet and starts automatically with it, so one cAdvisor instance monitors exactly one Node. Older kubelets exposed it via the --cadvisor-port flag (default 4194), reachable in a browser; that flag has since been removed from the kubelet, which is why this section runs cAdvisor as a standalone DaemonSet listening on port 4194.
1.3.1. Prepare the cAdvisor image

cAdvisor official Docker Hub page · cAdvisor official GitHub page · cAdvisor official gcr.io page

Google no longer publishes cAdvisor images to Docker Hub; the latest images go to gcr.io/cadvisor/cadvisor. I pulled from there and pushed a copy to Docker Hub, so you only need to change the image reference.
Pull the image:

```shell
[root@harbor harbor]# docker pull gcr.io/cadvisor/cadvisor:v0.43.0
v0.43.0: Pulling from cadvisor/cadvisor
e519532ddf75: Pull complete
2e08db3b6bd0: Pull complete
83f705f3387b: Pull complete
7f10f7c55689: Pull complete
3fdbcd5b103f: Pull complete
Digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7
Status: Downloaded newer image for gcr.io/cadvisor/cadvisor:v0.43.0
gcr.io/cadvisor/cadvisor:v0.43.0
```

Check the image:

```shell
[root@harbor harbor]# docker images
REPOSITORY                 TAG       IMAGE ID       CREATED       SIZE
gcr.io/cadvisor/cadvisor   v0.43.0   80f16aa8c3c8   6 weeks ago   87.5MB
```

Tag it:

```shell
[root@harbor harbor]# docker tag 80f16aa8c3c8 heyuze/cadvisor:v0.43.0
```

Push:

```shell
[root@harbor harbor]# docker push heyuze/cadvisor:v0.43.0
The push refers to repository [docker.io/heyuze/cadvisor]
f2485927f8bd: Pushed
571a7fddbc78: Pushed
f1e964b32d2a: Pushed
41768b6793f5: Pushed
e6688e911f15: Pushed
v0.43.0: digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7 size: 1373
```
1.3.2. Prepare the resource manifest

vi /data/k8s-yaml/cadvisor/daemonset.yaml
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      hostNetwork: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: cadvisor
        image: heyuze/cadvisor:v0.43.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
        - name: http
          containerPort: 4194
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 4194
          initialDelaySeconds: 5
          periodSeconds: 10
        args:
        - --housekeeping_interval=10s
        - --port=4194
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker
```
1.3.3. Create the cgroup symlink on the worker nodes

On every worker node:

```shell
[root@k8s-node1 ~]# mount -o remount,rw /sys/fs/cgroup/
[root@k8s-node1 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
[root@k8s-master1 ~]# ll /sys/fs/cgroup/ | grep cpu
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpuacct -> cpu,cpuacct
lrwxrwxrwx 1 root root 27 Mar 11 12:54 cpuacct,cpu -> /sys/fs/cgroup/cpu,cpuacct/
drwxr-xr-x 5 root root  0 Feb 26 13:36 cpu,cpuacct
drwxr-xr-x 3 root root  0 Feb 26 13:36 cpuset
```
1.3.4. Apply the resource manifest

On any node with kubectl access:

```shell
[root@k8s-master ~]# kubectl apply -f http://www.kubelet.cn/k8s-yaml/cadvisor/daemonset.yaml
daemonset.apps/cadvisor created
```

Check the listening port (on a worker node):

```shell
[root@k8s-node1 ~]# netstat -luntp|grep 4194
tcp6       0      0 :::4194       :::*       LISTEN      1634868/cadvisor
```
1.4. Deploying blackbox-exporter

1.4.1. Prepare the blackbox-exporter image

blackbox-exporter official Docker Hub page · blackbox-exporter official GitHub page
Pull the image:

```shell
[root@harbor ~]# docker pull prom/blackbox-exporter:v0.19.0
v0.19.0: Pulling from prom/blackbox-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
1603b92f0389: Pull complete
a8140d619b2f: Pull complete
Digest: sha256:94de5897eef1b3c1ba7fbfebb9af366e032c0ff915a52c0066ff2e0c1bcd2e45
Status: Downloaded newer image for prom/blackbox-exporter:v0.19.0
docker.io/prom/blackbox-exporter:v0.19.0
```

Check the image:

```shell
[root@harbor ~]# docker images
REPOSITORY               TAG       IMAGE ID       CREATED        SIZE
prom/blackbox-exporter   v0.19.0   c9e462ce1ee4   7 months ago   20.9MB
```

Tag it:

```shell
[root@harbor ~]# docker tag c9e462ce1ee4 heyuze/blackbox-exporter:v0.19.0
```

Push:

```shell
[root@harbor ~]# docker push heyuze/blackbox-exporter:v0.19.0
The push refers to repository [harbor.gong-hui.com/gonghui/blackbox-exporter]
256c4aa8ebe5: Pushed
4b6cc55de649: Pushed
986894c42222: Pushed
adab5d09ba79: Pushed
v0.15.1: digest: sha256:c20445e0cc628fa4b227fe2f694c22a314beb43fd8297095b6ee6cbc67161336 size: 1155
```
1.4.2. Prepare the resource manifests

ConfigMap

/data/k8s-yaml/blackbox-exporter/configmap.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 2s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: [200,301,302]
          method: GET
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 2s
```
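With these modules loaded, the exporter answers probe requests on its HTTP API, `GET /probe?module=<module>&target=<target>`, and returns Prometheus-format results such as `probe_success`. A small sketch for building such a request URL (the helper name is illustrative; the service DNS name matches the Service defined below):

```shell
# probe_url <module> <target>: build a blackbox-exporter /probe request URL
# for the given module (http_2xx, tcp_connect) and target.
probe_url() {
  echo "http://blackbox-exporter.kube-system:9115/probe?module=$1&target=$2"
}

# Example (run from a pod inside the cluster):
#   curl -s "$(probe_url http_2xx http://example.com)" | grep probe_success
```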
Deployment
/data/k8s-yaml/blackbox-exporter/deployment.yaml
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
  labels:
    app: blackbox-exporter
  # annotations:
  #   deployment.kubernetes.io/revision: 1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
      containers:
      - name: blackbox-exporter
        image: heyuze/blackbox-exporter:v0.19.0
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=info
        - --web.listen-address=:9115
        ports:
        - name: blackbox-port
          containerPort: 9115
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
```
Service
/data/k8s-yaml/blackbox-exporter/service.yaml
```yaml
kind: Service
apiVersion: v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox-port
    protocol: TCP
    port: 9115
    targetPort: 9115
  type: LoadBalancer
```
Ingress
/data/k8s-yaml/blackbox-exporter/ingress.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blackbox-exporter
  annotations:
    nginx.ingress.kubernetes.io/ingress.class: 'nginx'
  namespace: kube-system
spec:
  ingressClassName: nginx
  rules:
  - host: blackbox.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blackbox-exporter
            port:
              name: blackbox-port
```
1.4.3. Apply the resource manifests

```shell
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/configmap.yaml
configmap/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/deployment.yaml
deployment.apps/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/service.yaml
service/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/ingress.yaml
ingress.networking.k8s.io/blackbox-exporter created
```
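For Prometheus to actually probe a pod through blackbox-exporter, the pod must carry the annotations that the `blackbox_http_pod_probe` and `blackbox_tcp_pod_probe` scrape jobs in the Prometheus configuration (section 1.5) key on. A hedged example for an HTTP-probed pod (the annotation keys match those relabel rules; the port/path values are illustrative):

```yaml
# Illustrative pod template metadata for blackbox probing.
metadata:
  annotations:
    blackbox_scheme: "http"   # selects blackbox_http_pod_probe ("tcp" selects blackbox_tcp_pod_probe)
    blackbox_port: "8080"     # joined with the pod IP to form the probe target
    blackbox_path: "/healthz" # HTTP probes only: path appended after the port
```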
1.5. Deploying Prometheus

On the ops host.

1.5.1. Prepare the Prometheus image

Prometheus official Docker Hub page · Prometheus official GitHub page
Pull the image:

```shell
[root@harbor ~]# docker pull prom/prometheus:v2.32.0
v2.32.0: Pulling from prom/prometheus
3cb635b06aa2: Pull complete
c4d1a94ab1db: Pull complete
41c8679f1eb7: Pull complete
1650e28e81f3: Pull complete
a4af63abea67: Pull complete
101065466520: Pull complete
e7d092467524: Pull complete
920f29a8238e: Pull complete
d22cebb42c02: Pull complete
102a95cf6327: Pull complete
c14687945637: Pull complete
d2136b8fa9a3: Pull complete
Digest: sha256:68aa603f9d797a8423e766b625cab4202bda7d9be8fc44d4e904dcea7f142177
Status: Downloaded newer image for prom/prometheus:v2.32.0
docker.io/prom/prometheus:v2.32.0
```

Check the image:

```shell
[root@harbor ~]# docker images
REPOSITORY        TAG       IMAGE ID       CREATED       SIZE
prom/prometheus   v2.32.0   9e4125f21d5f   12 days ago   201MB
```

Tag it:

```shell
[root@harbor ~]# docker tag 9e4125f21d5f heyuze/prometheus:v2.32.0
```

Push:

```shell
[root@harbor ~]# docker push heyuze/prometheus:v2.32.0
The push refers to repository [docker.io/heyuze/prometheus]
2e73ed7d5d4d: Mounted from prom/prometheus
10e7fc7c8b3d: Mounted from prom/prometheus
051ea654c6e9: Mounted from prom/prometheus
913f2f736476: Mounted from prom/prometheus
f2083decbd50: Mounted from prom/prometheus
8bbbfc276b7c: Mounted from prom/prometheus
d90183f5bbf3: Mounted from prom/prometheus
5df4d348c75e: Mounted from prom/prometheus
3120b8f3e6b5: Mounted from prom/prometheus
9cbe643a4493: Mounted from prom/prometheus
29908fb03ed8: Mounted from prom/prometheus
64cac9eaf0da: Mounted from prom/prometheus
v2.32.0: digest: sha256:a8f33123429b8df0d01af19f639c4427a434e3143e0c4df84f688886960f53c4 size: 2823
```
1.5.2. Prepare the resource manifests

On the ops host, under /data/k8s-yaml:

```shell
[root@harbor ~]# mkdir /data/k8s-yaml/prometheus && mkdir -p /data/nfs-volume/prometheus/etc && cd /data/k8s-yaml/prometheus
[root@harbor prometheus]#
```
RBAC
vim /data/k8s-yaml/prometheus/rbac.yaml
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
  namespace: infra
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: infra
```
Deployment
vi /data/k8s-yaml/prometheus/deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  labels:
    name: prometheus
  name: prometheus
  namespace: infra
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 7
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: heyuze/prometheus:v2.32.0
        imagePullPolicy: IfNotPresent
        command:
        - /bin/prometheus
        args:
        - --config.file=/data/etc/prometheus.yml
        - --storage.tsdb.path=/data/prom-db
        - --storage.tsdb.min-block-duration=10m
        - --storage.tsdb.retention=72h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: data
        resources:
          requests:
            cpu: "1000m"
            memory: "1.5Gi"
          limits:
            cpu: "2000m"
            memory: "3Gi"
      imagePullSecrets:
      - name: harbor
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus
      volumes:
      - name: data
        nfs:
          server: 192.168.101.198
          path: /data/nfs-volume/prometheus
```
Service
vim /data/k8s-yaml/prometheus/service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: infra
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
```
Ingress
vim /data/k8s-yaml/prometheus/ingress.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: prometheus
  namespace: infra
spec:
  rules:
  - host: prometheus.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090
```
1.5.3. Prepare the Prometheus configuration

Copy the certificate files:

```shell
[root@k8s-master1 k8s]# pwd
/root/TLS/k8s
[root@k8s-master1 k8s]# scp ca.pem server.pem server-key.pem root@192.168.3.187:/data/nfs-volume/prometheus/etc
root@192.168.3.187's password:
ca.pem                100% 1359   1.2MB/s   00:00
server.pem            100% 1684   1.5MB/s   00:00
server-key.pem        100% 1679   1.6MB/s   00:00
```

Verify the certificates:

```shell
[root@harbor etc]# ll
total 20
-rw-r--r-- 1 root root 1359 Mar 12 21:02 ca.pem
-rw-r--r-- 1 root root 5437 Mar 12 20:27 prometheus.yml
-rw------- 1 root root 1679 Mar 12 21:02 server-key.pem
-rw-r--r-- 1 root root 1684 Mar 12 21:02 server.pem
```

On the ops host (the NFS server), edit /data/nfs-volume/prometheus/etc/prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
- job_name: 'etcd'
  tls_config:
    ca_file: /data/etc/ca.pem
    cert_file: /data/etc/server.pem
    key_file: /data/etc/server-key.pem
  scheme: https
  static_configs:
  - targets:
    - '192.168.3.183:2379'
    - '192.168.3.184:2379'
    - '192.168.3.185:2379'
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10255
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:4194
- job_name: 'kubernetes-kube-state'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
    regex: .*true.*
    action: keep
  - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
    regex: 'node-exporter;(.*)'
    action: replace
    target_label: nodename
- job_name: 'blackbox_http_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: http
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port, __meta_kubernetes_pod_annotation_blackbox_path]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+);(.+)
    replacement: $1:$2$3
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'blackbox_tcp_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: tcp
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'traefik'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: keep
    regex: traefik
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
```
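The recurring relabel rule `regex: ([^:]+)(?::\d+)?;(\d+)` with `replacement: $1:$2` is what turns the joined source labels `<pod-ip>[:<discovered-port>];<annotation-port>` into `<pod-ip>:<annotation-port>`. The same substitution can be reproduced with `sed` as a sanity check (note that `sed -E` has no non-capturing groups, so the optional port becomes an ordinary group and the replacement indices shift):

```shell
# Emulate Prometheus's __address__ rewrite:
#   host[:discovered_port];annotation_port  ->  host:annotation_port
rewrite_address() {
  echo "$1" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}
```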
1.5.4. Apply the resource manifests

```shell
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/rbac.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/deployment.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/service.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/ingress.yaml
```
1.5.5. Access

Point the domain at the VIP:

prometheus.gong-hui.com