1. Monitoring a Kubernetes Cluster with Prometheus and Grafana
The overall collection plan:

- prometheus-node-exporter collects host-level performance metrics, which Prometheus scrapes from its exposed /metrics endpoint.
- kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, and kube-proxy each expose their own /metrics endpoint, providing cluster-related metrics for the nodes.
- cAdvisor collects container- and Pod-level performance metrics, which Prometheus scrapes from its exposed /metrics endpoint.
- blackbox-exporter collects application network-probe data (HTTP, TCP, ICMP, etc.), which Prometheus scrapes from its exposed /metrics endpoint.
- kube-state-metrics collects state metrics for Kubernetes resource objects, which Prometheus scrapes from its exposed /metrics endpoint.
- Applications can expose metrics from their own processes: the application implements the metrics endpoint itself and adds the agreed-upon annotations, and Prometheus scrapes it based on those annotations (see the sketch below).
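As a sketch of that annotation convention (the annotation keys match the kubernetes-pods job in the prometheus.yml later in this article; the pod name myapp is a placeholder):

# Hypothetical pod "myapp"; the kubernetes-pods scrape job keys on these annotation names
kubectl annotate pod myapp \
  prometheus.io/scrape=true \
  prometheus.io/path=/metrics \
  prometheus.io/port=8080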
 
1.1. Deploying kube-state-metrics

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of objects (see the example series below). It is concerned not with the health of individual Kubernetes components, but with the health of the various objects inside the cluster, such as Deployments, Nodes, and Pods.
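For instance (illustrative series: the metric families are standard kube-state-metrics ones, but the label values here are placeholders):

kube_deployment_status_replicas{namespace="kube-system",deployment="kube-state-metrics"} 1
kube_pod_status_phase{namespace="default",pod="myapp",phase="Running"} 1
kube_node_status_condition{node="k8s-node1",condition="Ready",status="true"} 1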
1.1.1. Download the source package

Pick the release that matches your cluster version. My Kubernetes version is 1.22, so I chose v2.3.0:
kube-state-metrics | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23
v1.9.8             | -               | -               | -               | -               | -
v2.1.1             | ✓               | ✓               | ✓               | -/✓             | -/✓
v2.2.4             | ✓               | ✓               | ✓               | ✓               | ✓
v2.3.0             | ✓               | ✓               | ✓               | ✓               | ✓
master             | ✓               | ✓               | ✓               | ✓               | ✓

✓  Fully supported version range.
-  The Kubernetes cluster has some features the client libraries cannot use (additional API objects, deprecated APIs, and so on).
 
1.1.1.1. Download and extract

wget https://github.com/kubernetes/kube-state-metrics/archive/refs/tags/v2.3.0.zip
unzip v2.3.0.zip
cd kube-state-metrics-2.3.0/examples/standard
 
1.1.1.2. Review the YAML files

[root@harbor k8s-yaml]# ll
total 20
-rw-r--r-- 1 root root  418 Dec  9 15:24 cluster-role-binding.yaml
-rw-r--r-- 1 root root 1665 Dec  9 15:24 cluster-role.yaml
-rw-r--r-- 1 root root 1222 Dec  9 15:24 deployment.yaml
-rw-r--r-- 1 root root  234 Dec  9 15:24 service-account.yaml
-rw-r--r-- 1 root root  447 Dec  9 15:24 service.yaml
 
1.1.1.3. Prepare the image

The upstream image lives on k8s.gcr.io, which may be unreachable without a proxy, so I pushed a copy to Docker Hub that can be substituted directly:

# Pull the image
[root@app1 ~]# docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
v2.3.0: Pulling from kube-state-metrics/kube-state-metrics
e8614d09b7be: Pull complete
53ccb90bafd7: Pull complete
Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
Status: Downloaded newer image for k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0

# Check
[root@app1 ~]# docker images|grep kube-state-metrics
k8s.gcr.io/kube-state-metrics/kube-state-metrics   v2.3.0              df2bb3f0d0cd        2 weeks ago         38.7MB

# Tag
[root@app1 ~]# docker tag df2bb3f0d0cd heyuze/kube-state-metrics:v2.3.0

# Push to Docker Hub
[root@app1 ~]# docker push heyuze/kube-state-metrics:v2.3.0
The push refers to repository [docker.io/heyuze/kube-state-metrics]
cb4962d0d70b: Pushed
6d75f23be3dd: Pushed
v2.3.0: digest: sha256:d964b5107fb31e9020db0d3e738ba4e1fc83a242638ee7e0ae78939baaedbe59 size: 739
 
In deployment.yaml, replace the image with heyuze/kube-state-metrics:v2.3.0.
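If you prefer not to edit by hand, a sed one-liner does the swap (assuming the stock manifest references the k8s.gcr.io image shown above):

sed -i 's#k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0#heyuze/kube-state-metrics:v2.3.0#' deployment.yaml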
1.1.2. Resource manifests

Deployment
vim deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
    spec:
      containers:
      - image: heyuze/kube-state-metrics:v2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
 
ClusterRoleBinding 
vim cluster-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
 
ClusterRole 
vim cluster-role.yaml 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
 
service-account 
vim service-account.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
 
Service 
vim service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
 
1.1.3. Apply the manifests

On the master node:
[root@k8s-master ~]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created
 
Check that it started:
[root@k8s-master ~]# kubectl get pods,svc -n kube-system|grep kube-state-metrics
kube-state-metrics-7f8f6fc7fd-qxw8z   1/1     Running   0              87s
service/kube-state-metrics   ClusterIP   None         <none>        8080/TCP,8081/TCP        87s
 
Check that it is healthy:
[root@k8s-master ~]# curl localhost:8080/healthz
ok
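Beyond the health endpoint, you can spot-check the metrics themselves; a grep for a standard kube-state-metrics family (kube_pod_status_phase) should return one series per pod and phase:

# Quick sanity check against the metrics endpoint
[root@k8s-master ~]# curl -s localhost:8080/metrics | grep '^kube_pod_status_phase' | head -5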
 
1.2. Deploying node-exporter

1.2.1. Prepare the node-exporter image

See the official node-exporter pages on Docker Hub and GitHub.
Pull the image:
[root@harbor ~]# docker pull prom/node-exporter:v1.3.1
v1.3.1: Pulling from prom/node-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
b5db1e299295: Pull complete
Digest: sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
Status: Downloaded newer image for prom/node-exporter:v1.3.1
docker.io/prom/node-exporter:v1.3.1
 
Check the pulled image:
[root@harbor ~]# docker images
REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
prom/node-exporter                              v1.3.1              1dbe0e931976        2 weeks ago         20.9MB
 
Tag it:
[root@harbor ~]# docker tag 1dbe0e931976 heyuze/node-exporter:v1.3.1
 
Verify the new tag:
[root@harbor ~]# docker images
REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
heyuze/node-exporter                             v1.3.1              1dbe0e931976        2 weeks ago         20.9MB
prom/node-exporter                               v1.3.1              1dbe0e931976        2 weeks ago         20.9MB
 
Push to the registry:
[root@harbor ~]# docker push heyuze/node-exporter:v1.3.1
The push refers to repository [docker.io/heyuze/node-exporter]
5f6d9bc8e23d: Mounted from prom/node-exporter
8d42cad20cac: Mounted from prom/node-exporter
36b45d63da70: Mounted from prom/node-exporter
v1.3.1: digest: sha256:d5b2a2e2bb07a4a5a7c4bd9e54641cab63e1d2627622dbde17efc04849d3d30d size: 948
 
1.2.2. Prepare the manifest

vim /data/k8s-yaml/node-exporter/node-exporter-ds.yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    daemon: "node-exporter"
    grafanak8sapp: "true"
spec:
  selector:
    matchLabels:
      daemon: "node-exporter"
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      volumes:
      - name: proc
        hostPath:
          path: /proc
          type: ""
      - name: sys
        hostPath:
          path: /sys
          type: ""
      imagePullSecrets:
      - name: registry-pull-secret
      containers:
      - name: node-exporter
        image: heyuze/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        args:
        - --path.procfs=/host_proc
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /host_proc
      hostNetwork: true
 
1.2.3. Apply the manifest

[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/node-exporter/node-exporter-ds.yaml
daemonset.apps/node-exporter created
 
Check that it started:
[root@k8s-master1 ~]# kubectl get pod -n kube-system|grep node-exporter
node-exporter-rh7fx                   1/1     Running   0               40s
node-exporter-vgnzt                   1/1     Running   0               40s
 
Health check:
[root@k8s-node1 ~]# curl localhost:9100/metrics
 
As long as node metrics come back from this endpoint, the exporter is working.
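For a quicker sanity check than scrolling the full output, filter for a well-known family (node_cpu_seconds_total is a standard node-exporter metric):

[root@k8s-node1 ~]# curl -s localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -3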
1.3. Deploying cAdvisor

cAdvisor monitors resources and containers on a node in real time, collecting CPU usage, memory usage, network throughput, and filesystem usage; one cAdvisor instance monitors exactly one node. cAdvisor is integrated into the kubelet, and in older Kubernetes versions the kubelet's --cadvisor-port flag (default 4194) exposed it directly so it could be opened in a browser. That flag has since been removed, so this section runs cAdvisor as a standalone DaemonSet listening on port 4194.
1.3.1. Prepare the cAdvisor image

See the official cAdvisor pages on Docker Hub, GitHub, and gcr.io.

Google no longer publishes cAdvisor updates to Docker Hub; the latest images go to gcr.io/cadvisor/cadvisor. I pulled from there, pushed a copy to Docker Hub, and changed the image reference accordingly.
Pull the image:
[root@harbor harbor]# docker pull gcr.io/cadvisor/cadvisor:v0.43.0
v0.43.0: Pulling from cadvisor/cadvisor
e519532ddf75: Pull complete
2e08db3b6bd0: Pull complete
83f705f3387b: Pull complete
7f10f7c55689: Pull complete
3fdbcd5b103f: Pull complete
Digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7
Status: Downloaded newer image for gcr.io/cadvisor/cadvisor:v0.43.0
gcr.io/cadvisor/cadvisor:v0.43.0
 
View the image:
[root@harbor harbor]# docker images
REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
gcr.io/cadvisor/cadvisor                         v0.43.0             80f16aa8c3c8        6 weeks ago         87.5MB
 
Tag it:
[root@harbor harbor]# docker tag 80f16aa8c3c8 heyuze/cadvisor:v0.43.0
 
Push:
[root@harbor harbor]# docker push heyuze/cadvisor:v0.43.0
The push refers to repository [docker.io/heyuze/cadvisor]
f2485927f8bd: Pushed
571a7fddbc78: Pushed
f1e964b32d2a: Pushed
41768b6793f5: Pushed
e6688e911f15: Pushed
v0.43.0: digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7 size: 1373
 
1.3.2. Prepare the manifest

vi /data/k8s-yaml/cadvisor/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      hostNetwork: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: cadvisor
        image: heyuze/cadvisor:v0.43.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
        - name: http
          containerPort: 4194
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 4194
          initialDelaySeconds: 5
          periodSeconds: 10
        args:
        - --housekeeping_interval=10s
        - --port=4194
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker
 
1.3.3. Adjust the cgroup symlinks on the worker nodes

cAdvisor looks for the CPU accounting cgroup under cpuacct,cpu, while the OS mounts it as cpu,cpuacct, so add a symlink. On every worker node:
[root@k8s-node1 ~]# mount -o remount,rw /sys/fs/cgroup/
[root@k8s-node1 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
[root@k8s-master1 ~]# ll /sys/fs/cgroup/ | grep cpu
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpuacct -> cpu,cpuacct
lrwxrwxrwx 1 root root 27 Mar 11 12:54 cpuacct,cpu -> /sys/fs/cgroup/cpu,cpuacct/
drwxr-xr-x 5 root root  0 Feb 26 13:36 cpu,cpuacct
drwxr-xr-x 3 root root  0 Feb 26 13:36 cpuset
 
1.3.4. Apply the manifest

From any node:
[root@k8s-master ~]# kubectl apply -f http://www.kubelet.cn/k8s-yaml/cadvisor/daemonset.yaml
daemonset.apps/cadvisor created
 
Check the listening port (on a worker node):
[root@k8s-node1 ~]# netstat -luntp|grep 4194
tcp6       0      0 :::4194                 :::*                    LISTEN      1634868/cadvisor
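With the DaemonSet running, container-level series are exposed on each node; container_cpu_usage_seconds_total is one of cAdvisor's standard metric families, so a filtered curl makes a quick check:

[root@k8s-node1 ~]# curl -s localhost:4194/metrics | grep '^container_cpu_usage_seconds_total' | head -3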
 
1.4. Deploying blackbox-exporter

1.4.1. Prepare the blackbox-exporter image

See the official blackbox-exporter pages on Docker Hub and GitHub.
Pull the image:
[root@harbor ~]# docker pull prom/blackbox-exporter:v0.19.0
v0.19.0: Pulling from prom/blackbox-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
1603b92f0389: Pull complete
a8140d619b2f: Pull complete
Digest: sha256:94de5897eef1b3c1ba7fbfebb9af366e032c0ff915a52c0066ff2e0c1bcd2e45
Status: Downloaded newer image for prom/blackbox-exporter:v0.19.0
docker.io/prom/blackbox-exporter:v0.19.0
 
View the image:
[root@harbor ~]# docker images
REPOSITORY                                    TAG                 IMAGE ID            CREATED         SIZE
prom/blackbox-exporter                        v0.19.0             c9e462ce1ee4        7 months ago    20.9MB
 
Tag it:
[root@harbor ~]# docker tag c9e462ce1ee4 heyuze/blackbox-exporter:v0.19.0
 
Push:
[root@harbor ~]# docker push heyuze/blackbox-exporter:v0.19.0
The push refers to repository [harbor.gong-hui.com/gonghui/blackbox-exporter]
256c4aa8ebe5: Pushed
4b6cc55de649: Pushed
986894c42222: Pushed
adab5d09ba79: Pushed
v0.15.1: digest: sha256:c20445e0cc628fa4b227fe2f694c22a314beb43fd8297095b6ee6cbc67161336 size: 1155
 
1.4.2. Prepare the manifests

ConfigMap

/data/k8s-yaml/blackbox-exporter/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 2s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: [200,301,302]
          method: GET
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 2s
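The two modules defined here map onto the module parameter of blackbox-exporter's /probe endpoint. Once the exporter is deployed (below), you can exercise a module by hand from inside the cluster; the target URL here is just a placeholder:

# Probe a target through the http_2xx module via the in-cluster service
curl 'http://blackbox-exporter.kube-system:9115/probe?module=http_2xx&target=http://example.com'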
 
Deployment 
/data/k8s-yaml/blackbox-exporter/deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
  labels:
    app: blackbox-exporter
#  annotations:
#    deployment.kubernetes.io/revision: 1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
      containers:
      - name: blackbox-exporter
        image: heyuze/blackbox-exporter:v0.19.0
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=info
        - --web.listen-address=:9115
        ports:
        - name: blackbox-port
          containerPort: 9115
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
 
Service 
/data/k8s-yaml/blackbox-exporter/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox-port
    protocol: TCP
    port: 9115
    targetPort: 9115
  type: LoadBalancer
 
Ingress 
/data/k8s-yaml/blackbox-exporter/ingress.yaml
Note that the old extensions/v1beta1 Ingress API was removed in Kubernetes 1.22, so use the networking.k8s.io/v1 form:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blackbox-exporter
  annotations:
    nginx.ingress.kubernetes.io/ingress.class: 'nginx'
  namespace: kube-system
spec:
  ingressClassName: nginx
  rules:
  - host: blackbox.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blackbox-exporter
            port:
              name: blackbox-port
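To smoke-test the Ingress before DNS is in place, you can send a request with the Host header directly to your ingress controller; <ingress-node-ip> is a placeholder for its address:

curl -H 'Host: blackbox.kubelet.cn' http://<ingress-node-ip>/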
 
1.4.3. Apply the manifests

[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/configmap.yaml
configmap/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/deployment.yaml
deployment.apps/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/service.yaml
service/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/ingress.yaml
ingress.networking.k8s.io/blackbox-exporter created
 
1.5. Deploying Prometheus

On the ops host.
1.5.1. Prepare the Prometheus image

See the official Prometheus pages on Docker Hub and GitHub.
Pull the image:
[root@harbor ~]# docker pull prom/prometheus:v2.32.0
v2.32.0: Pulling from prom/prometheus
3cb635b06aa2: Pull complete
c4d1a94ab1db: Pull complete
41c8679f1eb7: Pull complete
1650e28e81f3: Pull complete
a4af63abea67: Pull complete
101065466520: Pull complete
e7d092467524: Pull complete
920f29a8238e: Pull complete
d22cebb42c02: Pull complete
102a95cf6327: Pull complete
c14687945637: Pull complete
d2136b8fa9a3: Pull complete
Digest: sha256:68aa603f9d797a8423e766b625cab4202bda7d9be8fc44d4e904dcea7f142177
Status: Downloaded newer image for prom/prometheus:v2.32.0
docker.io/prom/prometheus:v2.32.0
 
View the image:
[root@harbor ~]# docker images
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
prom/prometheus          v2.32.0             9e4125f21d5f        12 days ago         201MB
 
Tag it:
[root@harbor ~]# docker tag 9e4125f21d5f heyuze/prometheus:v2.32.0
 
Push:
[root@harbor ~]# docker push heyuze/prometheus:v2.32.0
The push refers to repository [docker.io/heyuze/prometheus]
2e73ed7d5d4d: Mounted from prom/prometheus
10e7fc7c8b3d: Mounted from prom/prometheus
051ea654c6e9: Mounted from prom/prometheus
913f2f736476: Mounted from prom/prometheus
f2083decbd50: Mounted from prom/prometheus
8bbbfc276b7c: Mounted from prom/prometheus
d90183f5bbf3: Mounted from prom/prometheus
5df4d348c75e: Mounted from prom/prometheus
3120b8f3e6b5: Mounted from prom/prometheus
9cbe643a4493: Mounted from prom/prometheus
29908fb03ed8: Mounted from prom/prometheus
64cac9eaf0da: Mounted from prom/prometheus
v2.32.0: digest: sha256:a8f33123429b8df0d01af19f639c4427a434e3143e0c4df84f688886960f53c4 size: 2823
 
1.5.2. Prepare the manifests

On the ops host, under /data/k8s-yaml:

[root@harbor ~]# mkdir /data/k8s-yaml/prometheus && mkdir -p /data/nfs-volume/prometheus/etc && cd /data/k8s-yaml/prometheus
[root@harbor prometheus]#
 
RBAC 
vim /data/k8s-yaml/prometheus/rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
  namespace: infra
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: infra
 
Deployment
vi /data/k8s-yaml/prometheus/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  labels:
    name: prometheus
  name: prometheus
  namespace: infra
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 7
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: heyuze/prometheus:v2.32.0
        imagePullPolicy: IfNotPresent
        command:
        - /bin/prometheus
        args:
        - --config.file=/data/etc/prometheus.yml
        - --storage.tsdb.path=/data/prom-db
        - --storage.tsdb.min-block-duration=10m
        - --storage.tsdb.retention=72h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: data
        resources:
          requests:
            cpu: "1000m"
            memory: "1.5Gi"
          limits:
            cpu: "2000m"
            memory: "3Gi"
      imagePullSecrets:
      - name: harbor
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus
      volumes:
      - name: data
        nfs:
          server: 192.168.101.198
          path: /data/nfs-volume/prometheus
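The data volume assumes the ops host (192.168.101.198 in this manifest) exports /data/nfs-volume over NFS. A minimal export entry, assuming the cluster nodes sit in 192.168.101.0/24 (adjust to your network), would look like:

# On the NFS/ops host: export the volume and reload the export table
echo '/data/nfs-volume 192.168.101.0/24(rw,no_root_squash)' >> /etc/exports
exportfs -r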
 
Service 
vim /data/k8s-yaml/prometheus/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: infra
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
 
Ingress 
vim /data/k8s-yaml/prometheus/ingress.yaml
(extensions/v1beta1 was removed in Kubernetes 1.22, so this uses the networking.k8s.io/v1 API)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: prometheus
  namespace: infra
spec:
  rules:
  - host: prometheus.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090
 
1.5.3. Prepare the Prometheus configuration

Copy the certificate files:
[root@k8s-master1 k8s]# pwd
/root/TLS/k8s
[root@k8s-master1 k8s]# scp ca.pem server.pem server-key.pem root@192.168.3.187:/data/nfs-volume/prometheus/etc
root@192.168.3.187's password:
ca.pem                 100% 1359     1.2MB/s   00:00
server.pem             100% 1684     1.5MB/s   00:00
server-key.pem         100% 1679     1.6MB/s   00:00
[root@k8s-master1 k8s]#
 
View the certificates:
[root@harbor etc]# ll
total 20
-rw-r--r-- 1 root root 1359 Mar 12 21:02 ca.pem
-rw-r--r-- 1 root root 5437 Mar 12 20:27 prometheus.yml
-rw------- 1 root root 1679 Mar 12 21:02 server-key.pem
-rw-r--r-- 1 root root 1684 Mar 12 21:02 server.pem
 
On the ops (NFS) host, edit /data/nfs-volume/prometheus/etc/prometheus.yml:
global:
  scrape_interval:     15s
  evaluation_interval: 15s
scrape_configs:
- job_name: 'etcd'
  tls_config:
    ca_file: /data/etc/ca.pem
    cert_file: /data/etc/server.pem
    key_file: /data/etc/server-key.pem
  scheme: https
  static_configs:
  - targets:
    - '192.168.3.183:2379'
    - '192.168.3.184:2379'
    - '192.168.3.185:2379'
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10255
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:4194
- job_name: 'kubernetes-kube-state'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
    regex: .*true.*
    action: keep
  - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
    regex: 'node-exporter;(.*)'
    action: replace
    target_label: nodename
- job_name: 'blackbox_http_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: http
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port, __meta_kubernetes_pod_annotation_blackbox_path]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+);(.+)
    replacement: $1:$2$3
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'blackbox_tcp_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: tcp
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'traefik'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: keep
    regex: traefik
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
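Before starting (or restarting) Prometheus, it is worth validating this file. promtool ships inside the Prometheus image, so a sketch that mounts the NFS directory at the same path the pod uses (so the certificate paths under /data/etc resolve) might look like:

docker run --rm -v /data/nfs-volume/prometheus:/data \
  --entrypoint /bin/promtool heyuze/prometheus:v2.32.0 \
  check config /data/etc/prometheus.yml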
 
1.5.4. Apply the manifests

kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/rbac.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/deployment.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/service.yaml
kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/ingress.yaml
 
1.5.5. Access

Resolve the domain to the VIP:

prometheus.gong-hui.com
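Once the domain resolves, the web UI's Status > Targets page should list every job defined in prometheus.yml. The same information is available from Prometheus' HTTP API, for example:

# List discovered scrape targets and their health via the HTTP API
curl -s http://prometheus.gong-hui.com/api/v1/targets | head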