1. Monitoring a Kubernetes cluster with Prometheus and Grafana
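The monitoring data comes from six sources:

- prometheus-node-exporter collects host performance metrics and exposes them on its /metrics endpoint, which Prometheus scrapes.
- kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, and kube-proxy each expose their own /metrics endpoint, which provides the Kubernetes-related metrics of each node.
- cAdvisor collects container and Pod performance metrics and exposes them on its /metrics endpoint, which Prometheus scrapes.
- blackbox-exporter probes the network behaviour of applications (HTTP, TCP, ICMP, and so on) and exposes the results on its /metrics endpoint, which Prometheus scrapes.
- kube-state-metrics collects state metrics for Kubernetes resource objects and exposes them on its /metrics endpoint, which Prometheus scrapes.
- Applications expose metrics from the processes in their own containers. The application implements the exposition itself and adds the agreed annotations; Prometheus discovers and scrapes the endpoint based on those annotations.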
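Every source above ends up as plain text on a /metrics endpoint. As a rough illustration of what Prometheus reads there, the sketch below parses a few lines of the text exposition format. The sample payload and the function name `parse_metrics` are made up for this example, and the parser ignores optional timestamps, so it is a simplification rather than a full implementation of the format.

```python
from typing import Dict

# Hypothetical payload in the Prometheus text exposition format.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
kube_pod_status_phase{namespace="kube-system",pod="coredns-abc",phase="Running"} 1
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value.

    Assumes lines have no trailing timestamp; comment lines (# HELP,
    # TYPE) and blank lines are skipped.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```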
 
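1.1. Deploying kube-state-metrics
kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of objects. It does not look at the health of individual Kubernetes components; it looks at the health of the objects inside the cluster, such as Deployments, Nodes, and Pods.
1.1.1. Download the source package
Choose the release that matches your cluster. My cluster runs Kubernetes 1.22, so I chose v2.3.0.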
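| kube-state-metrics | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23 |
|---|---|---|---|---|---|
| v1.9.8 | - | - | - | - | - |
| v2.1.1 | ✓ | ✓ | ✓ | -/✓ | -/✓ |
| v2.2.4 | ✓ | ✓ | ✓ | ✓ | ✓ |
| v2.3.0 | ✓ | ✓ | ✓ | ✓ | ✓ |
| master | ✓ | ✓ | ✓ | ✓ | ✓ |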
 
✓ Fully supported version range.
- The Kubernetes cluster has features that the client library cannot use (additional API objects, deprecated APIs, and so on).
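The matrix above can be read mechanically. The following sketch encodes the same rows and returns the kube-state-metrics releases that fully support a given Kubernetes version. The names `K8S_VERSIONS`, `MATRIX`, and `fully_supported` are illustrative only; the data is copied from the table and nothing else is implied about other releases.

```python
from typing import Dict, List

# Column order of the compatibility table.
K8S_VERSIONS: List[str] = ["1.19", "1.20", "1.21", "1.22", "1.23"]

# One row per kube-state-metrics release, copied from the table.
MATRIX: Dict[str, List[str]] = {
    "v1.9.8": ["-", "-", "-", "-", "-"],
    "v2.1.1": ["✓", "✓", "✓", "-/✓", "-/✓"],
    "v2.2.4": ["✓", "✓", "✓", "✓", "✓"],
    "v2.3.0": ["✓", "✓", "✓", "✓", "✓"],
    "master": ["✓", "✓", "✓", "✓", "✓"],
}


def fully_supported(k8s_version: str) -> List[str]:
    """Return the releases marked with a plain check mark for this version."""
    column = K8S_VERSIONS.index(k8s_version)
    return [release for release, row in MATRIX.items() if row[column] == "✓"]


if __name__ == "__main__":
    print(fully_supported("1.22"))
```

For Kubernetes 1.22 this leaves v2.2.4, v2.3.0, and master; v2.3.0 is the newest tagged release among them, which matches the choice made in this guide.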
 
1.1.1.1. Download and extract

    wget https://github.com/kubernetes/kube-state-metrics/archive/refs/tags/v2.3.0.zip
    unzip v2.3.0.zip
    cd kube-state-metrics-2.3.0/examples/standard
 
1.1.1.2. List the YAML files

    [root@harbor k8s-yaml]# ll
    total 20
    -rw-r--r-- 1 root root  418 Dec  9 15:24 cluster-role-binding.yaml
    -rw-r--r-- 1 root root 1665 Dec  9 15:24 cluster-role.yaml
    -rw-r--r-- 1 root root 1222 Dec  9 15:24 deployment.yaml
    -rw-r--r-- 1 root root  234 Dec  9 15:24 service-account.yaml
    -rw-r--r-- 1 root root  447 Dec  9 15:24 service.yaml
 
1.1.1.3. Prepare the image
The k8s.gcr.io registry is not reachable from every network without a proxy, so I pulled the image and pushed a copy to Docker Hub. You can use that copy directly.

    # Pull the image
    [root@app1 ~]# docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
    v2.3.0: Pulling from kube-state-metrics/kube-state-metrics
    e8614d09b7be: Pull complete
    53ccb90bafd7: Pull complete
    Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
    Status: Downloaded newer image for k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
    k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0

    # List the image
    [root@app1 ~]# docker images|grep kube-state-metrics
    k8s.gcr.io/kube-state-metrics/kube-state-metrics   v2.3.0              df2bb3f0d0cd        2 weeks ago         38.7MB

    # Tag the image
    [root@app1 ~]# docker tag df2bb3f0d0cd heyuze/kube-state-metrics:v2.3.0

    # Push to Docker Hub
    [root@app1 ~]# docker push heyuze/kube-state-metrics:v2.3.0
    The push refers to repository [docker.io/heyuze/kube-state-metrics]
    cb4962d0d70b: Pushed
    6d75f23be3dd: Pushed
    v2.3.0: digest: sha256:d964b5107fb31e9020db0d3e738ba4e1fc83a242638ee7e0ae78939baaedbe59 size: 739
 
In deployment.yaml, replace the image with ==heyuze/kube-state-metrics:v2.3.0==.
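The same substitution is repeated for every image in this guide, so it can be scripted. The sketch below is one possible way to do it in plain Python without a YAML library: `mirror_image` derives the Docker Hub name from the upstream reference, and `replace_image` rewrites every `image:` line in a manifest. Both function names are hypothetical helpers for this example, and the regex assumes the one-image-per-line layout used by the manifests here.

```python
import re


def mirror_image(ref: str, namespace: str) -> str:
    """Keep only the final name:tag component and prefix it with a namespace."""
    name_and_tag = ref.rsplit("/", 1)[-1]
    return f"{namespace}/{name_and_tag}"


def replace_image(manifest: str, new_ref: str) -> str:
    """Rewrite the value of every `image:` key in a YAML manifest text."""
    pattern = re.compile(r"^([ \t]*(?:-[ \t]*)?image:[ \t]*)\S+", re.MULTILINE)
    # A function replacement avoids backslash escaping issues in new_ref.
    return pattern.sub(lambda m: m.group(1) + new_ref, manifest)


if __name__ == "__main__":
    upstream = "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0"
    manifest = (
        "containers:\n"
        f"- image: {upstream}\n"
        "  name: kube-state-metrics\n"
    )
    print(replace_image(manifest, mirror_image(upstream, "heyuze")))
```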
1.1.2. Resource manifests
Deployment
vim deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
      name: kube-state-metrics
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      template:
        metadata:
          labels:
            app.kubernetes.io/component: exporter
            app.kubernetes.io/name: kube-state-metrics
            app.kubernetes.io/version: 2.3.0
        spec:
          containers:
          - image: heyuze/kube-state-metrics:v2.3.0
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
            name: kube-state-metrics
            ports:
            - containerPort: 8080
              name: http-metrics
            - containerPort: 8081
              name: telemetry
            readinessProbe:
              httpGet:
                path: /
                port: 8081
              initialDelaySeconds: 5
              timeoutSeconds: 5
            securityContext:
              runAsUser: 65534
          nodeSelector:
            kubernetes.io/os: linux
          serviceAccountName: kube-state-metrics
 
ClusterRoleBinding
vim cluster-role-binding.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
 
ClusterRole
vim cluster-role.yaml

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
      name: kube-state-metrics
    rules:
    - apiGroups:
      - ""
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs:
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - statefulsets
      - daemonsets
      - deployments
      - replicasets
      verbs:
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - cronjobs
      - jobs
      verbs:
      - list
      - watch
    - apiGroups:
      - autoscaling
      resources:
      - horizontalpodautoscalers
      verbs:
      - list
      - watch
    - apiGroups:
      - authentication.k8s.io
      resources:
      - tokenreviews
      verbs:
      - create
    - apiGroups:
      - authorization.k8s.io
      resources:
      - subjectaccessreviews
      verbs:
      - create
    - apiGroups:
      - policy
      resources:
      - poddisruptionbudgets
      verbs:
      - list
      - watch
    - apiGroups:
      - certificates.k8s.io
      resources:
      - certificatesigningrequests
      verbs:
      - list
      - watch
    - apiGroups:
      - storage.k8s.io
      resources:
      - storageclasses
      - volumeattachments
      verbs:
      - list
      - watch
    - apiGroups:
      - admissionregistration.k8s.io
      resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
      verbs:
      - list
      - watch
    - apiGroups:
      - networking.k8s.io
      resources:
      - networkpolicies
      - ingresses
      verbs:
      - list
      - watch
    - apiGroups:
      - coordination.k8s.io
      resources:
      - leases
      verbs:
      - list
      - watch
 
ServiceAccount
vim service-account.yaml

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
      name: kube-state-metrics
      namespace: kube-system
 
Service
vim service.yaml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
      name: kube-state-metrics
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
      - name: telemetry
        port: 8081
        targetPort: telemetry
      selector:
        app.kubernetes.io/name: kube-state-metrics
 
1.1.3. Apply the resource manifests
On the master machine, from the directory that contains the manifests:

    [root@k8s-master ~]# kubectl apply -f ./
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    deployment.apps/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    service/kube-state-metrics created

Check that the Pod and Service have started:

    [root@k8s-master ~]# kubectl get pods,svc -n kube-system|grep kube-state-metrics
    kube-state-metrics-7f8f6fc7fd-qxw8z   1/1     Running   0              87s
    service/kube-state-metrics   ClusterIP   None         <none>        8080/TCP,8081/TCP        87s
 
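Check that it responds. The Service is headless and the Pod does not use the host network, so this assumes port 8080 of the Pod is reachable from the local machine, for example through kubectl port-forward:

    [root@k8s-master ~]# curl localhost:8080/healthz
    ok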
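The same health check can be scripted with the Python standard library. This is a minimal sketch, not part of kube-state-metrics itself: `is_healthy` is a hypothetical helper, and the URL in the usage section assumes the /healthz endpoint is reachable on localhost:8080 exactly as in the curl call above.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True when the endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS errors, and HTTP errors.
        return False


if __name__ == "__main__":
    # Assumed address; adjust to wherever the endpoint is exposed.
    print(is_healthy("http://localhost:8080/healthz"))
```

Returning a boolean instead of raising keeps the helper easy to use in a polling loop that waits for the Pod to become ready.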
 
1.2. Deploying node-exporter
1.2.1. Prepare the node-exporter image
Official node-exporter Docker Hub page
Official node-exporter GitHub repository
Pull the image:

    [root@harbor ~]# docker pull prom/node-exporter:v1.3.1
    v1.3.1: Pulling from prom/node-exporter
    aa2a8d90b84c: Pull complete
    b45d31ee2d7f: Pull complete
    b5db1e299295: Pull complete
    Digest: sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
    Status: Downloaded newer image for prom/node-exporter:v1.3.1
    docker.io/prom/node-exporter:v1.3.1

List the pulled image:

    [root@harbor ~]# docker images
    REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
    prom/node-exporter                              v1.3.1              1dbe0e931976        2 weeks ago         20.9MB

Tag the image:

    [root@harbor ~]# docker tag 1dbe0e931976 heyuze/node-exporter:v1.3.1

Confirm the new tag:

    [root@harbor ~]# docker images
    REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
    heyuze/node-exporter                             v1.3.1              1dbe0e931976        2 weeks ago         20.9MB
    prom/node-exporter                               v1.3.1              1dbe0e931976        2 weeks ago         20.9MB

Push to the image registry:

    [root@harbor ~]# docker push heyuze/node-exporter:v1.3.1
    The push refers to repository [docker.io/heyuze/node-exporter]
    5f6d9bc8e23d: Mounted from prom/node-exporter
    8d42cad20cac: Mounted from prom/node-exporter
    36b45d63da70: Mounted from prom/node-exporter
    v1.3.1: digest: sha256:d5b2a2e2bb07a4a5a7c4bd9e54641cab63e1d2627622dbde17efc04849d3d30d size: 948
 
1.2.2. Prepare the resource manifest
vim /data/k8s-yaml/node-exporter/node-exporter-ds.yaml

    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: node-exporter
      namespace: kube-system
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      selector:
        matchLabels:
          daemon: "node-exporter"
          grafanak8sapp: "true"
      template:
        metadata:
          name: node-exporter
          labels:
            daemon: "node-exporter"
            grafanak8sapp: "true"
        spec:
          volumes:
          - name: proc
            hostPath:
              path: /proc
              type: ""
          - name: sys
            hostPath:
              path: /sys
              type: ""
          imagePullSecrets:
          - name: registry-pull-secret
          containers:
          - name: node-exporter
            image: heyuze/node-exporter:v1.3.1
            imagePullPolicy: IfNotPresent
            args:
            - --path.procfs=/host_proc
            - --path.sysfs=/host_sys
            ports:
            - name: node-exporter
              hostPort: 9100
              containerPort: 9100
              protocol: TCP
            volumeMounts:
            - name: sys
              readOnly: true
              mountPath: /host_sys
            - name: proc
              readOnly: true
              mountPath: /host_proc
          hostNetwork: true
 
1.2.3. Apply the resource manifest

    [root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/node-exporter/node-exporter-ds.yaml
    daemonset.apps/node-exporter created

Check that the Pods have started:

    [root@k8s-master1 ~]# kubectl get pod -n kube-system|grep node-exporter
    node-exporter-rh7fx                   1/1     Running   0               40s
    node-exporter-vgnzt                   1/1     Running   0               40s

Check that metrics are served. The DaemonSet uses the host network, so port 9100 is open on every node:

    [root@k8s-node1 ~]# curl localhost:9100/metrics

If the request returns node metrics, node-exporter is working.
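1.3. Deploying cAdvisor
cAdvisor monitors the resources and containers on a node in real time and collects performance data, including CPU usage, memory usage, network throughput, and filesystem usage. Each cAdvisor instance monitors exactly one node. cAdvisor is built into the kubelet and starts together with it. Older kubelet releases exposed it on a port set with the --cadvisor-port flag (default 4194), where it could be opened in a browser. That flag has been removed from current kubelet releases, so this guide runs cAdvisor as a separate DaemonSet that listens on port 4194.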
1.3.1. Prepare the cAdvisor image
Official cAdvisor Docker Hub page
Official cAdvisor GitHub repository
Official cAdvisor gcr.io page
Google no longer updates the cAdvisor image on Docker Hub; new images are published to gcr.io/cadvisor/cadvisor. I pulled the image from there and pushed a copy to Docker Hub, so only the image address needs to be changed.
Pull the image:

    [root@harbor harbor]# docker pull gcr.io/cadvisor/cadvisor:v0.43.0
    v0.43.0: Pulling from cadvisor/cadvisor
    e519532ddf75: Pull complete
    2e08db3b6bd0: Pull complete
    83f705f3387b: Pull complete
    7f10f7c55689: Pull complete
    3fdbcd5b103f: Pull complete
    Digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7
    Status: Downloaded newer image for gcr.io/cadvisor/cadvisor:v0.43.0
    gcr.io/cadvisor/cadvisor:v0.43.0

List the image:

    [root@harbor harbor]# docker images
    REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
    gcr.io/cadvisor/cadvisor                         v0.43.0             80f16aa8c3c8        6 weeks ago         87.5MB

Tag the image:

    [root@harbor harbor]# docker tag 80f16aa8c3c8 heyuze/cadvisor:v0.43.0

Push:

    [root@harbor harbor]# docker push heyuze/cadvisor:v0.43.0
    The push refers to repository [docker.io/heyuze/cadvisor]
    f2485927f8bd: Pushed
    571a7fddbc78: Pushed
    f1e964b32d2a: Pushed
    41768b6793f5: Pushed
    e6688e911f15: Pushed
    v0.43.0: digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7 size: 1373
 
1.3.2. Prepare the resource manifest
vi /data/k8s-yaml/cadvisor/daemonset.yaml

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: cadvisor
      namespace: kube-system
      labels:
        app: cadvisor
    spec:
      selector:
        matchLabels:
          name: cadvisor
      template:
        metadata:
          labels:
            name: cadvisor
        spec:
          hostNetwork: true
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          containers:
          - name: cadvisor
            image: heyuze/cadvisor:v0.43.0
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: var-run
              mountPath: /var/run
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: docker
              mountPath: /var/lib/docker
              readOnly: true
            ports:
              - name: http
                containerPort: 4194
                protocol: TCP
            readinessProbe:
              tcpSocket:
                port: 4194
              initialDelaySeconds: 5
              periodSeconds: 10
            args:
              - --housekeeping_interval=10s
              - --port=4194
          terminationGracePeriodSeconds: 30
          volumes:
          - name: rootfs
            hostPath:
              path: /
          - name: var-run
            hostPath:
              path: /var/run
          - name: sys
            hostPath:
              path: /sys
          - name: docker
            hostPath:
              path: /data/docker
 
1.3.3. Adjust the cgroup symlink on the compute nodes
On every compute node, remount the cgroup filesystem read-write and add a cpuacct,cpu symlink next to the existing cpu,cpuacct directory, then verify the result:

    [root@k8s-node1 ~]# mount -o remount,rw /sys/fs/cgroup/
    [root@k8s-node1 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
    [root@k8s-master1 ~]# ll /sys/fs/cgroup/ | grep cpu
    lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpu -> cpu,cpuacct
    lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpuacct -> cpu,cpuacct
    lrwxrwxrwx 1 root root 27 Mar 11 12:54 cpuacct,cpu -> /sys/fs/cgroup/cpu,cpuacct/
    drwxr-xr-x 5 root root  0 Feb 26 13:36 cpu,cpuacct
    drwxr-xr-x 3 root root  0 Feb 26 13:36 cpuset
 
1.3.4. Apply the resource manifest
On any node with kubectl access:

    [root@k8s-master ~]# kubectl apply -f http://www.kubelet.cn/k8s-yaml/cadvisor/daemonset.yaml
    daemonset.apps/cadvisor created

Check the listening port on a node:

    [root@k8s-node1 ~]# netstat -luntp|grep 4194
    tcp6       0      0 :::4194                 :::*                    LISTEN      1634868/cadvisor
 
1.4. Deploying blackbox-exporter
1.4.1. Prepare the blackbox-exporter image
Official blackbox-exporter Docker Hub page
Official blackbox-exporter GitHub repository
Pull the image:

    [root@harbor ~]# docker pull prom/blackbox-exporter:v0.19.0
    v0.19.0: Pulling from prom/blackbox-exporter
    aa2a8d90b84c: Pull complete
    b45d31ee2d7f: Pull complete
    1603b92f0389: Pull complete
    a8140d619b2f: Pull complete
    Digest: sha256:94de5897eef1b3c1ba7fbfebb9af366e032c0ff915a52c0066ff2e0c1bcd2e45
    Status: Downloaded newer image for prom/blackbox-exporter:v0.19.0
    docker.io/prom/blackbox-exporter:v0.19.0

List the image:

    [root@harbor ~]# docker images
    REPOSITORY                                    TAG                 IMAGE ID            CREATED         SIZE
    prom/blackbox-exporter                        v0.19.0             c9e462ce1ee4        7 months ago    20.9MB

Tag the image:

    [root@harbor ~]# docker tag c9e462ce1ee4 heyuze/blackbox-exporter:v0.19.0

Push:

    [root@harbor ~]# docker push heyuze/blackbox-exporter:v0.19.0
 
1.4.2. Prepare the resource manifests
ConfigMap
/data/k8s-yaml/blackbox-exporter/configmap.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        app: blackbox-exporter
      name: blackbox-exporter
      namespace: kube-system
    data:
      blackbox.yml: |-
        modules:
          http_2xx:
            prober: http
            timeout: 2s
            http:
              valid_http_versions: ["HTTP/1.1", "HTTP/2"]
              valid_status_codes: [200,301,302]
              method: GET
              preferred_ip_protocol: "ip4"
          tcp_connect:
            prober: tcp
            timeout: 2s
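The modules defined in this ConfigMap are selected per request: blackbox-exporter serves a /probe endpoint that takes `module` and `target` query parameters. The sketch below builds such a URL with the standard library. The function name `probe_url` and the Pod address in the usage section are made up for illustration; the exporter address matches the Service name, namespace, and port used in this guide.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build a blackbox-exporter probe URL for one module and one target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


if __name__ == "__main__":
    # Hypothetical Pod address; the module name comes from blackbox.yml.
    print(probe_url("blackbox-exporter.kube-system:9115",
                    "http_2xx",
                    "10.244.1.5:8080/health"))
```

`urlencode` percent-encodes the colon and slashes in the target, which keeps the target from being confused with the exporter's own path.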
 
Deployment
/data/k8s-yaml/blackbox-exporter/deployment.yaml

    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: blackbox-exporter
      namespace: kube-system
      labels:
        app: blackbox-exporter
    #  annotations:
    #    deployment.kubernetes.io/revision: 1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: blackbox-exporter
      template:
        metadata:
          labels:
            app: blackbox-exporter
        spec:
          volumes:
          - name: config
            configMap:
              name: blackbox-exporter
              defaultMode: 420
          containers:
          - name: blackbox-exporter
            image: heyuze/blackbox-exporter:v0.19.0
            imagePullPolicy: IfNotPresent
            args:
            - --config.file=/etc/blackbox_exporter/blackbox.yml
            - --log.level=info
            - --web.listen-address=:9115
            ports:
            - name: blackbox-port
              containerPort: 9115
              protocol: TCP
            resources:
              limits:
                cpu: 200m
                memory: 256Mi
              requests:
                cpu: 100m
                memory: 50Mi
            volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
            readinessProbe:
              tcpSocket:
                port: 9115
              initialDelaySeconds: 5
              timeoutSeconds: 5
              periodSeconds: 10
              successThreshold: 1
              failureThreshold: 3
 
Service
/data/k8s-yaml/blackbox-exporter/service.yaml

    kind: Service
    apiVersion: v1
    metadata:
      name: blackbox-exporter
      namespace: kube-system
    spec:
      selector:
        app: blackbox-exporter
      ports:
        - name: blackbox-port
          protocol: TCP
          port: 9115
          targetPort: 9115
      # type belongs to the Service spec, not to an individual port entry
      type: LoadBalancer
 
Ingress
/data/k8s-yaml/blackbox-exporter/ingress.yaml
Kubernetes 1.22 no longer serves the extensions/v1beta1 Ingress API, so only the networking.k8s.io/v1 form is used here. The backend refers to the Service port by name, because the number field accepts integers only.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: blackbox-exporter
      namespace: kube-system
    spec:
      ingressClassName: nginx
      rules:
      - host: blackbox.kubelet.cn
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blackbox-exporter
                port:
                  name: blackbox-port
 
1.4.3. Apply the resource manifests

    [root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/configmap.yaml
    configmap/blackbox-exporter created
    [root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/deployment.yaml
    deployment.apps/blackbox-exporter created
    [root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/service.yaml
    service/blackbox-exporter created
    [root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/ingress.yaml
    ingress.networking.k8s.io/blackbox-exporter created
 
1.5. Deploying Prometheus
On the operations host.
1.5.1. Prepare the Prometheus image
Official Prometheus Docker Hub page
Official Prometheus GitHub repository
Pull the image:

    [root@harbor ~]# docker pull prom/prometheus:v2.32.0
    v2.32.0: Pulling from prom/prometheus
    3cb635b06aa2: Pull complete
    c4d1a94ab1db: Pull complete
    41c8679f1eb7: Pull complete
    1650e28e81f3: Pull complete
    a4af63abea67: Pull complete
    101065466520: Pull complete
    e7d092467524: Pull complete
    920f29a8238e: Pull complete
    d22cebb42c02: Pull complete
    102a95cf6327: Pull complete
    c14687945637: Pull complete
    d2136b8fa9a3: Pull complete
    Digest: sha256:68aa603f9d797a8423e766b625cab4202bda7d9be8fc44d4e904dcea7f142177
    Status: Downloaded newer image for prom/prometheus:v2.32.0
    docker.io/prom/prometheus:v2.32.0

List the image:

    [root@harbor ~]# docker images
    REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
    prom/prometheus          v2.32.0             9e4125f21d5f        12 days ago         201MB

Tag the image:

    [root@harbor ~]# docker tag 9e4125f21d5f heyuze/prometheus:v2.32.0

Push:

    [root@harbor ~]# docker push heyuze/prometheus:v2.32.0
    The push refers to repository [docker.io/heyuze/prometheus]
    2e73ed7d5d4d: Mounted from prom/prometheus
    10e7fc7c8b3d: Mounted from prom/prometheus
    051ea654c6e9: Mounted from prom/prometheus
    913f2f736476: Mounted from prom/prometheus
    f2083decbd50: Mounted from prom/prometheus
    8bbbfc276b7c: Mounted from prom/prometheus
    d90183f5bbf3: Mounted from prom/prometheus
    5df4d348c75e: Mounted from prom/prometheus
    3120b8f3e6b5: Mounted from prom/prometheus
    9cbe643a4493: Mounted from prom/prometheus
    29908fb03ed8: Mounted from prom/prometheus
    64cac9eaf0da: Mounted from prom/prometheus
    v2.32.0: digest: sha256:a8f33123429b8df0d01af19f639c4427a434e3143e0c4df84f688886960f53c4 size: 2823
 
1.5.2. Prepare the resource manifests
On the operations host, create the manifest directory under /data/k8s-yaml and the configuration directory on the NFS volume:

    [root@harbor ~]# mkdir /data/k8s-yaml/prometheus && mkdir -p /data/nfs-volume/prometheus/etc && cd /data/k8s-yaml/prometheus
 
RBAC
vim /data/k8s-yaml/prometheus/rbac.yaml

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/cluster-service: "true"
      name: prometheus
      namespace: infra
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/cluster-service: "true"
      name: prometheus
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/cluster-service: "true"
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: infra
 
Deployment
vi /data/k8s-yaml/prometheus/deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "5"
      labels:
        name: prometheus
      name: prometheus
      namespace: infra
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 7
      selector:
        matchLabels:
          app: prometheus
      strategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          containers:
          - name: prometheus
            image: heyuze/prometheus:v2.32.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/prometheus
            args:
            - --config.file=/data/etc/prometheus.yml
            - --storage.tsdb.path=/data/prom-db
            - --storage.tsdb.min-block-duration=10m
            # --storage.tsdb.retention is deprecated; use retention.time
            - --storage.tsdb.retention.time=72h
            ports:
            - containerPort: 9090
              protocol: TCP
            volumeMounts:
            - mountPath: /data
              name: data
            resources:
              requests:
                cpu: "1000m"
                memory: "1.5Gi"
              limits:
                cpu: "2000m"
                memory: "3Gi"
          imagePullSecrets:
          - name: harbor
          securityContext:
            runAsUser: 0
          serviceAccountName: prometheus
          volumes:
          - name: data
            nfs:
              server: 192.168.101.198
              path: /data/nfs-volume/prometheus
 
Service
vim /data/k8s-yaml/prometheus/service.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: infra
    spec:
      ports:
      - port: 9090
        protocol: TCP
        targetPort: 9090
      selector:
        app: prometheus
 
Ingress
vim /data/k8s-yaml/prometheus/ingress.yaml
The extensions/v1beta1 Ingress API is no longer served by Kubernetes 1.22, so the manifest uses networking.k8s.io/v1, which requires pathType and the nested service backend:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
      name: prometheus
      namespace: infra
    spec:
      rules:
      - host: prometheus.kubelet.cn
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus
                port:
                  number: 9090
 
1.5.3. Prepare the Prometheus configuration file
Copy the certificate files, which Prometheus needs to scrape etcd over TLS:

    [root@k8s-master1 k8s]# pwd
    /root/TLS/k8s
    [root@k8s-master1 k8s]# scp ca.pem server.pem server-key.pem root@192.168.3.187:/data/nfs-volume/prometheus/etc
    root@192.168.3.187's password:
    ca.pem                                                                                       100% 1359     1.2MB/s   00:00
    server.pem                                                                                   100% 1684     1.5MB/s   00:00
    server-key.pem                                                                               100% 1679     1.6MB/s   00:00

Check the certificates:

    [root@harbor etc]# ll
    total 20
    -rw-r--r-- 1 root root 1359 Mar 12 21:02 ca.pem
    -rw-r--r-- 1 root root 5437 Mar 12 20:27 prometheus.yml
    -rw------- 1 root root 1679 Mar 12 21:02 server-key.pem
    -rw-r--r-- 1 root root 1684 Mar 12 21:02 server.pem
 
The configuration file itself:
/data/nfs-volume/prometheus/etc/prometheus.yml

    global:
      scrape_interval:     15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'etcd'
      tls_config:
        ca_file: /data/etc/ca.pem
        cert_file: /data/etc/server.pem
        key_file: /data/etc/server-key.pem
      scheme: https
      static_configs:
      - targets:
        - '192.168.3.183:2379'
        - '192.168.3.184:2379'
        - '192.168.3.185:2379'
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-kubelet'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10255
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:4194
    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename
    - job_name: 'blackbox_http_pod_probe'
      metrics_path: /probe
      kubernetes_sd_configs:
      - role: pod
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
        action: keep
        regex: http
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port,  __meta_kubernetes_pod_annotation_blackbox_path]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+);(.+)
        replacement: $1:$2$3
        target_label: __param_target
      - action: replace
        target_label: __address__
        replacement: blackbox-exporter.kube-system:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'blackbox_tcp_pod_probe'
      metrics_path: /probe
      kubernetes_sd_configs:
      - role: pod
      params:
        module: [tcp_connect]
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
        action: keep
        regex: tcp
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __param_target
      - action: replace
        target_label: __address__
        replacement: blackbox-exporter.kube-system:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'traefik'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: keep
        regex: traefik
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
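The address-rewriting relabel rules in this configuration are easier to follow with concrete values. Prometheus joins the source labels with a semicolon, matches the whole string against the regex, and writes the expanded replacement to the target label. The sketch below imitates that `replace` step with Python's `re` module. It is an approximation (Prometheus uses RE2 and also supports named references), `relabel_replace` is a hypothetical helper, and the Pod addresses are invented sample values.

```python
import re
from typing import Optional


def relabel_replace(regex: str, replacement: str, value: str) -> Optional[str]:
    """Imitate one Prometheus `replace` relabel step.

    `value` is the source labels already joined with ';'. Returns the new
    target label value, or None when the regex does not match (in which
    case Prometheus leaves the target label unchanged).
    """
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex
    if match is None:
        return None
    # Convert $1 / ${1} references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


if __name__ == "__main__":
    # kubernetes-pods job: swap the discovered port for the annotated one.
    print(relabel_replace(r"([^:]+)(?::\d+)?;(\d+)", "$1:$2",
                          "10.244.1.5:8080;9102"))
    # blackbox_http_pod_probe job: address + port + path become the target.
    print(relabel_replace(r"([^:]+)(?::\d+)?;(\d+);(.+)", "$1:$2$3",
                          "10.244.1.5:8080;8080;/health"))
    # kubernetes-kubelet job: node name becomes host:port.
    print(relabel_replace(r"(.+)", "${1}:10255", "k8s-node1"))
```

A Pod without the port annotation produces a joined value such as `10.244.1.5:8080;`, which does not match, so its address is left as discovered.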
 
1.5.4. Apply the resource manifests

    kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/rbac.yaml
    kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/deployment.yaml
    kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/service.yaml
    kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/ingress.yaml
 
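1.5.5. Access
Point the DNS record for the domain at the VIP, then open it in a browser. The domain must match the host configured in the Ingress.
prometheus.gong-hui.com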