Monitoring a Kubernetes Cluster with Prometheus and Grafana

1. Monitoring a Kubernetes Cluster with Prometheus and Grafana

  • Collection approach:
  • node-exporter (prometheus-node-exporter) collects host-level performance metrics and exposes them on /metrics for Prometheus to scrape.

  • kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, and kube-proxy each expose their own /metrics endpoint, providing cluster-related metrics from every node.

  • cAdvisor collects container- and Pod-level performance metrics and exposes them on /metrics for Prometheus to scrape.

  • blackbox-exporter probes application network endpoints (HTTP, TCP, ICMP, etc.) and exposes the results on /metrics for Prometheus to scrape.

  • kube-state-metrics collects state metrics for Kubernetes resource objects and exposes them on /metrics for Prometheus to scrape.

  • Applications can expose their own metrics from inside the container (the application implements the metrics endpoint itself and adds the agreed-upon annotations; Prometheus discovers and scrapes it based on those annotations).
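As a sketch of that annotation convention, a Pod template can opt in to scraping like this (the annotation names correspond to the relabel rules in the `kubernetes-pods` job configured later; the Deployment itself is a hypothetical example):

```yaml
# Hypothetical application Deployment that opts in to Prometheus scraping.
# The annotations map to the scrape job's relabel rules:
#   prometheus.io/scrape -> __meta_kubernetes_pod_annotation_prometheus_io_scrape
#   prometheus.io/path   -> __meta_kubernetes_pod_annotation_prometheus_io_path
#   prometheus.io/port   -> __meta_kubernetes_pod_annotation_prometheus_io_port
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app          # example name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:
        prometheus.io/scrape: "true"    # opt in to scraping
        prometheus.io/path: "/metrics"  # metrics endpoint path
        prometheus.io/port: "8080"      # metrics port
    spec:
      containers:
      - name: demo-app
        image: demo-app:latest          # hypothetical image
        ports:
        - containerPort: 8080
```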

1.1. Deploying kube-state-metrics

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of objects (see the examples in the Metrics section of its documentation). It is not focused on the health of individual Kubernetes components, but rather on the state of the various objects inside the cluster, such as Deployments, Nodes, and Pods.

1.1.1. Download the source package

Pick the release that matches your cluster version; my cluster runs Kubernetes 1.22, so I chose v2.3.0.

kube-state-metrics | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23
v1.9.8             | -               | -               | -               | -               | -
v2.1.1             | -/✓             | -/✓             |                 |                 |
v2.2.4             |                 |                 |                 |                 |
v2.3.0             |                 |                 |                 |                 |
master             |                 |                 |                 |                 |

(See the compatibility matrix in the kube-state-metrics README for the complete table.)

  • ✓ Fully supported version range.
  • - The Kubernetes cluster has some features the client libraries cannot use (additional API objects, deprecated APIs, etc.).

1.1.1.1. Download and extract

wget https://github.com/kubernetes/kube-state-metrics/archive/refs/tags/v2.3.0.zip
unzip v2.3.0.zip
cd kube-state-metrics-2.3.0/examples/standard

1.1.1.2. Inspect the YAML files

[root@harbor k8s-yaml]# ll
total 20
-rw-r--r-- 1 root root 418 Dec 9 15:24 cluster-role-binding.yaml
-rw-r--r-- 1 root root 1665 Dec 9 15:24 cluster-role.yaml
-rw-r--r-- 1 root root 1222 Dec 9 15:24 deployment.yaml
-rw-r--r-- 1 root root 234 Dec 9 15:24 service-account.yaml
-rw-r--r-- 1 root root 447 Dec 9 15:24 service.yaml

1.1.1.3. Prepare the image

The upstream image is hosted on k8s.gcr.io, which may be unreachable without a proxy, so I pushed a copy to Docker Hub that can be used as a drop-in replacement.

# Pull the image
[root@app1 ~]# docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
v2.3.0: Pulling from kube-state-metrics/kube-state-metrics
e8614d09b7be: Pull complete
53ccb90bafd7: Pull complete
Digest: sha256:c9137505edaef138cc23479c73e46e9a3ef7ec6225b64789a03609c973b99030
Status: Downloaded newer image for k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0

# Verify
[root@app1 ~]# docker images|grep kube-state-metrics
k8s.gcr.io/kube-state-metrics/kube-state-metrics v2.3.0 df2bb3f0d0cd 2 weeks ago 38.7MB

# Tag the image
[root@app1 ~]# docker tag df2bb3f0d0cd heyuze/kube-state-metrics:v2.3.0

# Push to Docker Hub
[root@app1 ~]# docker push heyuze/kube-state-metrics:v2.3.0
The push refers to repository [docker.io/heyuze/kube-state-metrics]
cb4962d0d70b: Pushed
6d75f23be3dd: Pushed
v2.3.0: digest: sha256:d964b5107fb31e9020db0d3e738ba4e1fc83a242638ee7e0ae78939baaedbe59 size: 739

In deployment.yaml, replace the image with heyuze/kube-state-metrics:v2.3.0.
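A one-liner for that swap (a sketch; it assumes the stock manifest still references the upstream k8s.gcr.io image):

```shell
# Replace the upstream image reference with the Docker Hub mirror, in place.
# The sed delimiter is '|' because the image paths contain '/'.
sed -i 's|k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0|heyuze/kube-state-metrics:v2.3.0|' deployment.yaml

# Confirm the substitution took effect.
grep 'image:' deployment.yaml
```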

1.1.2. Resource manifests

Deployment

vim deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.3.0
    spec:
      containers:
      - image: heyuze/kube-state-metrics:v2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics

ClusterRoleBinding

vim cluster-role-binding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

ClusterRole

vim cluster-role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch

service-account

vim service-account.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system

Service

vim service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.3.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics

1.1.3. Apply the manifests

On the master node:

[root@k8s-master ~]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created

Check that the Pod started:

[root@k8s-master ~]# kubectl get pods,svc -n kube-system|grep kube-state-metrics
kube-state-metrics-7f8f6fc7fd-qxw8z 1/1 Running 0 87s
service/kube-state-metrics ClusterIP None <none> 8080/TCP,8081/TCP 87s

Verify health:

[root@k8s-master ~]# curl localhost:8080/healthz
ok
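Once kube-state-metrics is being scraped, object-state queries become available in Prometheus. A few illustrative PromQL queries (these are standard kube-state-metrics metric names; adjust label matchers to your cluster):

```promql
# Pods that are not in the Running phase
kube_pod_status_phase{phase!="Running"} > 0

# Deployments whose available replicas lag the desired count
kube_deployment_spec_replicas != kube_deployment_status_replicas_available

# Nodes reporting a not-ready condition
kube_node_status_condition{condition="Ready", status!="true"} == 1
```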

1.2. Deploying node-exporter

1.2.1. Prepare the node-exporter image

node-exporter official Docker Hub page
node-exporter official GitHub repository

Pull the image:

[root@harbor ~]# docker pull prom/node-exporter:v1.3.1
v1.3.1: Pulling from prom/node-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
b5db1e299295: Pull complete
Digest: sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
Status: Downloaded newer image for prom/node-exporter:v1.3.1
docker.io/prom/node-exporter:v1.3.1

Check the pulled image:

[root@harbor ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prom/node-exporter v1.3.1 1dbe0e931976 2 weeks ago 20.9MB

Tag it:

[root@harbor ~]# docker tag 1dbe0e931976 heyuze/node-exporter:v1.3.1

Verify the tag:

[root@harbor ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
heyuze/node-exporter v1.3.1 1dbe0e931976 2 weeks ago 20.9MB
prom/node-exporter v1.3.1 1dbe0e931976 2 weeks ago 20.9MB

Push to the registry:

[root@harbor ~]# docker push heyuze/node-exporter:v1.3.1
The push refers to repository [docker.io/heyuze/node-exporter]
5f6d9bc8e23d: Mounted from prom/node-exporter
8d42cad20cac: Mounted from prom/node-exporter
36b45d63da70: Mounted from prom/node-exporter
v1.3.1: digest: sha256:d5b2a2e2bb07a4a5a7c4bd9e54641cab63e1d2627622dbde17efc04849d3d30d size: 948

1.2.2. Prepare the resource manifest

vim /data/k8s-yaml/node-exporter/node-exporter-ds.yaml

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    daemon: "node-exporter"
    grafanak8sapp: "true"
spec:
  selector:
    matchLabels:
      daemon: "node-exporter"
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      volumes:
      - name: proc
        hostPath:
          path: /proc
          type: ""
      - name: sys
        hostPath:
          path: /sys
          type: ""
      imagePullSecrets:
      - name: registry-pull-secret
      containers:
      - name: node-exporter
        image: heyuze/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        args:
        - --path.procfs=/host_proc
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /host_proc
      hostNetwork: true

1.2.3. Apply the manifest

[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/node-exporter/node-exporter-ds.yaml
daemonset.apps/node-exporter created

Check that the Pods started:

[root@k8s-master1 ~]# kubectl get pod -n kube-system|grep node-exporter
node-exporter-rh7fx 1/1 Running 0 40s
node-exporter-vgnzt 1/1 Running 0 40s

Health check:

[root@k8s-node1 ~]#  curl localhost:9100/metrics

As long as node metrics are returned, the exporter is working.
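With node-exporter data flowing in, host-level queries such as the following become possible (standard node-exporter metric names):

```promql
# Per-node CPU utilization (fraction busy over the last 5 minutes)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Available memory as a fraction of total
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Root filesystem usage
1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
```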

1.3. Deploying cAdvisor

cAdvisor performs real-time monitoring and performance-data collection for the resources and containers on a node, covering CPU usage, memory usage, network throughput, and filesystem usage. cAdvisor is integrated into the kubelet and starts automatically with it, so each cAdvisor instance monitors exactly one node. Older kubelets exposed it through the --cadvisor-port flag (default 4194), reachable from a browser; newer releases have removed that flag, which is why we run cAdvisor as a standalone DaemonSet on port 4194 below.

1.3.1. Prepare the cAdvisor image

cAdvisor official Docker Hub page
cAdvisor official GitHub repository

cAdvisor official gcr.io registry

Google no longer publishes cAdvisor images to Docker Hub; the latest images go to gcr.io/cadvisor/cadvisor. I pulled the image and re-pushed it to Docker Hub, so you only need to change the image reference.

Pull the image:

[root@harbor harbor]# docker pull gcr.io/cadvisor/cadvisor:v0.43.0
v0.43.0: Pulling from cadvisor/cadvisor
e519532ddf75: Pull complete
2e08db3b6bd0: Pull complete
83f705f3387b: Pull complete
7f10f7c55689: Pull complete
3fdbcd5b103f: Pull complete
Digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7
Status: Downloaded newer image for gcr.io/cadvisor/cadvisor:v0.43.0
gcr.io/cadvisor/cadvisor:v0.43.0

Check the image:

[root@harbor harbor]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/cadvisor/cadvisor v0.43.0 80f16aa8c3c8 6 weeks ago 87.5MB

Tag it:

[root@harbor harbor]# docker tag 80f16aa8c3c8 heyuze/cadvisor:v0.43.0

Push:

[root@harbor harbor]# docker push heyuze/cadvisor:v0.43.0
The push refers to repository [docker.io/heyuze/cadvisor]
f2485927f8bd: Pushed
571a7fddbc78: Pushed
f1e964b32d2a: Pushed
41768b6793f5: Pushed
e6688e911f15: Pushed
v0.43.0: digest: sha256:89e6137f068ded2e9a3a012ce71260b9afc57a19305842aa1074239841a539a7 size: 1373

1.3.2. Prepare the resource manifest

vi /data/k8s-yaml/cadvisor/daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      hostNetwork: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: cadvisor
        image: heyuze/cadvisor:v0.43.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
        - name: http
          containerPort: 4194
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 4194
          initialDelaySeconds: 5
          periodSeconds: 10
        args:
        - --housekeeping_interval=10s
        - --port=4194
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker

1.3.3. Adjust the cgroup symlinks on the worker nodes

On every worker node:

[root@k8s-node1 ~]# mount -o remount,rw /sys/fs/cgroup/
[root@k8s-node1 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
[root@k8s-master1 ~]# ll /sys/fs/cgroup/ | grep cpu
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Feb 26 13:36 cpuacct -> cpu,cpuacct
lrwxrwxrwx 1 root root 27 Mar 11 12:54 cpuacct,cpu -> /sys/fs/cgroup/cpu,cpuacct/
drwxr-xr-x 5 root root 0 Feb 26 13:36 cpu,cpuacct
drwxr-xr-x 3 root root 0 Feb 26 13:36 cpuset

1.3.4. Apply the manifest

On any worker node:

[root@k8s-master ~]# kubectl apply -f http://www.kubelet.cn/k8s-yaml/cadvisor/daemonset.yaml
daemonset.apps/cadvisor created

Check the listening port (on a worker node):

[root@k8s-node1 ~]# netstat -luntp|grep 4194
tcp6 0 0 :::4194 :::* LISTEN 1634868/cadvisor
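With cAdvisor metrics available, container-level resource queries look like the following (standard cAdvisor metric names; label sets differ slightly between standalone cAdvisor and the kubelet-embedded one):

```promql
# CPU usage per container (cores, averaged over 5 minutes)
sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))

# Working-set memory per container
sum by (name) (container_memory_working_set_bytes{name!=""})
```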

1.4. Deploying blackbox-exporter

1.4.1. Prepare the blackbox-exporter image

blackbox-exporter official Docker Hub page
blackbox-exporter official GitHub repository

Pull the image:

[root@harbor ~]# docker pull prom/blackbox-exporter:v0.19.0
v0.19.0: Pulling from prom/blackbox-exporter
aa2a8d90b84c: Pull complete
b45d31ee2d7f: Pull complete
1603b92f0389: Pull complete
a8140d619b2f: Pull complete
Digest: sha256:94de5897eef1b3c1ba7fbfebb9af366e032c0ff915a52c0066ff2e0c1bcd2e45
Status: Downloaded newer image for prom/blackbox-exporter:v0.19.0
docker.io/prom/blackbox-exporter:v0.19.0

Check the image:

[root@harbor ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prom/blackbox-exporter v0.19.0 c9e462ce1ee4 7 months ago 20.9MB

Tag it:

[root@harbor ~]# docker tag c9e462ce1ee4 heyuze/blackbox-exporter:v0.19.0

Push:

[root@harbor ~]# docker push heyuze/blackbox-exporter:v0.19.0
The push refers to repository [harbor.gong-hui.com/gonghui/blackbox-exporter]
256c4aa8ebe5: Pushed
4b6cc55de649: Pushed
986894c42222: Pushed
adab5d09ba79: Pushed
v0.15.1: digest: sha256:c20445e0cc628fa4b227fe2f694c22a314beb43fd8297095b6ee6cbc67161336 size: 1155

1.4.2. Prepare the resource manifests

ConfigMap

/data/k8s-yaml/blackbox-exporter/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 2s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: [200,301,302]
          method: GET
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 2s
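Target Pods opt in to probing via annotations; the names below correspond to the `blackbox_http_pod_probe` and `blackbox_tcp_pod_probe` relabel rules configured in prometheus.yml later (the values are hypothetical examples):

```yaml
# Hypothetical Pod-template annotations for blackbox probing.
# blackbox_scheme selects the probe job (http or tcp);
# blackbox_port / blackbox_path tell the relabel rules how to build the target.
metadata:
  annotations:
    blackbox_scheme: "http"    # probed via the http_2xx module
    blackbox_port: "8080"      # port to probe
    blackbox_path: "/healthz"  # path to probe (HTTP probes only)
```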

Deployment

/data/k8s-yaml/blackbox-exporter/deployment.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
  labels:
    app: blackbox-exporter
  # annotations:
  #   deployment.kubernetes.io/revision: 1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
      containers:
      - name: blackbox-exporter
        image: heyuze/blackbox-exporter:v0.19.0
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=info
        - --web.listen-address=:9115
        ports:
        - name: blackbox-port
          containerPort: 9115
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3

Service

/data/k8s-yaml/blackbox-exporter/service.yaml

kind: Service
apiVersion: v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox-port
    protocol: TCP
    port: 9115
    targetPort: 9115
  type: LoadBalancer

Ingress

/data/k8s-yaml/blackbox-exporter/ingress.yaml

The old extensions/v1beta1 Ingress API was removed in Kubernetes 1.22, so use the networking.k8s.io/v1 form (the backend port is referenced by name, since port.number must be an integer):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  ingressClassName: nginx
  rules:
  - host: blackbox.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blackbox-exporter
            port:
              name: blackbox-port

1.4.3. Apply the manifests

[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/configmap.yaml
configmap/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/deployment.yaml
deployment.apps/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/service.yaml
service/blackbox-exporter created
[root@k8s-master1 ~]# kubectl apply -f https://www.kubelet.cn/k8s-yaml/blackbox-exporter/ingress.yaml
ingress.networking.k8s.io/blackbox-exporter created

1.5. Deploying Prometheus

On the ops host.

1.5.1. Prepare the Prometheus image

Prometheus official Docker Hub page
Prometheus official GitHub repository

Pull the image:

[root@harbor ~]# docker pull prom/prometheus:v2.32.0
v2.32.0: Pulling from prom/prometheus
3cb635b06aa2: Pull complete
c4d1a94ab1db: Pull complete
41c8679f1eb7: Pull complete
1650e28e81f3: Pull complete
a4af63abea67: Pull complete
101065466520: Pull complete
e7d092467524: Pull complete
920f29a8238e: Pull complete
d22cebb42c02: Pull complete
102a95cf6327: Pull complete
c14687945637: Pull complete
d2136b8fa9a3: Pull complete
Digest: sha256:68aa603f9d797a8423e766b625cab4202bda7d9be8fc44d4e904dcea7f142177
Status: Downloaded newer image for prom/prometheus:v2.32.0
docker.io/prom/prometheus:v2.32.0

Check the image:

[root@harbor ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prom/prometheus v2.32.0 9e4125f21d5f 12 days ago 201MB

Tag it:

[root@harbor ~]# docker tag 9e4125f21d5f heyuze/prometheus:v2.32.0

Push:

[root@harbor ~]# docker push heyuze/prometheus:v2.32.0
The push refers to repository [docker.io/heyuze/prometheus]
2e73ed7d5d4d: Mounted from prom/prometheus
10e7fc7c8b3d: Mounted from prom/prometheus
051ea654c6e9: Mounted from prom/prometheus
913f2f736476: Mounted from prom/prometheus
f2083decbd50: Mounted from prom/prometheus
8bbbfc276b7c: Mounted from prom/prometheus
d90183f5bbf3: Mounted from prom/prometheus
5df4d348c75e: Mounted from prom/prometheus
3120b8f3e6b5: Mounted from prom/prometheus
9cbe643a4493: Mounted from prom/prometheus
29908fb03ed8: Mounted from prom/prometheus
64cac9eaf0da: Mounted from prom/prometheus
v2.32.0: digest: sha256:a8f33123429b8df0d01af19f639c4427a434e3143e0c4df84f688886960f53c4 size: 2823

1.5.2. Prepare the resource manifests

On the ops host:

/data/k8s-yaml

[root@harbor ~]# mkdir /data/k8s-yaml/prometheus && mkdir -p /data/nfs-volume/prometheus/etc && cd /data/k8s-yaml/prometheus
[root@harbor prometheus]#

RBAC

vim /data/k8s-yaml/prometheus/rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
  namespace: infra
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: infra

Deployment

vi /data/k8s-yaml/prometheus/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  labels:
    name: prometheus
  name: prometheus
  namespace: infra
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 7
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: heyuze/prometheus:v2.32.0
        imagePullPolicy: IfNotPresent
        command:
        - /bin/prometheus
        args:
        - --config.file=/data/etc/prometheus.yml
        - --storage.tsdb.path=/data/prom-db
        - --storage.tsdb.min-block-duration=10m
        # retention.time replaces the deprecated --storage.tsdb.retention flag
        - --storage.tsdb.retention.time=72h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: data
        resources:
          requests:
            cpu: "1000m"
            memory: "1.5Gi"
          limits:
            cpu: "2000m"
            memory: "3Gi"
      imagePullSecrets:
      - name: harbor
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus
      volumes:
      - name: data
        nfs:
          server: 192.168.101.198
          path: /data/nfs-volume/prometheus

Service

vim /data/k8s-yaml/prometheus/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: infra
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus

Ingress

vim /data/k8s-yaml/prometheus/ingress.yaml

Kubernetes 1.22 removed the extensions/v1beta1 Ingress API, so use networking.k8s.io/v1 (ingressClassName replaces the kubernetes.io/ingress.class annotation):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: infra
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.kubelet.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090

1.5.3. Prepare the Prometheus configuration

Copy the certificate files:

[root@k8s-master1 k8s]# pwd
/root/TLS/k8s
[root@k8s-master1 k8s]# scp ca.pem server.pem server-key.pem root@192.168.3.187:/data/nfs-volume/prometheus/etc
root@192.168.3.187's password:
ca.pem 100% 1359 1.2MB/s 00:00
server.pem 100% 1684 1.5MB/s 00:00
server-key.pem 100% 1679 1.6MB/s 00:00
[root@k8s-master1 k8s]#

Check the certificates:

[root@harbor etc]# ll
total 20
-rw-r--r-- 1 root root 1359 Mar 12 21:02 ca.pem
-rw-r--r-- 1 root root 5437 Mar 12 20:27 prometheus.yml
-rw------- 1 root root 1679 Mar 12 21:02 server-key.pem
-rw-r--r-- 1 root root 1684 Mar 12 21:02 server.pem

Worker node:

/data/nfs-volume/prometheus/etc/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
- job_name: 'etcd'
  tls_config:
    ca_file: /data/etc/ca.pem
    cert_file: /data/etc/server.pem
    key_file: /data/etc/server-key.pem
  scheme: https
  static_configs:
  - targets:
    - '192.168.3.183:2379'
    - '192.168.3.184:2379'
    - '192.168.3.185:2379'
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10255
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:4194
- job_name: 'kubernetes-kube-state'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
    regex: .*true.*
    action: keep
  - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
    regex: 'node-exporter;(.*)'
    action: replace
    target_label: nodename
- job_name: 'blackbox_http_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: http
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port, __meta_kubernetes_pod_annotation_blackbox_path]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+);(.+)
    replacement: $1:$2$3
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'blackbox_tcp_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: tcp
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'traefik'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: keep
    regex: traefik
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

1.5.4. Apply the manifests

kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/rbac.yaml

kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/deployment.yaml

kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/service.yaml

kubectl apply -f http://k8s-yaml.gong-hui.com/prometheus/ingress.yaml

1.5.5. Access

Point the domain at the VIP:

prometheus.gong-hui.com
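Once the UI is reachable, a quick way to confirm that every exporter deployed above is actually being scraped is to query the built-in `up` metric (illustrative queries; the job names match the prometheus.yml above):

```promql
# One series per scrape target; value 1 = up, 0 = down
up

# Targets of a specific job, e.g. the cAdvisor DaemonSet
up{job="kubernetes-cadvisor"}

# Blackbox probe results (1 = probe succeeded)
probe_success
```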

------------- End of post. Thank you for reading. -------------