Taints

A taint is a marker applied to a node; it prevents Pods that do not satisfy it (i.e. do not tolerate it) from being scheduled onto that node.
Taint structure

Each taint consists of three parts:

- Key: the name that identifies the taint
- Value: the taint's value
- Effect: the taint's effect, which decides the scheduling behavior
| Effect | Meaning |
| --- | --- |
| NoSchedule | New Pods are not scheduled onto the node (existing Pods are unaffected) |
| PreferNoSchedule | The scheduler tries to avoid the node, but it is not enforced |
| NoExecute | New Pods are not scheduled, and existing Pods that do not tolerate the taint are evicted (delayed eviction is supported via tolerationSeconds) |
Example commands

```bash
# Add a taint (no space between the value and the effect)
kubectl taint nodes node1 gpu=true:NoSchedule
# Remove the taint by appending "-"
kubectl taint nodes node1 gpu=true:NoSchedule-
```

The default control-plane taint carries only a key and an effect, with no value:

```
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
```
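To check which taints a node currently carries (the node name k8s-master is taken from the output above), either of the following works:

```bash
# The Taints: field lists key[=value]:effect entries
kubectl describe node k8s-master | grep Taints
# Or read the taints straight from the node spec
kubectl get node k8s-master -o jsonpath='{.spec.taints}'
```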
Use cases

- Isolating nodes with dedicated hardware (e.g. GPUs, high-performance storage)
- Evicting workload Pods during node maintenance (see the sketch after this list)
- Protecting nodes that hold sensitive data (only specific Pods are allowed to schedule there)
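A minimal sketch of the maintenance scenario; the node name node2 and the maintenance key are placeholders. With effect NoExecute, Pods that do not tolerate the taint are evicted from the node:

```bash
# Evict non-tolerating Pods by tainting the node for maintenance
kubectl taint nodes node2 maintenance=true:NoExecute
# When maintenance is done, remove the taint again
kubectl taint nodes node2 maintenance=true:NoExecute-
```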
Tolerations

A toleration is an attribute defined on a Pod that lets the Pod ignore a node's taints and therefore be scheduled onto that node.
Toleration configuration

```yaml
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
  # tolerationSeconds is only valid together with effect: NoExecute,
  # where it bounds how long the Pod may stay after the taint appears
  # tolerationSeconds: 3600
```
Key matching logic

A toleration matches a taint when the keys are equal and the effects are equal; with operator: Equal the values must also be equal, while operator: Exists matches any value of the key (see the sketch below).
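For example, an Exists toleration ignores the taint's value entirely (the gpu key is carried over from the example above):

```yaml
# Matches any taint with key "gpu" and effect NoSchedule, whatever its value
tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"
```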
Typical scenarios

- Allowing AI training jobs to be scheduled onto GPU nodes
- System components (e.g. kube-proxy) tolerating the control-plane (master) taint
- Temporarily tolerating a NoExecute taint during a maintenance window (sketch below)
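A sketch of that last scenario, reusing the hypothetical maintenance taint from the earlier example: with effect NoExecute, tolerationSeconds lets the Pod stay on the tainted node for a bounded time before it is evicted:

```yaml
tolerations:
- key: "maintenance"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
  # The Pod may keep running on the tainted node for up to one hour
  tolerationSeconds: 3600
```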
Affinity

Affinity comes in two kinds; it is used to steer Pods onto nodes that match certain rules, or to place them relative to other Pods.
Node Affinity

Controls how Pods are matched to nodes; it relies on the node's labels.
```bash
# Add a label to the node, list its labels, then remove the label again
root@k8s-master:~# kubectl label node k8s-node-1 <key>=<value>
node/k8s-node-1 labeled
root@k8s-master:~# kubectl get node k8s-node-1 --show-labels
NAME         STATUS   ROLES    AGE   VERSION    LABELS
k8s-node-1   Ready    <none>   18d   v1.29.15   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux
root@k8s-master:~# kubectl label node k8s-node-1 <key>-
node/k8s-node-1 unlabeled
```
Hard requirement (requiredDuringSchedulingIgnoredDuringExecution):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values: ["ssd"]
```
Soft preference (preferredDuringSchedulingIgnoredDuringExecution):

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: zone
          operator: In
          values: ["us-east"]
```
Pod affinity and anti-affinity

- Pod affinity (podAffinity): schedule Pods into the same topology domain (e.g. the same node or availability zone)
- Pod anti-affinity (podAntiAffinity): keep Pods out of the same topology domain (improves high availability)
```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: database
    # Never place two "app: database" Pods on the same node
    topologyKey: kubernetes.io/hostname
```
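The podAffinity form is symmetrical. A hedged sketch that co-locates a Pod with a cache tier (the app: redis label and the zone topology key are illustrative assumptions):

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: redis
    # Schedule into the same availability zone as a Pod labeled app=redis
    topologyKey: topology.kubernetes.io/zone
```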
Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-test
spec:
  tolerations:
  # Tolerate the control-plane taint (key only, empty value)
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Equal
    value: ''
  affinity:
    nodeAffinity:
      # Pin the Pod to the k8s-master node
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8s-master
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command:
    - sh
    - -c
    - sleep 1000
```

```bash
root@k8s-master:~# kubectl get pods -o wide
toleration-test   1/1     Running   0          7s    10.244.0.47   k8s-master   <none>   <none>
```
HPA
By default, Kubernetes horizontal pod autoscaling only supports scaling on CPU and memory.

To scale on other metrics through the metrics APIs, a metrics collection system such as Prometheus must be installed. However, the metrics Prometheus collects are not directly consumable through the Kubernetes API, so a middleware component is required in between: the Prometheus Adapter.

Kubernetes API server (metrics APIs) <-> Prometheus Adapter <-> Prometheus (collected metrics)
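Once the adapter is registered with the API server, the aggregated metrics APIs can be queried directly; a quick sanity check (piping through jq is optional) looks like this:

```bash
# Resource metrics served by metrics-server
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
# Custom metrics served by the Prometheus Adapter
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
```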
HPA v1 (autoscaling/v1): supports scaling on CPU utilization only

HPA v2 (autoscaling/v2): additionally supports memory and custom metrics for automatic scaling
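For the simple CPU-only case an HPA can also be created imperatively; the Deployment name your-app below is a placeholder:

```bash
# Keep average CPU utilization around 70%, scaling between 2 and 10 replicas
kubectl autoscale deployment your-app --cpu-percent=70 --min=2 --max=10
kubectl get hpa
```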
Install metrics-server (K8s 1.29.2)

metrics-server provides the resource metrics API (CPU and memory) that the HPA consumes.
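A common way to install it is from the upstream release manifest (URL assumed to be the current official artifact); in lab clusters with self-signed kubelet certificates, the --kubelet-insecure-tls flag often has to be added to the metrics-server container args:

```bash
# Apply the official metrics-server manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify that resource metrics are flowing
kubectl top nodes
kubectl top pods -A
```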
Install the Prometheus Adapter: wsq1203/prom-k8s (github.com)

```bash
git clone https://github.com/wsq1203/prom-k8s.git
```
HPA examples

Based on CPU and memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metrics-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60   # periodSeconds is required for each scaling policy
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
```
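After the manifest is applied, the current and target values can be watched; the file name below is illustrative:

```bash
kubectl apply -f multi-metrics-hpa.yaml
kubectl get hpa multi-metrics-hpa -w
kubectl describe hpa multi-metrics-hpa
```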
Based on requests per second (the metric is exposed via a ConfigMap):
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: metrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 5
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
```
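For http_requests_per_second to appear in the custom metrics API, the Prometheus Adapter needs a rule that maps a Prometheus series to it. A rough sketch of such a rule (the http_requests_total counter name is an assumption about how the application exposes its metrics):

```yaml
# Excerpt from the Prometheus Adapter rules configuration (typically held in a ConfigMap)
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # Convert the raw counter into a per-second rate over a 2-minute window
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```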