Taints

A taint is a marker applied to a node that prevents Pods which do not meet its conditions from being scheduled onto that node.

Taint Structure

Each taint consists of three parts:

  • Key: the name that identifies the taint
  • Value: the taint's value
  • Effect: the taint effect, which determines scheduling behavior

Effect             Meaning
NoSchedule         New Pods are not scheduled (existing Pods are unaffected)
PreferNoSchedule   The scheduler tries to avoid the node (not enforced)
NoExecute          New Pods are not scheduled, and running Pods without a matching toleration are evicted (tolerationSeconds allows delayed eviction)

Command Examples

# Add a taint
kubectl taint nodes node1 gpu=true:NoSchedule

# Remove a taint
kubectl taint nodes node1 gpu=true:NoSchedule-

# View taints
root@k8s-master:~# kubectl describe nodes k8s-master | grep Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule

# Note: node-role.kubernetes.io/control-plane:NoSchedule
# has only a key and an effect; the value is empty

Use Cases

  • Isolate nodes with dedicated hardware (e.g. GPUs, high-performance storage)
  • Evict business Pods from a node before maintenance (see the sketch after this list)
  • Protect nodes holding sensitive data (only Pods that tolerate the taint can schedule there)
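
For example, draining business Pods off a node for maintenance can be done with a NoExecute taint; a minimal sketch, where the node name and the maintenance key are illustrative, not from the source:

# Evicts every Pod on node1 that does not tolerate this taint
kubectl taint nodes node1 maintenance=true:NoExecute

# Remove the taint once maintenance is finished
kubectl taint nodes node1 maintenance=true:NoExecute-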

Tolerations

A toleration is an attribute defined on a Pod that lets the Pod ignore matching node taints and thus be scheduled onto those nodes.

Toleration Configuration

# pod.spec.
tolerations:
- key: "gpu"                # Key of the taint to match
  operator: "Equal"         # Operator: Equal (exact match) or Exists (key presence is enough)
  value: "true"             # Value of the taint to match (required when operator=Equal)
  effect: "NoSchedule"      # Effect of the taint to match
  tolerationSeconds: 3600   # How long (in seconds) to tolerate a NoExecute taint before eviction

Key Rules

  • A Pod can define multiple tolerations. To be scheduled onto a node, the Pod must tolerate every NoSchedule taint on that node; any untolerated NoSchedule taint blocks scheduling.

  • With operator: Exists, no value is specified (only the presence of the key is checked).
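
As a sketch, a toleration using Exists that matches any gpu taint regardless of its value (the key and effect here are illustrative):

tolerations:
- key: "gpu"
  operator: "Exists"     # matches gpu=<anything>:NoSchedule
  effect: "NoSchedule"   # omit effect as well to match gpu taints of any effect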

Typical Scenarios

  • Allow AI training jobs to be scheduled onto GPU nodes
  • Let system components (e.g. kube-proxy) tolerate the control-plane node taint
  • Temporarily tolerate NoExecute taints during maintenance windows

Affinity

Affinity comes in two kinds: it either steers a Pod toward nodes that satisfy certain rules, or places it relative to other Pods.

Node Affinity

Controls how Pods are matched to nodes.

Node affinity relies on node labels:

# Create a label (to update an existing label, add --overwrite)
root@k8s-master:~# kubectl label nodes k8s-node-1 k8s.io/role=node
node/k8s-node-1 labeled

# View labels
root@k8s-master:~# kubectl get nodes k8s-node-1 --show-labels
NAME         STATUS   ROLES    AGE   VERSION    LABELS
k8s-node-1   Ready    <none>   18d   v1.29.15   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux

# Delete a label
root@k8s-master:~# kubectl label nodes k8s-node-1 k8s.io/role-
node/k8s-node-1 unlabeled
  • Hard affinity (required): conditions that must be satisfied
# pod.spec.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values: [ssd]
  • Soft affinity (preferred): preferred but not mandatory
# pod.spec.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100        # priority weight (1-100)
      preference:
        matchExpressions:
        - key: zone
          operator: In
          values: [us-east]

Pod Affinity and Anti-Affinity

  • Affinity (podAffinity): schedule Pods into the same topology domain (e.g. the same node or availability zone)
  • Anti-affinity (podAntiAffinity): keep Pods out of the same topology domain (improves availability)
# pod.spec.affinity.podAffinity
# pod.spec.affinity.podAntiAffinity

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: database
    topologyKey: kubernetes.io/hostname   # isolate per node
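
The podAffinity form is symmetrical; as a sketch, this co-locates a Pod in the same zone as Pods labeled app: web (the label and topology key are illustrative):

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: web
    topologyKey: topology.kubernetes.io/zone   # co-locate within the same zone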

Example

apiVersion: v1
kind: Pod
metadata:
  name: toleration-test
spec:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Equal
    value: ''
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8s-master
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command:   # equivalent to ["sh", "-c", "sleep 1000"]
    - sh
    - -c
    - sleep 1000



root@k8s-master:~# kubectl get pods -o wide
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
toleration-test   1/1     Running   0          7s    10.244.0.47   k8s-master   <none>           <none>

HPA

By default, Kubernetes horizontal autoscaling (HPA) only supports scaling on CPU and memory.

To scale on other metrics through the metrics APIs, you need a metrics collection system such as Prometheus. However, the metrics Prometheus collects are not directly compatible with the Kubernetes API, so a middleware is required: Prometheus Adapter.

Kubernetes API server metrics API <-> Prometheus Adapter <-> Prometheus metrics API

  • HPA v1: supports CPU and memory only
  • HPA v2: supports custom metrics for autoscaling

Installing Metrics-server

K8S 1.29.2 metrics
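
Once metrics-server is up, a quick sanity check (standard commands, assuming a working install):

# The resource metrics API should respond
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl top nodes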

Installing Prometheus Adapter

wsq1203/prom-k8s (github.com)

git clone https://github.com/wsq1203/prom-k8s.git
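
After deploying the adapter, verify that the custom metrics API is being served (a standard check, assuming a working install):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | head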

HPA Examples

Based on CPU and memory

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metrics-hpa          # HPA name
  namespace: default               # namespace (change as needed)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment               # target kind (Deployment/StatefulSet supported)
    name: your-app                 # target Deployment name
  minReplicas: 2                   # minimum replicas (>= 2 recommended to avoid a single point of failure)
  maxReplicas: 10                  # maximum replicas (caps resource consumption)
  metrics:                         # multiple trigger conditions
  - type: Resource                 # resource metric
    resource:
      name: cpu                    # CPU metric
      target:
        type: Utilization          # utilization mode
        averageUtilization: 70     # scale out when CPU utilization exceeds 70%
  - type: Resource                 # memory metric
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80     # scale out when memory utilization exceeds 80%
  behavior:                        # scaling behavior control (avoids thrashing)
    scaleDown:                     # scale-down policy
      stabilizationWindowSeconds: 300   # scale-down stabilization window (default 5 minutes)
      policies:
      - type: Percent
        value: 10                  # remove at most 10% of replicas per period
        periodSeconds: 60          # policy evaluation period (required by the API)
    scaleUp:                       # scale-up policy
      stabilizationWindowSeconds: 60    # scale-up stabilization window (default 0 seconds)
      policies:
      - type: Percent
        value: 100                 # add at most 100% more replicas per period (fast response)
        periodSeconds: 60          # policy evaluation period (required by the API)
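
To try it out (assuming the manifest is saved as multi-metrics-hpa.yaml and a Deployment named your-app exists):

kubectl apply -f multi-metrics-hpa.yaml
kubectl get hpa multi-metrics-hpa --watch    # watch TARGETS and REPLICAS change under load
kubectl describe hpa multi-metrics-hpa       # shows current metric values and scaling events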

Based on requests per second (metric exposed via the adapter's ConfigMap rules)

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: metrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 5
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
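
For http_requests_per_second to exist as a Pod metric, the Prometheus Adapter needs a rule that turns a Prometheus counter into a per-second rate. A minimal sketch of such a rule (the series name http_requests_total and the 2m rate window are assumptions; the real rule lives in the adapter's ConfigMap, e.g. in the repository cloned above):

rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # counter assumed to be exported by the app
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"        # exposes http_requests_per_second
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'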