
1 What is overprovisioned scaling





2 When to use overprovisioned scaling



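Best case: about 4 minutes

• 30 seconds: the target metric value is updated (every 30-60 seconds)

• 30 seconds: HPA checks the metric value (every 30 seconds)

• < 2 seconds: the pods are created and enter the Pending state

• < 2 seconds: CA sees the Pending pods and calls the cloud provider to create a node (about 1 second)

• 3 minutes: the cloud provider creates the worker node, which joins the Kubernetes cluster and then waits to become Ready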

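Worst case: 12 minutes

• 60 seconds: the target metric value is updated

• 30 seconds: HPA checks the metric value

• < 2 seconds: the pods are created and enter the Pending state

• < 2 seconds: CA sees the Pending pods and calls the cloud provider to create a node (about 1 second)

• 10 minutes: the cloud provider creates the worker node, which joins the Kubernetes cluster and then waits to become Ready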
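
The stage durations listed above can be added up to check the headline figures. The snippet below is a minimal sketch: the stage names are informal labels, and counting each "< 2 seconds" stage as a full 2 seconds is an assumption made here to get an upper bound.

```python
# Stage durations in seconds, taken from the two lists above.
# Assumption: each "< 2 seconds" stage is counted as 2 seconds (upper bound).
BEST_CASE = {
    "metric update": 30,
    "HPA check": 30,
    "pods enter Pending": 2,
    "CA requests a node": 2,
    "node created and Ready": 3 * 60,
}
WORST_CASE = {
    "metric update": 60,
    "HPA check": 30,
    "pods enter Pending": 2,
    "CA requests a node": 2,
    "node created and Ready": 10 * 60,
}


def total_seconds(stages: dict) -> int:
    """Sum the duration of every stage in the scale-up path."""
    return sum(stages.values())


for label, stages in (("best case", BEST_CASE), ("worst case", WORST_CASE)):
    total = total_seconds(stages)
    print(f"{label}: {total} s (~{total / 60:.1f} min)")
```

Under these assumptions the totals come to roughly 4 minutes in the best case and just under 12 minutes in the worst case, and node provisioning dominates both.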


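  1. Preparing for major sales promotions

  2. Stream computing / real-time computing

  3. DevOps systems

  4. Other business scenarios with frequent scheduling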

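3 How to enable overprovisioned scaling


3.1 Enable ClusterAutoscaler


• Go to "Kubernetes Container Service" -> "Worker Node Groups"

• Select the target node group and click Enable Auto Scaling

• Set the node count range and click OK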

3.2 Deploy OverprovisionAutoscaler

1 Deploy the controller and its configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-autoscaler
  namespace: default
  labels:
    app: overprovisioning-autoscaler
    owner: cluster-autoscaler-overprovisioning
spec:
  selector:
    matchLabels:
      app: overprovisioning-autoscaler
      owner: cluster-autoscaler-overprovisioning
  replicas: 1
  template:
    metadata:
      labels:
        app: overprovisioning-autoscaler
        owner: cluster-autoscaler-overprovisioning
    spec:
      serviceAccountName: cluster-proportional-autoscaler
      containers:
        - image: jdcloud-cn-north-1.jcr.service.jdcloud.com/k8s/cluster-proportional-autoscaler:v1.16.3
          name: proportional-autoscaler
          command:
            - /autoscaler
            - --namespace=default
            ## Set this to the name of the ConfigMap you deploy (see section 4.2):
            ## overprovisioning-autoscaler-ladder or overprovisioning-autoscaler-linear
            - --configmap=overprovisioning-autoscaler-{provision-mode}
            ## Warm-up target, given as type/name. The baseline application and the
            ## placeholder application must be in the same namespace.
            - --target=deployment/overprovisioning
            - --logtostderr=true
            - --v=2
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: host-time
              mountPath: /etc/localtime
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: cluster-proportional-autoscaler
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["replicationcontrollers/scale"]
    verbs: ["get", "update"]
  - apiGroups: ["extensions", "apps"]
    resources: ["deployments/scale", "replicasets/scale", "deployments", "replicasets"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-proportional-autoscaler
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-proportional-autoscaler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."

2 Deploy the placeholder application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: default
  labels:
    app: overprovisioning
    owner: cluster-autoscaler-overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      app: overprovisioning
      owner: cluster-autoscaler-overprovisioning
  template:
    metadata:
      annotations:
        autoscaler.jke.jdcloud.com/overprovisioning: "reserve-pod"
      labels:
        app: overprovisioning
        owner: cluster-autoscaler-overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: reserve-resources
          image: jdcloud-cn-east-2.jcr.service.jdcloud.com/k8s/pause-amd64:3.1
          resources:
            requests:
              ## Set according to the number of shards you expect to pre-warm
              ## and the resources each shard needs
              cpu: 7
          imagePullPolicy: IfNotPresent

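3.3 Verify that overprovisioned scaling works

1 Verify the Autoscaler

• Check that the autoscaler controller is Running

• Keep creating test applications whose resource requests are slightly smaller than the schedulable resources of a single node in the node group

• Observe the cluster node status: when insufficient resources leave pods in the Pending state, check whether the autoscaler scales up according to its settings (scale-up wait, scale-up cooldown, maximum node count, and so on)

• Enable cluster auto scale-down, delete the test applications, and observe whether scale-down occurs once the nodes' requested resources reach the threshold.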

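2 Verify OverprovisionAutoscaler

• Check that the OverprovisionAutoscaler controller is Running

• Keep creating test applications; after the cluster autoscales, check whether the number of placeholder application replicas changes according to the configuration

• When business pods become Pending, check whether placeholder pods are evicted and the business pods are scheduled in their place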
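
The eviction checked in the last step follows from pod priority: the placeholder pods use a PriorityClass with value -1, while pods without a priority class get priority 0 (assuming no other global default is defined in the cluster). The toy model below only illustrates that idea and is not the kube-scheduler algorithm; `Pod` and `place_pod` are hypothetical names, and a single node with CPU as the only resource is assumed.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu: float      # CPU request in cores
    priority: int   # -1 for placeholder pods, 0 for ordinary pods


def place_pod(node_cpu, running, incoming):
    """Try to place `incoming` on one node; return (running_after, evicted)."""
    free = node_cpu - sum(p.cpu for p in running)
    if incoming.cpu <= free:
        return running + [incoming], []

    # Only pods with strictly lower priority may be evicted, lowest first.
    candidates = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    victims = []
    for pod in candidates:
        victims.append(pod)
        free += pod.cpu
        if incoming.cpu <= free:
            kept = [p for p in running if p not in victims]
            return kept + [incoming], victims

    # Not enough room even after evictions: the pod stays Pending.
    return running, []


placeholder = Pod("overprovisioning-0", cpu=7, priority=-1)
business = Pod("business-app-0", cpu=4, priority=0)
running, evicted = place_pod(node_cpu=8, running=[placeholder], incoming=business)
print([p.name for p in running], [p.name for p in evicted])
```

In this sketch the business pod takes over the capacity held by the placeholder immediately. In a real cluster the evicted placeholder then becomes Pending itself, which is what prompts CA to add a node in the background.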

4 Configure OverprovisionAutoscaler and ClusterAutoscaler parameters

4.1 Configure ClusterAutoscaler

1 CA parameter description

Parameter                             Default      Description
scan_interval                         20s          How often the cluster is reevaluated for scale up or down
max_nodes_total                       0            Maximum number of nodes in all node groups
estimator                             binpacking   Type of resource estimator to be used in scale up
expander                              least-waste  Type of node group expander to be used in scale up
max_empty_bulk_delete                 15           Maximum number of empty nodes that can be deleted at the same time
max_graceful_termination_sec          600          Maximum number of seconds CA waits for pod termination when trying to scale down a node
max_total_unready_percentage          45           Maximum percentage of unready nodes in the cluster. After this is exceeded, CA halts operations
ok_total_unready_count                100          Number of allowed unready nodes, irrespective of max-total-unready-percentage
max_node_provision_time               900s         Maximum time CA waits for a node to be provisioned
scale_down_enabled                    true         Should CA scale down the cluster
scale_down_delay_after_add            600s         How long after scale up that scale down evaluation resumes
scale_down_delay_after_delete         10s          How long after node deletion that scale down evaluation resumes, defaults to scanInterval
scale_down_delay_after_failure        180s         How long after scale down failure that scale down evaluation resumes
scale_down_unneeded_time              600s         How long a node should be unneeded before it is eligible for scale down
scale_down_unready_time               1200s        How long an unready node should be unneeded before it is eligible for scale down
scale_down_utilization_threshold      0.5          Node utilization level, defined as sum of requested resources divided by capacity, below which a node can be considered for scale down
balance_similar_node_groups           false        Detect similar node groups and balance the number of nodes between them
node_autoprovisioning_enabled         false        Should CA autoprovision node groups when needed
max_autoprovisioned_node_group_count  15           The maximum number of autoprovisioned groups in the cluster
skip_nodes_with_system_pods           true         If true, cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)
skip_nodes_with_local_storage         true         If true, cluster autoscaler will never delete nodes with pods with local storage, e.g. EmptyDir or HostPath

2 Recommended configuration

# Keep all other parameters at their defaults
scan_interval=10s
max_node_provision_time=180s
scale_down_delay_after_add=180s
scale_down_delay_after_delete=180s
scale_down_unneeded_time=300s
scale_down_utilization_threshold=0.4
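
To see what changing scale_down_utilization_threshold from 0.5 to 0.4 means in practice, the check described in the parameter table (requested resources divided by capacity, compared against the threshold) can be sketched as follows. This is a simplified illustration that looks at CPU only; the function names and the example node are hypothetical, and the real CA applies further conditions such as scale_down_unneeded_time.

```python
def cpu_utilization(requested_cpu: float, capacity_cpu: float) -> float:
    """Utilization as described in the table: requested resources / capacity."""
    return requested_cpu / capacity_cpu


def below_scale_down_threshold(requested_cpu: float, capacity_cpu: float,
                               threshold: float) -> bool:
    """True if the node's utilization is low enough to be considered for scale down."""
    return cpu_utilization(requested_cpu, capacity_cpu) < threshold


# Hypothetical node: 8 CPU cores of capacity, pods requesting 3.6 cores in total.
for threshold in (0.5, 0.4):  # default and recommended values from this section
    print(threshold, below_scale_down_threshold(3.6, 8, threshold))
```

With the lower threshold, a node that is 45% requested is no longer a scale-down candidate, so fewer nodes are removed while they still carry a moderate load.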

4.2 Configure OverprovisionAutoscaler


1 Linear configuration (linear)





kind: ConfigMap
apiVersion: v1
metadata:
  name: overprovisioning-autoscaler-linear
  namespace: default
data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 100,
      "includeUnschedulableNodes": false,
      "preventSinglePointFailure": true
    }
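
To get a feel for how many placeholder replicas the linear parameters above produce, the calculation can be sketched in a few lines. This follows the linear mode as documented for the upstream cluster-proportional-autoscaler project; whether the JD Cloud image behaves identically is an assumption, and `linear_replicas` is a hypothetical helper name.

```python
import math


def linear_replicas(cores: int, nodes: int,
                    cores_per_replica: float = 2,
                    nodes_per_replica: float = 1,
                    min_replicas: int = 1,
                    max_replicas: int = 100,
                    prevent_single_point_failure: bool = True) -> int:
    """Replica count for linear mode, given schedulable cores and nodes.

    With includeUnschedulableNodes=false, `cores` and `nodes` are assumed
    to count schedulable nodes only.
    """
    from_cores = math.ceil(cores / cores_per_replica)
    from_nodes = math.ceil(nodes / nodes_per_replica)
    # Keep at least 2 replicas once the cluster has more than one node.
    if prevent_single_point_failure and nodes > 1:
        from_nodes = max(from_nodes, 2)
    replicas = max(from_cores, from_nodes)
    # Clamp the result to the configured [min, max] range.
    return max(min_replicas, min(replicas, max_replicas))


print(linear_replicas(cores=16, nodes=4))     # small cluster
print(linear_replicas(cores=800, nodes=200))  # large cluster, capped by "max"
```

Because coresPerReplica is 2 here, the core count usually drives the result on multi-core nodes, and "max" acts as the safety cap on large clusters.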

2 Ladder configuration (ladder)


kind: ConfigMap
apiVersion: v1
metadata:
  name: overprovisioning-autoscaler-ladder
  namespace: default
data:
  ladder: |-
    {
      "coresToReplicas":
      [
        [ 1,1 ],
        [ 50,3 ],
        [ 200,5 ],
        [ 500,7 ]
      ],
      "nodesToReplicas":
      [
        [ 1,1 ],
        [ 3,4 ],
        [ 10,5 ],
        [ 50,20 ],
        [ 100,120 ],
        [ 150,120 ]
      ]
    }
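
The ladder tables above are step functions: each `[threshold, replicas]` pair applies from its threshold up to the next one. The sketch below shows the lookup as documented for the upstream cluster-proportional-autoscaler project (the larger of the two table results wins); that the JD Cloud image behaves the same way is an assumption, and the helper names are hypothetical.

```python
from bisect import bisect_right

CORES_TO_REPLICAS = [(1, 1), (50, 3), (200, 5), (500, 7)]
NODES_TO_REPLICAS = [(1, 1), (3, 4), (10, 5), (50, 20), (100, 120), (150, 120)]


def lookup(value: int, table: list) -> int:
    """Return the replicas of the highest step whose threshold is <= value."""
    thresholds = [threshold for threshold, _ in table]
    index = bisect_right(thresholds, value) - 1
    # Values below the first threshold fall back to the first step.
    return table[max(index, 0)][1]


def ladder_replicas(cores: int, nodes: int) -> int:
    """Replica count for ladder mode: the larger of the two table lookups."""
    return max(lookup(cores, CORES_TO_REPLICAS), lookup(nodes, NODES_TO_REPLICAS))


print(ladder_replicas(cores=48, nodes=12))
print(ladder_replicas(cores=400, nodes=100))
```

With these tables a 12-node, 48-core cluster keeps 5 placeholder replicas (set by the node table), while a 100-node cluster jumps to 120, so the step boundaries deserve the same care as the replica counts.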