Considerations for large clusters (2024)

A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by the control plane. Kubernetes v1.31 supports clusters with up to 5,000 nodes. More specifically, Kubernetes is designed to accommodate configurations that meet all of the following criteria:

No more than 110 pods per node
No more than 5,000 nodes
No more than 150,000 total pods
No more than 300,000 total containers
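
The four criteria are independent: a cluster can stay under the node limit and the per-node pod limit and still exceed the total pod limit (5,000 nodes at 110 pods each would be 550,000 pods). As a minimal sketch, the check below tests a planned cluster against all four thresholds. The `ClusterPlan` type and `exceeded_limits` function are hypothetical names used for illustration; they are not part of any Kubernetes API.

```python
from dataclasses import dataclass

# Thresholds copied from the criteria listed above.
MAX_PODS_PER_NODE = 110
MAX_NODES = 5_000
MAX_TOTAL_PODS = 150_000
MAX_TOTAL_CONTAINERS = 300_000


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (not a Kubernetes type)."""
    nodes: int
    max_pods_per_node: int
    total_pods: int
    total_containers: int


def exceeded_limits(plan: ClusterPlan) -> list[str]:
    """Return the names of the criteria that the plan exceeds."""
    checks = [
        ("pods per node", plan.max_pods_per_node, MAX_PODS_PER_NODE),
        ("nodes", plan.nodes, MAX_NODES),
        ("total pods", plan.total_pods, MAX_TOTAL_PODS),
        ("total containers", plan.total_containers, MAX_TOTAL_CONTAINERS),
    ]
    return [name for name, value, limit in checks if value > limit]


if __name__ == "__main__":
    # 4,000 nodes at 50 pods each is within the node and per-node limits,
    # but 200,000 pods in total is over the 150,000 threshold.
    plan = ClusterPlan(nodes=4_000, max_pods_per_node=50,
                       total_pods=200_000, total_containers=250_000)
    print(exceeded_limits(plan))
```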

You can scale your cluster by adding or removing nodes. The way you do this depends on how your cluster is deployed.

Cloud provider resource quotas

To avoid running into cloud provider quota issues when creating a cluster with many nodes, consider:

Requesting a quota increase for cloud resources such as:
- Compute instances
- CPUs
- Storage volumes
- In-use IP addresses
- Packet filtering rule sets
- Number of load balancers
- Network subnets
- Log streams
Gating the cluster scaling actions to bring up new nodes in batches, with a pause between batches, because some cloud providers rate limit the creation of new instances.
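
Batched scale-up with a pause between batches could look roughly like the sketch below. The `add_nodes` callback is a hypothetical hook standing in for whatever your provider or deployment tooling uses to create instances; the batch size and pause length are assumptions you would tune to your provider's rate limits.

```python
import time
from typing import Callable, Iterator


def batch_sizes(total_nodes: int, batch_size: int) -> Iterator[int]:
    """Yield the number of nodes to add in each batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = total_nodes
    while remaining > 0:
        step = min(batch_size, remaining)
        yield step
        remaining -= step


def scale_up_in_batches(
    total_nodes: int,
    batch_size: int,
    pause_seconds: float,
    add_nodes: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Add nodes batch by batch, pausing between batches but not after the last."""
    sizes = list(batch_sizes(total_nodes, batch_size))
    for index, size in enumerate(sizes):
        add_nodes(size)  # hypothetical hook: call your provisioning tooling here
        if index < len(sizes) - 1:
            sleep(pause_seconds)


if __name__ == "__main__":
    # Dry run: record the batches instead of creating instances, and skip the real sleep.
    requested: list[int] = []
    scale_up_in_batches(250, 100, 30.0, requested.append, sleep=lambda _: None)
    print(requested)
```

Passing `sleep` as a parameter keeps the pacing logic testable without waiting for real time to elapse.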

Control plane components

For a large cluster, you need a control plane with sufficient compute and other resources.

Typically you would run one or two control plane instances per failure zone, scaling those instances vertically first and then scaling horizontally after reaching the point of diminishing returns from vertical scaling.

You should run at least one instance per failure zone to provide fault-tolerance. Kubernetes nodes do not automatically steer traffic towards control-plane endpoints that are in the same failure zone; however, your cloud provider might have its own mechanisms to do this.

etcd storage

To improve performance of large clusters, you can store Event objects in a separate dedicated etcd instance.

When creating a cluster, you can (using custom tooling):

start and configure an additional etcd instance
configure the API server to use it for storing events

See Operating etcd clusters for Kubernetes and Set up a High Availability etcd cluster with kubeadm for details on configuring and managing etcd for a large cluster.
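
One way to point the API server at a separate etcd for events is the kube-apiserver `--etcd-servers-overrides` flag, whose value has the form `group/resource#servers` with servers separated by semicolons. The sketch below only builds that flag string; the helper name and the example URLs are hypothetical, and you should check the kube-apiserver reference for your Kubernetes version before relying on it.

```python
def events_etcd_override(event_etcd_urls: list[str]) -> str:
    """Build a kube-apiserver flag that routes Event objects to a separate etcd.

    Events live in the core (empty-named) API group, so the resource part
    of the override is written as "/events".
    """
    if not event_etcd_urls:
        raise ValueError("at least one etcd URL is required")
    return "--etcd-servers-overrides=/events#" + ";".join(event_etcd_urls)


if __name__ == "__main__":
    # Hypothetical endpoints for the dedicated events etcd instance.
    urls = [
        "https://events-etcd-0.example.internal:2379",
        "https://events-etcd-1.example.internal:2379",
    ]
    print(events_etcd_override(urls))
```

The main `--etcd-servers` flag still names the etcd that stores every other resource; the override applies only to the resource listed in it.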

Addon resources

Kubernetes resource limits help to minimize the impact of memory leaks and other ways that pods and containers can affect other components. These resource limits apply to addon resources just as they apply to application workloads.

For example, you can set CPU and memory limits for a logging component:

      ...
      containers:
      - name: fluentd-cloud-logging
        image: fluent/fluentd-kubernetes-daemonset:v1
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
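
A fixed limit like this may need revisiting as the cluster grows. As a rough sketch only, the helper below estimates a memory limit with a simple linear model (a fixed base plus a per-node increment) and renders it in the same `resources.limits` shape as the example above. The function names, the linear model, and the default numbers are all assumptions for illustration, not recommended values; measure your addon's actual usage before choosing limits.

```python
import math


def estimate_memory_limit_mib(
    nodes: int,
    base_mib: int = 200,              # assumed baseline, matching the example above
    extra_mib_per_node: float = 0.5,  # made-up per-node increment
) -> int:
    """Linear estimate: base memory plus a per-node increment, rounded up."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return base_mib + math.ceil(extra_mib_per_node * nodes)


def limits_fragment(cpu_millicores: int, memory_mib: int) -> dict:
    """Build a container `resources` fragment using Kubernetes quantity strings."""
    return {
        "resources": {
            "limits": {
                "cpu": f"{cpu_millicores}m",
                "memory": f"{memory_mib}Mi",
            }
        }
    }


if __name__ == "__main__":
    memory = estimate_memory_limit_mib(nodes=1_000)
    print(limits_fragment(cpu_millicores=100, memory_mib=memory))
```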

What's next

VerticalPodAutoscaler is a custom resource that you can deploy into your cluster to help you manage resource requests and limits for pods.
Learn more about Vertical Pod Autoscaler and how you can use it to scale cluster components, including cluster-critical addons.
Read about cluster autoscaling
The addon resizer helps you resize addons automatically as your cluster's scale changes.


Last modified June 27, 2024 at 8:31 AM PST: Update cluster-large.md (27ac207f0f)
