Get to Know Google Kubernetes Engine Autopilot
Autopilot is a new mode of operation in Google Kubernetes Engine (GKE) that is designed to reduce the operational cost of managing clusters, optimize your clusters for production, and yield higher workload availability. The mode of operation refers to the level of flexibility, responsibility, and control that you have over your cluster. In addition to the benefits of a fully managed control plane and node automations, GKE offers two modes of operation:
- Autopilot: GKE provisions and manages the cluster’s underlying infrastructure, including nodes and node pools, giving you an optimized cluster with a hands-off experience.
- Standard: You manage the cluster’s underlying infrastructure, giving you node configuration flexibility.
With Autopilot, you no longer have to monitor the health of your nodes or calculate the amount of compute capacity that your workloads require. Autopilot supports most Kubernetes APIs and tools, and their rich ecosystem. You stay within GKE without having to interact with the Compute Engine APIs, CLIs, or UI, because the nodes are not accessible through Compute Engine as they are in Standard mode. You pay only for the CPU, memory, and storage that your Pods request while they are running.
Autopilot clusters are pre-configured with an optimized cluster configuration that is ready for production workloads. This streamlined configuration follows GKE best practices and recommendations for cluster and workload setup and security. Some of these built-in settings (detailed in the table below) are immutable and other optional settings can be turned on or off.
Autopilot comes with an SLA that covers both the control plane and your Pods. Because the underlying infrastructure is abstracted away, you can focus on the Kubernetes API and your deployments. Autopilot uses the resource requirements that you define in your PodSpec and provisions the resources for the deployment, such as CPU, memory, and persistent disks.
There are two main reasons why you might want to use the Standard mode of operation instead of Autopilot:
- You require a higher level of control over your cluster configuration.
- Your clusters must run workloads that do not meet Autopilot constraints.
Comparing Autopilot and Standard modes
With Autopilot, GKE manages many complexities of the lifecycle of your cluster. The following table shows options that are available depending on the mode of operation for the cluster:
- Pre-configured: This setting is built-in and you cannot change it.
- Default: This setting is turned on but you can override it.
- Optional: This setting is turned off but you can turn it on.
Options | Autopilot mode | Standard mode |
---|---|---|
Basic cluster type | Availability and version. Pre-configured: Regional. Default: Regular release channel | Availability and version. Optional: Regional or zonal; release channel or static version |
Nodes and node pools | Managed by GKE. | Managed, configured, and specified by you. |
Provisioning resources | GKE dynamically provisions resources based on your Pod specification. | You manually provision additional resources and set overall cluster size. Configure cluster autoscaling and node auto-provisioning to help automate the process. |
Image type | Pre-configured: Container-Optimized OS with containerd | Choose one of the following: Container-Optimized OS with containerd; Container-Optimized OS with Docker; Ubuntu with containerd; Ubuntu with Docker; Windows Server LTSC; Windows Server SAC |
Billing | Pay per Pod resource requests (CPU, memory, and ephemeral storage) | Pay per node (CPU, memory, boot disk) |
Security | Pre-configured: Workload Identity; Shielded nodes; Secure boot. Optional: Customer-managed encryption keys (CMEK); Application-layer secrets encryption; Google Groups for RBAC | Optional: Workload Identity; Shielded nodes; Secure boot; Application-layer secrets encryption; Binary authorization; Customer-managed encryption keys (CMEK); Google Groups for RBAC |
Networking | Pre-configured: VPC-native (alias IP); maximum 32 Pods per node; intranode visibility. Default: Public cluster; default CIDR ranges (Note: Ensure that you review your CIDR ranges to factor in expected cluster growth); network name/subnet. Optional: Private cluster; Cloud NAT¹ (private clusters only); authorized networks | Optional: VPC-native (alias IP); maximum 110 Pods per node; intranode visibility; CIDR ranges and max cluster size; network name/subnet; private cluster; Cloud NAT¹; network policy; authorized networks |
Upgrades, repair, and maintenance | Pre-configured: Node auto-repair; node auto-upgrade; maintenance windows; surge upgrades | Optional: Node auto-repair; node auto-upgrade; maintenance windows; surge upgrades |
Authentication credentials | Pre-configured: Workload Identity | Optional: Compute Engine service account; Workload Identity |
Scaling | Pre-configured: Autopilot handles all the scaling and configuring of your nodes. Default: You configure Horizontal Pod autoscaling (HPA); you configure Vertical Pod autoscaling (VPA) | Optional: Node auto-provisioning; you configure cluster autoscaling; HPA; VPA |
Logging | Pre-configured: System and workload logging | Default: System and workload logging. Optional: System-only logging |
Monitoring | Pre-configured: System monitoring. Optional: System and workload monitoring | Default: System monitoring. Optional: System and workload monitoring |
Routing | Pre-configured: Pod-based routing; network endpoint groups (NEGs) enabled. | Choose node-based packet routing (default) or Pod-based routing. |
Cluster add-ons | Pre-configured: HTTP load balancing. Default: Compute Engine persistent disk CSI Driver; Compute Engine Filestore CSI Driver; NodeLocal DNSCache. Optional: Managed Anthos Service Mesh (Preview); Istio (use Managed Anthos Service Mesh (Preview) instead) | Optional: Compute Engine persistent disk CSI Driver; Compute Engine Filestore CSI Driver; HTTP load balancing; NodeLocal DNSCache; Cloud Build; Cloud Run; Cloud TPU; Config Connector; Managed Anthos Service Mesh; Kalm; Usage metering |
¹ Further configuration is required to enable Cloud NAT on a cluster.
Unsupported cluster features
The following GKE cluster features are not supported for Autopilot clusters:
Compute Engine instances
For GKE versions prior to 1.21.4, the following instance types are not supported in Autopilot clusters:
Security
- Binary authorization
- Kubernetes Alpha APIs
- Legacy authentication options
Storage
Add-ons and integrations
- Calico network policy
- Cloud Build
- Cloud Run
- Cloud TPU
- Config Connector
- Graphics processing units (GPUs)
- Kalm
- Usage metering
Scaling
Autopilot automatically scales the cluster’s resources based on your Pod specifications, so that you can focus on your Pods. To automatically increase or decrease the number of Pods, you can implement Horizontal pod autoscaling using the standard Kubernetes CPU or memory metrics, or using custom metrics through Cloud Monitoring.
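For illustration, a minimal HorizontalPodAutoscaler sketch that scales a Deployment on average CPU utilization follows; the Deployment name web, the replica bounds, and the 60% target are assumptions, not values from this document:
```yaml
# Minimal sketch: scale the hypothetical Deployment "web" on CPU utilization.
apiVersion: autoscaling/v2   # use autoscaling/v2beta2 on older GKE versions
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```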
Allowable resource ranges
The following table lists the allowable resource ranges for Autopilot Pods. All values apply to the sum of all container resource requests in the Pod, unless noted. Pod vCPU are available in increments of 0.25 vCPU. In addition to the minimum values, the CPU:memory ratio must be in the range of 1 vCPU:1 GiB to 1 vCPU:6.5 GiB. Resources outside of the allowable ratio ranges will be scaled up. For more information, see resource ranges and ratios management and resource limitation examples.
Resource | Minimum: normal Pods | Minimum: DaemonSet Pods | Maximum: normal and DaemonSet Pods |
---|---|---|---|
CPU | 250 mCPU | 10 mCPU | 28 vCPU² |
Memory | 512 MiB | 10 MiB | 80 GiB² |
Ephemeral storage | 10 MiB (per container) | 10 MiB (per container) | 10 GiB |
² The maximum CPU and memory limits for normal Pods are further reduced by the sum total of the resource requests of all DaemonSet Pods.
Default container resource requests
Autopilot relies on what you specify in your deployment configuration to provision resources. If you do not specify resource requests for any container in the Pod, Autopilot applies default values. These defaults are designed to give the containers in your Pods an average amount of resources, which are suitable for many smaller workloads.
Important: Google recommends that you explicitly set your resource requests for each container to meet your application requirements, as these default values might not be sufficient or optimal.
Autopilot applies these default values to resources that are not defined in the Pod specification.
Resource | Containers in normal Pods | Containers in DaemonSets |
---|---|---|
CPU | 500 mCPU | 50 mCPU |
Memory | 2 GiB | 100 MiB |
Ephemeral storage | 1 GiB | 100 MiB |
For more information about Autopilot cluster limits, see Quotas and limits.
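As a sketch of the recommendation to set explicit requests, a container spec might look like the following; the Pod name, image path, and specific values are placeholders chosen to satisfy the Autopilot minimums and ratios described in this document:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod    # hypothetical name
spec:
  containers:
    - name: app
      image: us-docker.pkg.dev/my-project/my-repo/app:1.0   # placeholder image
      resources:
        requests:
          cpu: "500m"               # a multiple of 250 mCPU
          memory: "2Gi"             # within the 1-6.5 GiB-per-vCPU ratio
          ephemeral-storage: "1Gi"
        # On Autopilot, limits are set equal to requests, so any limits
        # you specify here are overridden.
```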
Workload limitations and restrictions
Autopilot supports most workloads that run your applications. In order for GKE to offer management of the nodes and provide you with a more streamlined operational experience, there are a few restrictions and limitations when compared to GKE Standard. Some of these limitations are security best practices, while others allow Autopilot clusters to be safely managed. Workload limitations apply to all Pods, including those launched by Deployments, DaemonSets, ReplicaSets, ReplicationControllers, StatefulSets, Jobs, and CronJobs.
Host options restrictions
HostPort and hostNetwork are not permitted because node management is handled by GKE. Using hostPath volumes in write mode is prohibited, while using hostPath volumes in read mode is allowed only for /var/log/ path prefixes. Using host namespaces in workloads is prohibited.
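For instance, a read-only hostPath mount under the permitted /var/log/ prefix could look like the following sketch; the Pod name, image, command, and file path are illustrative assumptions:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-reader    # hypothetical name
spec:
  containers:
    - name: reader
      image: busybox   # placeholder image
      command: ["sh", "-c", "tail -f /host-logs/kube-proxy.log"]  # illustrative
      volumeMounts:
        - name: host-logs
          mountPath: /host-logs
          readOnly: true    # write mode is prohibited on Autopilot
  volumes:
    - name: host-logs
      hostPath:
        path: /var/log     # only /var/log/ path prefixes are allowed in read mode
```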
Linux workload limitations
Autopilot supports only the following Linux capabilities for workloads:
"SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER",
"FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE", "SYS_CHROOT", "SETFCAP"
In GKE version 1.21 and later, the "SYS_PTRACE" capability is also supported for workloads.
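A container that needs one of the supported capabilities declares it in its securityContext, as in this sketch (the Pod name, image, and capability choice are illustrative):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capability-demo    # hypothetical name
spec:
  containers:
    - name: app
      image: busybox        # placeholder image
      securityContext:
        capabilities:
          add: ["NET_BIND_SERVICE"]   # from the supported list above
          # Requesting a capability outside the supported list (for example
          # NET_ADMIN) would be rejected by Autopilot.
```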
Node selectors and node affinity
Zonal affinity topologies are supported. Node affinity and node selectors are limited for use only with the following keys: topology.kubernetes.io/region, topology.kubernetes.io/zone, failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone, cloud.google.com/gke-os-distribution, kubernetes.io/os, and kubernetes.io/arch. Not all values of OS and arch are supported in Autopilot.
Node selectors and node affinities also support the cloud.google.com/gke-spot key to automatically provision Spot Pods in clusters running GKE version 1.21.4 and later.
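For example, a workload on a supported cluster version can request Spot Pods through a node selector, as in this sketch; the Deployment name, labels, and image are placeholders:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-workers    # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-workers
  template:
    metadata:
      labels:
        app: batch-workers
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # provisions Spot Pods (GKE 1.21.4+)
      containers:
        - name: worker
          image: us-docker.pkg.dev/my-project/my-repo/worker:1.0   # placeholder
```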
No Container Threat Detection
Autopilot does not support Container Threat Detection.
No privileged Pods
Privileged mode for containers in workloads is mainly used to make changes to nodes, like changing kubelet or networking settings. With Autopilot clusters, node changes aren’t allowed, so these types of Pods are also not allowed. This restriction might impact some admin workloads.
Pod affinity and anti-affinity
Although GKE manages your nodes for you in Autopilot, you retain the ability to schedule your Pods. Autopilot supports Pod affinity, so that you can co-locate Pods together on a single node for network efficiency. For example, you can use Pod affinity to deploy frontend Pods on nodes with backend Pods. Pod affinity is limited for use only with the following keys: topology.kubernetes.io/region, topology.kubernetes.io/zone, failure-domain.beta.kubernetes.io/region, and failure-domain.beta.kubernetes.io/zone.
Autopilot also supports anti-affinity, so that you can spread Pods across nodes to avoid single points of failure. For example, you can use Pod anti-affinity to prevent frontend Pods from co-locating with backend Pods.
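The following sketch combines both, using one of the supported topology keys; the frontend/backend labels, Pod name, and image are hypothetical:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend    # hypothetical name
  labels:
    app: frontend
spec:
  affinity:
    podAffinity:    # co-locate with backend Pods in the same zone
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: backend
          topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:   # prefer spreading frontend replicas across zones
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: frontend
            topologyKey: topology.kubernetes.io/zone
  containers:
    - name: frontend
      image: nginx    # placeholder image
```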
Defaults and resource limitations when using Pod anti-affinity
Autopilot supports Pod anti-affinity, so that you can prevent two Pods from co-locating on the same node. When using anti-affinity, Autopilot must allocate additional compute resources to ensure proper Pod separation, as defined by the PodSpec. When using Pod anti-affinity, the defaults and minimum resource limits increase. For all containers listed in the PodSpec:
Resource | Default value |
---|---|
CPU | 0.75 vCPU |
Memory | 2 GiB |
Ephemeral Storage | 1 GiB |
When using Pod anti-affinity, the same resource limitation rules and logic apply, but with higher vCPU increments. Pod vCPU are offered in a minimum of 0.5 vCPU and increments of 0.5 vCPU (rounded up to the nearest increment). For example, if you request 0.66 vCPU total (among all your containers using anti-affinity), your PodSpec is modified during admission and set to 1 vCPU. Your Pod has full access to the higher resource, with the extra resource divided among the resource requests of all containers.
Tolerations supported only for workload separation
Tolerations are supported only for workload separation. Taints are automatically added by node auto-provisioning as needed.
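A minimal sketch of workload separation follows, assuming a hypothetical key/value pair group=backend: the workload tolerates the corresponding taint and selects the matching node label, and node auto-provisioning adds the taint to the nodes it creates for the workload.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend    # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      tolerations:
        - key: group        # hypothetical separation key
          operator: Equal
          value: backend
          effect: NoSchedule
      nodeSelector:
        group: backend      # matching node label for the separated nodes
      containers:
        - name: backend
          image: us-docker.pkg.dev/my-project/my-repo/backend:1.0   # placeholder
```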
Resource ranges and ratio management
- Pod vCPU increments: Pod vCPU are available in increments of 0.25 vCPU (rounded up). For example, if you request 0.66 vCPU total (among all your containers), your PodSpec is modified during admission and set to 0.75. Your Pod has full access to the higher resource, with the extra resource divided among the resource requests of all containers. The minimum value is 250 milliCPU (mCPU). DaemonSet vCPU are offered in increments of 10 mCPU (rounded up to the nearest increment).
- Memory to CPU ratio range: The ratio of memory (in GiB) to vCPU must be in the range of 1 GiB to 6.5 GiB per vCPU. For example, you can have a Pod with 1 vCPU and 1 GiB of memory, or 1 vCPU and 6.5 GiB of memory, but not 1 vCPU and 7 GiB of memory. To deliver the resource request, GKE scales up whichever resource is too low. For example, if you request 1 vCPU and 7 GiB memory, your PodSpec is modified to 1.25 vCPU and 7 GiB memory on admission. Similarly, if you request 1 vCPU and 800 MiB memory, your PodSpec is modified to 1 vCPU and 1 GiB RAM, with the additional resource divided among the containers.
The CPU and memory increment and ratio requirements, and the potential scale-up of resource requests, are calculated after the defaults are applied to containers with missing resource requests.
Containers with no resource requests will default to the standard minimums of 500 mCPU and 1 GiB memory. For CPU and memory, when GKE scales a resource request up (for example, to meet the minimum requirement or the ratio requirement), the additional resource is allocated evenly between containers. Rounded-up values are distributed proportionally across containers. For example, a container that has twice as much memory as the other containers will receive twice as much additional memory.
- Ephemeral storage: The available range is between 10 MiB and 10 GiB. This impacts the container writable layer and emptyDir mounts.
Ephemeral storage has a minimum request per container, so if the ephemeral storage requests for a container are less than the minimum, Autopilot increases the request to the minimum. Ephemeral storage does not have a minimum request per Pod. Ephemeral storage has a maximum request per Pod which is cumulative across all containers. If the cumulative value is more than the maximum, Autopilot scales the request back to the maximum while also ensuring that the ratio of requests between containers remains the same.
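As a sketch of the vCPU rounding described above, a Pod whose containers together request 0.66 vCPU is mutated to 0.75 vCPU at admission; the Pod name, container names, and images are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rounding-demo    # hypothetical name
spec:
  containers:
    - name: main
      image: nginx        # placeholder image
      resources:
        requests:
          cpu: "330m"     # total Pod CPU request is 660 mCPU ...
          memory: "1Gi"
    - name: sidecar
      image: busybox      # placeholder image
      resources:
        requests:
          cpu: "330m"     # ... which Autopilot rounds up to 750 mCPU, dividing
          memory: "512Mi" # the extra 90 mCPU among the containers
```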
Resource limitation examples
Example 1: For a single container with < 250 mCPU minimum:
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 180 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 250 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
Total Pod resources | CPU: 250 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
Example 2: For multiple containers with a total of < 250 mCPU minimum, Autopilot distributes the remainder of the resources (up to 250 mCPU) evenly between all containers in the Pod.
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 70 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 84 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
2 | CPU: 70 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 83 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
3 | CPU: 70 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 83 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
Total Pod resources | CPU: 250 mCPU Memory: 1.5 GiB Ephemeral storage: 30 MiB |
Example 3: For multiple containers with total resources >= 250 mCPU, the CPU is rounded to multiples of 250 mCPU and the extra CPU is spread across all containers in the ratio of their original requests. In this example, the original cumulative CPU is 320 mCPU and is modified to a total of 500 mCPU. The extra 180 mCPU is spread across the containers:
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 170 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 266 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
2 | CPU: 80 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 125 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
3 | CPU: 70 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 109 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
4 | Init container, resources not defined | Will receive Pod resources |
Total Pod resources | CPU: 500 mCPU Memory: 1.5 GiB Ephemeral storage: 30 MiB |
Example 4: For a single container where the CPU is too low for the amount of memory (1 vCPU:6.5 GiB maximum). The maximum allowed ratio for CPU to memory is 1:6.5. If the ratio is higher than that, the CPU request is increased and then rounded up if necessary:
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 250 mCPU Memory: 4 GiB Ephemeral storage: 10 MiB | CPU: 750 mCPU Memory: 4 GiB Ephemeral storage: 10 MiB |
Total Pod resources | CPU: 750 mCPU Memory: 4 GiB Ephemeral storage: 10 MiB |
Example 5: For a single container where the memory is too low for the amount of CPU (1 vCPU:1 GiB minimum). The minimum allowed ratio for CPU to memory is 1:1. If the ratio is lower than that, the memory request is increased.
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 4 vCPU Memory: 1 GiB Ephemeral storage: 10 MiB | CPU: 4 vCPU Memory: 4 GiB Ephemeral storage: 10 MiB |
Total Pod resources | CPU: 4 vCPU Memory: 4 GiB Ephemeral storage: 10 MiB |
Example 6: For a single container with < 250 mCPU minimum, where, after the CPU is adjusted up to the minimum, the memory is too low for the amount of CPU (1 vCPU:1 GiB minimum).
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 100 mCPU Memory: 50 MiB Ephemeral storage: 10 MiB | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 10 MiB |
Total Pod resources | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 10 MiB |
Example 7: For a single container with ephemeral storage requests > 10 GiB, the maximum allowed ephemeral storage request is 10 GiB. If the request is greater than the maximum value, the request is downscaled to 10 GiB.
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 11 GiB | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 10 GiB |
Total Pod resources | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 10 GiB |
Example 8: For multiple containers with ephemeral storage requests > 10 GiB, all container ephemeral storage requests are downscaled so that the final cumulative storage request is 10 GiB.
Container number | Original resource requests | Modified requests |
---|---|---|
1 | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 5 GiB | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 2.94 GiB |
2 | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 6 GiB | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 3.53 GiB |
3 | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 6 GiB | CPU: 250 mCPU Memory: 256 MiB Ephemeral storage: 3.53 GiB |
Total Pod resources | CPU: 750 mCPU Memory: 768 MiB Ephemeral storage: 10 GiB |
Security limitations
Container isolation
Autopilot enforces a hardened configuration for your Pods that provides enhanced security isolation and helps limit the impact of container escape vulnerabilities on your cluster:
- The container runtime default seccomp profile is applied, by default, to all Pods in your cluster.
- The CAP_NET_RAW container permission is dropped for all containers. The CAP_NET_RAW permission is not typically used and was the subject of multiple container escape vulnerabilities. The lack of CAP_NET_RAW might cause the use of ping to fail inside your container.
- Workload Identity is enforced and prevents Pod access to the underlying Compute Engine service account and other sensitive node metadata.
- Services with spec.externalIPs set are blocked to protect against CVE-2020-8554. These services are rarely used.
- The following StorageTypes are allowed. Other StorageTypes are blocked because they require privileges over the node: "configMap", "csi", "downwardAPI", "emptyDir", "gcePersistentDisk", "hostPath", "nfs", "persistentVolumeClaim", "projected", "secret"
Pod security policies
Autopilot enforces settings that provide enhanced isolation for your containers. Kubernetes PodSecurityPolicy is not supported on Autopilot clusters. In GKE versions older than 1.21, OPA Gatekeeper and Policy Controller are also not supported.
Security boundaries in Autopilot
At the Kubernetes layer, the GKE Autopilot mode provides the Kubernetes API but removes permissions to use some highly privileged Kubernetes primitives, like privileged Pods, with the goal to limit the ability to access, modify, or directly control the node virtual machine (VM).
These restrictions are put in place for GKE Autopilot mode to limit workloads from having low-level access to the node VM, in order to allow Google Cloud to offer full management of nodes and a Pod-level SLA.
Important: The security boundary for GKE nodes is the single-tenant virtual machine, and as such the ability to access the node VM from Pods is not considered a security boundary for Autopilot. Use of any node-level access is inconsistent with the features of GKE Autopilot, is not currently supported, and may be removed without notice. If you require node VM level access, consider using GKE Standard.
Our intent is to prevent unintended access to the node virtual machine. We accept submissions to that effect through the Google Vulnerability Reward Program (VRP) and will reward reports at the discretion of the Google VRP reward panel.
By design, privileged users, like cluster administrators, have full control of any GKE cluster. As a security best practice, we recommend that you avoid granting powerful GKE/Kubernetes privileges widely and instead use namespace admin delegation wherever possible as described in our multi-tenancy guidance.
Workloads on Autopilot have the same security as GKE Standard mode: single-tenant VMs are provisioned in the user's project for their exclusive use. As in Standard, Autopilot workloads within a cluster might run together on an individual VM whose kernel is security-hardened but shared.
Because the shared kernel represents a single security boundary, if you require strong isolation, such as for high-risk or untrusted workloads, GKE recommends that you run your workloads on GKE Standard clusters using GKE Sandbox to provide multi-layer security protection.
Other limitations
Certificate signing requests
You cannot create certificate signing requests within Autopilot.
External monitoring tools
Most external monitoring tools require access that is restricted. Solutions from several Google Cloud partners are available for use on Autopilot; however, not all are supported, and custom monitoring tools cannot be installed on Autopilot clusters.
External services
External IP Services are not permitted on Autopilot clusters. To give a Service an external IP, you can use a LoadBalancer type of Service or use an Ingress to add the Service to an external IP shared among several services.
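For example, a Service of type LoadBalancer along the lines of the following sketch gives your workload an external IP; the Service name, selector, and ports are placeholders:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend-lb    # hypothetical name
spec:
  type: LoadBalancer   # GKE provisions the external IP
  selector:
    app: frontend      # placeholder label selector
  ports:
    - port: 80
      targetPort: 8080 # placeholder container port
```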
Init containers
Init containers run in serial before the application containers start. By default, GKE allocates the full resources available to the Pod to each init container.
Unlike for your other containers, GKE recommends that you leave resource requests unspecified for init containers, so that each init container gets the full resources available to the Pod. If you set lower resources, your init container is constrained unnecessarily; if you set higher resources, you might increase your bill for the lifetime of the Pod.
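A sketch of this recommendation: the init container omits a resources block and receives the full Pod resources, while the application container sets explicit requests. The names, images, and command are hypothetical:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init    # hypothetical name
spec:
  initContainers:
    - name: wait-for-db   # no resources block: gets the full Pod resources
      image: busybox       # placeholder image
      command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]  # illustrative
  containers:
    - name: app
      image: us-docker.pkg.dev/my-project/my-repo/app:1.0   # placeholder
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
```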
Managed namespaces
The kube-system namespace is managed: you cannot alter existing resources or create new resources in this namespace.
No changes to nodes
Since GKE manages the nodes for you for Autopilot clusters, you cannot alter the nodes.
No conversion
Converting Standard clusters to Autopilot mode and converting Autopilot clusters to Standard mode is not supported.
No direct external inbound connections for private clusters
Autopilot clusters with private nodes do not have external IPs and cannot accept inbound connections directly. If you deploy services on a NodePort, you cannot access those services from outside the VPC, such as from the internet. To expose applications externally in Autopilot clusters, use Services. For more information, see Exposing applications using services.
No Pod bursting
For Standard clusters, Pods can be configured to burst into unused capacity on the node. For Autopilot clusters, because all Pods have limits set equal to their requests, resource bursting is not possible. Ensure that your Pod specification defines adequate resources in its resource requests and does not rely on bursting.
No SSH
Since you’re no longer provisioning or managing the nodes in Autopilot, there’s no SSH access. GKE handles all operational aspects of the nodes, including node health and all Kubernetes components running on the nodes.
Resource limits
In an Autopilot cluster, each Pod is treated as a Guaranteed QoS Class Pod, with limits that are equal to requests. Autopilot automatically sets resource limits equal to requests if you do not have resource limits specified. If you do specify resource limits, your limits will be overridden and set to be equal to the requests.
Webhooks limitations
In GKE version 1.21 and later, you can also create mutating dynamic admission webhooks. However, Autopilot modifies mutating webhook objects to add a namespace selector that excludes resources in managed namespaces (for example, kube-system) from being intercepted. Additionally, webhooks that specify one or more of the following resources (or any of their sub-resources) in their rules are rejected:
- group: ""
resource: nodes
- group: ""
resource: persistentvolumes
- group: certificates.k8s.io
resource: certificatesigningrequests
- group: authentication.k8s.io
resource: tokenreviews
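As an illustration, a mutating webhook whose rules target Pods is accepted, and Autopilot then adds the namespace selector described above; a webhook whose rules instead listed nodes, persistentvolumes, certificatesigningrequests, or tokenreviews would be rejected. The webhook name and the Service backing it in this sketch are hypothetical:
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pod-defaulter    # hypothetical name
webhooks:
  - name: pods.example.com          # hypothetical webhook
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: pod-defaulter-svc     # hypothetical Service serving the webhook
        namespace: default
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]   # allowed; rules that specify nodes, persistentvolumes,
                              # certificatesigningrequests, or tokenreviews are rejected
```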
Troubleshooting
Cannot create a cluster: 0 nodes registered
When you create an Autopilot cluster, it fails with the following error:
All cluster resources were brought up, but: only 0 nodes out of 2 have registered.
To resolve the issue, ensure that the default Compute Engine service account is not disabled. Run the following command to check:
gcloud iam service-accounts describe SERVICE_ACCOUNT
Replace SERVICE_ACCOUNT with the numeric service account ID or the service account email address (like 123456789876543212345 or my-iam-account@somedomain.com).
Nodes fail to scale up
After creating an Autopilot cluster, the logs show the following message:
"napFailureReasons": [
{
"messageId": "no.scale.up.nap.pod.zonal.resources.exceeded",
...
This error refers to a noScaleUp event, where node auto-provisioning did not provision any node group for the Pod in the zone because doing so would violate resource limits.
If you encounter this error, confirm the following:
- Your cluster has sufficient memory and CPU.
- The Pod address CIDR range is large enough to support your anticipated maximum cluster size.
Pricing
One Autopilot cluster or zonal cluster per billing account is free.
A cluster management fee of $0.10 per cluster per hour applies, except for Anthos clusters. User Pods in Autopilot clusters are billed per second for CPU cores, memory, and ephemeral storage until a Pod is deleted. Worker nodes in Standard clusters accrue compute costs until a cluster is deleted.