<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Kubernetes Blog</title>
    <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/</link>
    <description>The Kubernetes blog is used by the project to communicate new features, community reports, and any news that might be relevant to the Kubernetes community.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <image>
      <url>https://raw.githubusercontent.com/kubernetes/kubernetes/master/logo/logo.png</url>
      <title>The Kubernetes project logo</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/</link>
    </image>
    
    <atom:link href="https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/feed.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Kubernetes v1.34: VolumeAttributesClass for Volume Modification GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/09/01/kubernetes-v1-34-volume-attributes-class/</link>
      <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/09/01/kubernetes-v1-34-volume-attributes-class/</guid>
      <description>
        
        
        &lt;p&gt;The VolumeAttributesClass API, which empowers users to dynamically modify volume attributes, has officially graduated to General Availability (GA) in Kubernetes v1.34. This marks a significant milestone, providing a robust and stable way to tune your persistent storage directly within Kubernetes.&lt;/p&gt;
&lt;h2 id=&#34;what-is-volumeattributesclass&#34;&gt;What is VolumeAttributesClass?&lt;/h2&gt;
&lt;p&gt;At its core, VolumeAttributesClass is a cluster-scoped resource that defines a set of mutable parameters for a volume. Think of it as a &amp;quot;profile&amp;quot; for your storage, allowing cluster administrators to expose different quality-of-service (QoS) levels or performance tiers.&lt;/p&gt;
&lt;p&gt;Users can then specify a &lt;code&gt;volumeAttributesClassName&lt;/code&gt; in their PersistentVolumeClaim (PVC) to indicate which class of attributes they desire. The magic happens through the Container Storage Interface (CSI): when a PVC referencing a VolumeAttributesClass is updated, the associated CSI driver interacts with the underlying storage system to apply the specified changes to the volume.&lt;/p&gt;
&lt;p&gt;This means you can now:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dynamically scale performance: Increase IOPS or throughput for a busy database, or reduce it for a less critical application.&lt;/li&gt;
&lt;li&gt;Optimize costs: Adjust attributes on the fly to match your current needs, avoiding over-provisioning.&lt;/li&gt;
&lt;li&gt;Simplify operations: Manage volume modifications directly within the Kubernetes API, rather than relying on external tools or manual processes.&lt;/li&gt;
&lt;/ul&gt;
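&lt;p&gt;As an illustrative sketch, an administrator defines a class and a user references it from a PVC. The class name, driver name, and parameters below are hypothetical and entirely driver-specific:&lt;/p&gt;

```yaml
# Hypothetical example: the driverName and parameters depend on your CSI driver.
apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: gold
driverName: example.csi.vendor.com
parameters:
  iops: "8000"
  throughput: "500"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  volumeAttributesClassName: gold   # change this to modify the volume in place
```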
&lt;h2 id=&#34;what-is-new-from-beta-to-ga&#34;&gt;What is new from Beta to GA&lt;/h2&gt;
&lt;p&gt;There are two major enhancements since the beta release.&lt;/p&gt;
&lt;h3 id=&#34;cancel-support-from-infeasible-errors&#34;&gt;Cancel support for infeasible errors&lt;/h3&gt;
&lt;p&gt;To improve resilience and user experience, the GA release introduces explicit cancel support when a requested volume modification becomes infeasible. If the underlying storage system or CSI driver indicates that the requested changes cannot be applied (e.g., due to invalid arguments), users can cancel the operation and revert the volume to its previous stable configuration, preventing the volume from being left in an inconsistent state.&lt;/p&gt;
&lt;h3 id=&#34;quota-support-based-on-scope&#34;&gt;Quota support based on scope&lt;/h3&gt;
&lt;p&gt;While VolumeAttributesClass doesn&#39;t add a new quota type, the Kubernetes control plane can be configured to enforce quotas on PersistentVolumeClaims that reference a specific VolumeAttributesClass.&lt;/p&gt;
&lt;p&gt;This is achieved by using the &lt;code&gt;scopeSelector&lt;/code&gt; field in a ResourceQuota to target PVCs that have &lt;code&gt;.spec.volumeAttributesClassName&lt;/code&gt; set to a particular VolumeAttributesClass name. See the &lt;a href=&#34;https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-volumeattributesclass&#34;&gt;resource quota documentation&lt;/a&gt; for more details.&lt;/p&gt;
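&lt;p&gt;As a sketch, a ResourceQuota scoped to PVCs that reference a particular class (here, a hypothetical class named &#39;gold&#39; in a hypothetical namespace) might look like this:&lt;/p&gt;

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: vac-gold-quota
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    requests.storage: 500Gi
    persistentvolumeclaims: "10"
  scopeSelector:
    matchExpressions:
    - scopeName: VolumeAttributesClass
      operator: In
      values: ["gold"]         # hypothetical VolumeAttributesClass name
```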
&lt;h2 id=&#34;drivers-support-volumeattributesclass&#34;&gt;Drivers supporting VolumeAttributesClass&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Amazon EBS CSI Driver: The AWS EBS CSI driver has robust support for VolumeAttributesClass and allows you to modify parameters like volume type (e.g., gp2 to gp3, io1 to io2), IOPS, and throughput of EBS volumes dynamically.&lt;/li&gt;
&lt;li&gt;Google Compute Engine (GCE) Persistent Disk CSI Driver (pd.csi.storage.gke.io): This driver also supports dynamic modification of persistent disk attributes, including IOPS and throughput, via VolumeAttributesClass.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contact&#34;&gt;Contact&lt;/h2&gt;
&lt;p&gt;For any inquiries or specific questions related to VolumeAttributesClass, please reach out to the &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;SIG Storage community&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Tuning Linux Swap for Kubernetes: A Deep Dive</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/</link>
      <pubDate>Tue, 19 Aug 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/</guid>
      <description>
        
        
&lt;p&gt;The Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/cluster-administration/swap-memory-management/&#34;&gt;NodeSwap feature&lt;/a&gt;, likely to graduate to &lt;em&gt;stable&lt;/em&gt; in the upcoming Kubernetes v1.34 release, allows nodes to use swap: a significant shift from the conventional practice of disabling swap for the sake of performance predictability. This article focuses exclusively on tuning swap on Linux nodes, where this feature is available. By allowing Linux nodes to use secondary storage for additional virtual memory when physical RAM is exhausted, node swap support aims to improve resource utilization and reduce out-of-memory (OOM) kills.&lt;/p&gt;
&lt;p&gt;However, enabling swap is not a &amp;quot;turn-key&amp;quot; solution. The performance and stability of your nodes under memory pressure are critically dependent on a set of Linux kernel parameters. Misconfiguration can lead to performance degradation and interfere with Kubelet&#39;s eviction logic.&lt;/p&gt;
&lt;p&gt;In this blogpost, I&#39;ll dive into critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms.
I will present various test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.&lt;/p&gt;
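&lt;p&gt;For context, swap support is enabled through the kubelet configuration; a minimal sketch is shown below (verify the exact fields against the NodeSwap documentation for your Kubernetes version):&lt;/p&gt;

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # allow the kubelet to run on a node with swap enabled
memorySwap:
  swapBehavior: LimitedSwap  # Burstable pods may use a limited amount of swap
```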
&lt;h2 id=&#34;introduction-to-linux-swap&#34;&gt;Introduction to Linux swap&lt;/h2&gt;
&lt;p&gt;At a high level, the Linux kernel manages memory through pages, typically 4KiB in size. When physical memory becomes constrained, the kernel&#39;s page replacement algorithm decides which pages to move to swap space. While the exact logic is a sophisticated optimization, this decision-making process is influenced by certain key factors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Page access patterns (how recently pages are accessed)&lt;/li&gt;
&lt;li&gt;Page dirtiness (whether pages have been modified)&lt;/li&gt;
&lt;li&gt;Memory pressure (how urgently the system needs free memory)&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;anonymous-vs-file-backed-memory&#34;&gt;Anonymous vs File-backed memory&lt;/h3&gt;
&lt;p&gt;It is important to understand that not all memory pages are the same. The kernel distinguishes between anonymous and file-backed memory.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Anonymous memory&lt;/strong&gt;: This is memory that is not backed by a specific file on the disk, such as a program&#39;s heap and stack. From the application&#39;s perspective this is private memory, and when the kernel needs to reclaim these pages, it must write them to a dedicated swap device.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;File-backed memory&lt;/strong&gt;: This memory is backed by a file on a filesystem. This includes a program&#39;s executable code, shared libraries, and filesystem caches. When the kernel needs to reclaim these pages, it can simply discard them if they have not been modified (&amp;quot;clean&amp;quot;). If a page has been modified (&amp;quot;dirty&amp;quot;), the kernel must first write the changes back to the file before it can be discarded.&lt;/p&gt;
&lt;p&gt;While a system without swap can still reclaim clean file-backed pages under pressure by dropping them, it has no way to offload anonymous memory. Enabling swap provides this capability, allowing the kernel to move less-frequently accessed anonymous pages to disk, conserving memory and avoiding system-level OOM kills.&lt;/p&gt;
&lt;h3 id=&#34;key-kernel-parameters-for-swap-tuning&#34;&gt;Key kernel parameters for swap tuning&lt;/h3&gt;
&lt;p&gt;To effectively tune swap behavior, Linux provides several kernel parameters that can be managed via &lt;code&gt;sysctl&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;vm.swappiness&lt;/code&gt;: This is the most well-known parameter. It is a value from 0 to 200 (100 in older kernels) that controls the kernel&#39;s preference for swapping anonymous memory pages versus reclaiming file-backed memory pages (page cache).
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High value (eg: 90+)&lt;/strong&gt;: The kernel will be aggressive in swapping out less-used anonymous memory to make room for file-cache.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low value (eg: &amp;lt; 10)&lt;/strong&gt;: The kernel will strongly prefer dropping file cache pages over swapping anonymous memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt;: This parameter tells the kernel to keep a minimum amount of memory free as a buffer. When the amount of free memory drops below this safety buffer, the kernel starts reclaiming pages more aggressively (swapping, and eventually resorting to OOM kills).
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Function:&lt;/strong&gt; It acts as a safety lever to ensure the kernel has enough memory for critical allocation requests that cannot be deferred.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Impact on swap&lt;/strong&gt;: Setting a higher &lt;code&gt;min_free_kbytes&lt;/code&gt; effectively raises the floor for free memory, causing the kernel to initiate swap earlier under memory pressure.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vm.watermark_scale_factor&lt;/code&gt;: This setting controls the gap between different watermarks: &lt;code&gt;min&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt;, which are calculated based on &lt;code&gt;min_free_kbytes&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Watermarks explained&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;low&lt;/code&gt;: When free memory is below this mark, the &lt;code&gt;kswapd&lt;/code&gt; kernel process wakes up to reclaim pages in the background. This is when a swapping cycle begins.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;min&lt;/code&gt;: When free memory hits this minimum level, aggressive page reclamation blocks process allocations (direct reclaim). If reclaim fails to free enough pages, the OOM killer is invoked.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;high&lt;/code&gt;: Memory reclamation stops once the free memory reaches this level.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Impact&lt;/strong&gt;: A higher &lt;code&gt;watermark_scale_factor&lt;/code&gt; creates a larger buffer between the &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;min&lt;/code&gt; watermarks. This gives &lt;code&gt;kswapd&lt;/code&gt; more time to reclaim memory gradually before the system hits a critical state.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a typical server workload, a long-running process may hold memory that becomes &#39;cold&#39;. A higher &lt;code&gt;swappiness&lt;/code&gt; value can free up RAM by swapping out that cold memory, leaving room for other active processes that benefit from keeping their file cache.&lt;/p&gt;
&lt;p&gt;Tuning the &lt;code&gt;min_free_kbytes&lt;/code&gt; and &lt;code&gt;watermark_scale_factor&lt;/code&gt; parameters to move the swapping window early will give more room for &lt;code&gt;kswapd&lt;/code&gt; to offload memory to disk and prevent OOM kills during sudden memory spikes.&lt;/p&gt;
&lt;h2 id=&#34;swap-tests-and-results&#34;&gt;Swap tests and results&lt;/h2&gt;
&lt;p&gt;To understand the real-world impact of these parameters, I designed a series of stress tests.&lt;/p&gt;
&lt;h3 id=&#34;test-setup&#34;&gt;Test setup&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Environment&lt;/strong&gt;: GKE on Google Cloud&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes version&lt;/strong&gt;: 1.33.2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Node configuration&lt;/strong&gt;: &lt;code&gt;n2-standard-2&lt;/code&gt; (8GiB RAM, 50GB swap on a &lt;code&gt;pd-balanced&lt;/code&gt; disk, without encryption), Ubuntu 22.04&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workload&lt;/strong&gt;: A custom Go application designed to allocate memory at a configurable rate, generate file-cache pressure, and simulate different memory access patterns (random vs sequential).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;: A sidecar container capturing system metrics every second.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Protection&lt;/strong&gt;: Critical system components (kubelet, container runtime, sshd) were prevented from swapping by setting &lt;code&gt;memory.swap.max=0&lt;/code&gt; in their respective cgroups.&lt;/li&gt;
&lt;/ul&gt;
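&lt;p&gt;The protection in the last bullet can be applied, for example, through a systemd drop-in per critical service, assuming cgroup v2 and systemd-managed units (the path below is illustrative):&lt;/p&gt;

```ini
# /etc/systemd/system/kubelet.service.d/99-no-swap.conf (illustrative path)
# MemorySwapMax=0 sets memory.swap.max=0 on the service's cgroup.
[Service]
MemorySwapMax=0
```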
&lt;h3 id=&#34;test-methodology&#34;&gt;Test methodology&lt;/h3&gt;
&lt;p&gt;I ran a stress-test pod on nodes with different swappiness settings (0, 60, and 90), varying the &lt;code&gt;min_free_kbytes&lt;/code&gt; and &lt;code&gt;watermark_scale_factor&lt;/code&gt; parameters, and observed the outcomes under heavy memory allocation and I/O pressure.&lt;/p&gt;
&lt;h4 id=&#34;visualizing-swap-in-action&#34;&gt;Visualizing swap in action&lt;/h4&gt;
&lt;p&gt;The graph below, from a 100MBps stress test, shows swap in action. As free memory (in the &amp;quot;Memory Usage&amp;quot; plot) decreases, swap usage (&lt;code&gt;Swap Used (GiB)&lt;/code&gt;) and swap-out activity (&lt;code&gt;Swap Out (MiB/s)&lt;/code&gt;) increase. Critically, as the system relies more on swap, the I/O activity and corresponding wait time (&lt;code&gt;IO Wait %&lt;/code&gt; in the &amp;quot;CPU Usage&amp;quot; plot) also rise, indicating that the CPU is increasingly stalled on disk I/O.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Graph showing CPU, Memory, Swap utilization and I/O activity on a Kubernetes node&#34; src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/swap_visualization.png&#34; title=&#34;swap visualization&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;findings&#34;&gt;Findings&lt;/h3&gt;
&lt;p&gt;My initial tests with default kernel parameters (&lt;code&gt;swappiness=60&lt;/code&gt;, &lt;code&gt;min_free_kbytes=68MB&lt;/code&gt;, &lt;code&gt;watermark_scale_factor=10&lt;/code&gt;) quickly led to OOM kills and even unexpected node restarts under high memory pressure. By selecting appropriate kernel parameters, a good balance between node stability and performance can be achieved.&lt;/p&gt;
&lt;h4 id=&#34;the-impact-of-swappiness&#34;&gt;The impact of &lt;code&gt;swappiness&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;The swappiness parameter directly influences the kernel&#39;s choice between reclaiming anonymous memory (swapping) and dropping page cache. To observe this preference, I ran a test where one pod generated and held file-cache pressure, followed by a second pod allocating anonymous memory at 100MB/s:&lt;/p&gt;
&lt;p&gt;My findings reveal a clear trade-off:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;swappiness=90&lt;/code&gt;: The kernel proactively swapped out the inactive anonymous memory to keep the file cache. This resulted in high and sustained swap usage and significant I/O activity (&amp;quot;Blocks Out&amp;quot;), which in turn caused spikes in I/O wait on the CPU.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;swappiness=0&lt;/code&gt;: The kernel favored dropping file-cache pages, delaying swap usage. However, it&#39;s critical to understand that this &lt;strong&gt;does not disable swapping&lt;/strong&gt;. When memory pressure was high, the kernel still swapped anonymous memory to disk.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The choice is workload-dependent. For workloads sensitive to I/O latency, a lower swappiness is preferable. For workloads that rely on a large and frequently accessed file cache, a higher swappiness may be beneficial, provided the underlying disk is fast enough to handle the load.&lt;/p&gt;
&lt;h4 id=&#34;tuning-watermarks-to-prevent-eviction-and-oom-kills&#34;&gt;Tuning watermarks to prevent eviction and OOM kills&lt;/h4&gt;
&lt;p&gt;The most critical challenge I encountered was the interaction between rapid memory allocation and Kubelet&#39;s eviction mechanism. When my test pod, which was deliberately configured to overcommit memory, allocated it at a high rate (e.g., 300-500 MBps), the system quickly ran out of free memory.&lt;/p&gt;
&lt;p&gt;With default watermarks, the buffer for reclamation was too small. Before &lt;code&gt;kswapd&lt;/code&gt; could free up enough memory by swapping, the node would hit a critical state, leading to two potential outcomes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Kubelet eviction&lt;/strong&gt;: If the kubelet&#39;s eviction manager detected that &lt;code&gt;memory.available&lt;/code&gt; was below its threshold, it would evict the pod.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OOM killer&lt;/strong&gt;: In some high-rate scenarios, the OOM killer would activate before eviction could complete, sometimes killing higher-priority pods that were not the source of the pressure.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To mitigate this I tuned the watermarks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Increased &lt;code&gt;min_free_kbytes&lt;/code&gt; to 512MiB: This forces the kernel to start reclaiming memory much earlier, providing a larger safety buffer.&lt;/li&gt;
&lt;li&gt;Increased &lt;code&gt;watermark_scale_factor&lt;/code&gt; to 2000: This widened the gap between the &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt; watermarks (from ≈337MB to ≈591MB in my test node&#39;s &lt;code&gt;/proc/zoneinfo&lt;/code&gt;), effectively increasing the swapping window.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This combination gave &lt;code&gt;kswapd&lt;/code&gt; a larger operational zone and more time to swap pages to disk during memory spikes, successfully preventing both premature evictions and OOM kills in my test runs.&lt;/p&gt;
&lt;p&gt;The table below compares the watermark levels from &lt;code&gt;/proc/zoneinfo&lt;/code&gt; (non-NUMA node):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;min_free_kbytes=67584KiB&lt;/code&gt; and &lt;code&gt;watermark_scale_factor=10&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;min_free_kbytes=524288KiB&lt;/code&gt; and &lt;code&gt;watermark_scale_factor=2000&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node 0, zone Normal &lt;br&gt;   pages free 583273 &lt;br&gt;   boost 0 &lt;br&gt;   min 10504 &lt;br&gt;   low 13130 &lt;br&gt;   high 15756 &lt;br&gt;   spanned 1310720 &lt;br&gt;   present 1310720 &lt;br&gt;   managed 1265603&lt;/td&gt;
&lt;td&gt;Node 0, zone Normal &lt;br&gt;   pages free 470539 &lt;br&gt;   min 82109 &lt;br&gt;   low 337017 &lt;br&gt;   high 591925&lt;br&gt;   spanned 1310720&lt;br&gt;   present 1310720 &lt;br&gt;   managed 1274542&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
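&lt;p&gt;To see where these numbers come from: the kernel derives the &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt; watermarks from the zone&#39;s &lt;code&gt;min&lt;/code&gt; pages, its managed pages, and &lt;code&gt;watermark_scale_factor&lt;/code&gt;. A rough Python sketch of the per-zone calculation, which reproduces both columns of the table above:&lt;/p&gt;

```python
def zone_watermarks(min_pages, managed_pages, watermark_scale_factor):
    """Approximate the kernel's per-zone watermark calculation
    (modeled on mm/page_alloc.c; values in 4KiB pages).

    The gap between watermarks is the larger of min/4 and
    managed_pages * watermark_scale_factor / 10000.
    """
    gap = max(min_pages >> 2, managed_pages * watermark_scale_factor // 10000)
    low = min_pages + gap
    high = min_pages + 2 * gap
    return low, high

# Left column: defaults (min_free_kbytes=67584, watermark_scale_factor=10)
print(zone_watermarks(10504, 1265603, 10))     # (13130, 15756)
# Right column: tuned (min_free_kbytes=524288, watermark_scale_factor=2000)
print(zone_watermarks(82109, 1274542, 2000))   # (337017, 591925)
```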
&lt;p&gt;The graph below reveals that the kernel buffer size and scaling factor play a crucial role in determining how the system responds to memory load. With the right combination of these parameters, the system can effectively use swap space to avoid eviction and maintain stability.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;A side-by-side comparison of different min_free_kbytes settings, showing differences in Swap, Memory Usage and Eviction impact&#34; src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/memory-and-swap-growth.png&#34; title=&#34;Memory and Swap Utilization with min_free_kbytes&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;risks-and-recommendations&#34;&gt;Risks and recommendations&lt;/h3&gt;
&lt;p&gt;Enabling swap in Kubernetes is a powerful tool, but it comes with risks that must be managed through careful tuning.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Risk of performance degradation&lt;/strong&gt; Swapping is orders of magnitude slower than accessing RAM. If an application&#39;s active working set is swapped out, its performance will suffer dramatically due to high I/O wait times (thrashing). Swap should preferably be provisioned on SSD-backed storage to improve performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Risk of masking memory leaks&lt;/strong&gt; Swap can hide memory leaks in applications, which might otherwise lead to a quick OOM kill. With swap, a leaky application might slowly degrade node performance over time, making the root cause harder to diagnose.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Risk of disabling evictions&lt;/strong&gt; The kubelet proactively monitors the node for memory pressure and terminates pods to reclaim resources. Improper tuning can lead to OOM kills before the kubelet has a chance to evict pods gracefully. A properly configured &lt;code&gt;min_free_kbytes&lt;/code&gt; is essential to ensure the kubelet&#39;s eviction mechanism remains effective.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;kubernetes-context&#34;&gt;Kubernetes context&lt;/h3&gt;
&lt;p&gt;Together, the kernel watermarks and the kubelet eviction threshold create a series of memory pressure zones on a node. The eviction-threshold parameters need to be adjusted so that Kubernetes-managed evictions occur before OOM kills.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Preferred thresholds for effective swap utilization&#34; src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/swap-thresholds.png&#34; title=&#34;Recommended Thresholds&#34;&gt;&lt;/p&gt;
&lt;p&gt;As the diagram shows, an ideal configuration creates a large enough &#39;swapping zone&#39; (between the &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;min&lt;/code&gt; watermarks) so that the kernel can handle memory pressure by swapping before available memory drops into the eviction/direct-reclaim zone.&lt;/p&gt;
&lt;h3 id=&#34;recommended-starting-point&#34;&gt;Recommended starting point&lt;/h3&gt;
&lt;p&gt;Based on these findings, I recommend the following as a starting point for Linux nodes with swap enabled. You should benchmark this with your own workloads.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;vm.swappiness=60&lt;/code&gt;: Linux default is a good starting point for general-purpose workloads. However, the ideal value is workload-dependent, and swap-sensitive applications may need more careful tuning.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vm.min_free_kbytes=500000&lt;/code&gt; (500MB): Set this to a reasonably high value (e.g., 2-3% of total node memory) to give the node a reasonable safety buffer.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vm.watermark_scale_factor=2000&lt;/code&gt;: Create a larger window for &lt;code&gt;kswapd&lt;/code&gt; to work with, preventing OOM kills during sudden memory allocation spikes.&lt;/li&gt;
&lt;/ul&gt;
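&lt;p&gt;These values can be applied with &lt;code&gt;sysctl&lt;/code&gt;; a sketch (requires root; file name under &lt;code&gt;/etc/sysctl.d/&lt;/code&gt; is illustrative):&lt;/p&gt;

```shell
# Apply at runtime (illustrative; requires root)
sysctl -w vm.swappiness=60
sysctl -w vm.min_free_kbytes=500000
sysctl -w vm.watermark_scale_factor=2000
# To persist across reboots, place the same key=value lines
# in a file such as /etc/sysctl.d/99-swap-tuning.conf
```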
&lt;p&gt;When setting up swap for the first time in your Kubernetes cluster, I encourage running benchmark tests with your own workloads in test environments. Swap performance can be sensitive to environmental factors such as CPU load, disk type (SSD vs HDD), and I/O patterns.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34: Service Account Token Integration for Image Pulls Graduates to Beta</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/15/kubernetes-v1-34-sa-tokens-image-pulls-beta/</link>
      <pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/15/kubernetes-v1-34-sa-tokens-image-pulls-beta/</guid>
      <description>
        
        
        &lt;p&gt;The Kubernetes community continues to advance security best practices
by reducing reliance on long-lived credentials.
Following the successful &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/07/kubernetes-v1-33-wi-for-image-pulls/&#34;&gt;alpha release in Kubernetes v1.33&lt;/a&gt;,
&lt;em&gt;Service Account Token Integration for Kubelet Credential Providers&lt;/em&gt;
has now graduated to &lt;strong&gt;beta&lt;/strong&gt; in Kubernetes v1.34,
bringing us closer to eliminating long-lived image pull secrets from Kubernetes clusters.&lt;/p&gt;
&lt;p&gt;This enhancement allows credential providers
to use workload-specific service account tokens to obtain registry credentials,
providing a secure, ephemeral alternative to traditional image pull secrets.&lt;/p&gt;
&lt;h2 id=&#34;what-s-new-in-beta&#34;&gt;What&#39;s new in beta?&lt;/h2&gt;
&lt;p&gt;The beta graduation brings several important changes
that make the feature more robust and production-ready:&lt;/p&gt;
&lt;h3 id=&#34;required-cachetype-field&#34;&gt;Required &lt;code&gt;cacheType&lt;/code&gt; field&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Breaking change from alpha&lt;/strong&gt;: The &lt;code&gt;cacheType&lt;/code&gt; field is &lt;strong&gt;required&lt;/strong&gt;
in the credential provider configuration when using service account tokens.
This field is new in beta and must be specified to ensure proper caching behavior.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# CAUTION: this is not a complete configuration example, just a reference for the &amp;#39;tokenAttributes.cacheType&amp;#39; field.&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tokenAttributes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;serviceAccountTokenAudience&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;my-registry-audience&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cacheType&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;ServiceAccount&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Required field in beta&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requireServiceAccount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Choose between two caching strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Token&lt;/code&gt;&lt;/strong&gt;: Cache credentials per service account token
(use when credential lifetime is tied to the token).
This is useful when the credential provider transforms the service account token into registry credentials
with the same lifetime as the token, or when registries support Kubernetes service account tokens directly.
Note: The kubelet cannot send service account tokens directly to registries;
credential provider plugins are needed to transform tokens into the username/password format expected by registries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;ServiceAccount&lt;/code&gt;&lt;/strong&gt;: Cache credentials per service account identity
(use when credentials are valid for all pods using the same service account)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;isolated-image-pull-credentials&#34;&gt;Isolated image pull credentials&lt;/h3&gt;
&lt;p&gt;The beta release provides stronger security isolation for container images
when using service account tokens for image pulls.
It ensures that pods can only access images that were pulled using ServiceAccounts they&#39;re authorized to use.
This prevents unauthorized access to sensitive container images
and enables granular access control where different workloads can have different registry permissions
based on their ServiceAccount.&lt;/p&gt;
&lt;p&gt;When credential providers use service account tokens,
the system tracks ServiceAccount identity (namespace, name, and &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/overview/working-with-objects/names/#uids&#34;&gt;UID&lt;/a&gt;) for each pulled image.
When a pod attempts to use a cached image,
the system verifies that the pod&#39;s ServiceAccount matches exactly with the ServiceAccount
that was used to originally pull the image.&lt;/p&gt;
&lt;p&gt;Administrators can revoke access to previously pulled images
by deleting and recreating the ServiceAccount,
which changes the UID and invalidates cached image access.&lt;/p&gt;
&lt;p&gt;For more details about this capability,
see the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/containers/images/#ensureimagepullcredentialverification&#34;&gt;image pull credential verification&lt;/a&gt; documentation.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;h3 id=&#34;configuration&#34;&gt;Configuration&lt;/h3&gt;
&lt;p&gt;Credential providers opt into using ServiceAccount tokens
by configuring the &lt;code&gt;tokenAttributes&lt;/code&gt; field:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# CAUTION: this is an example configuration.&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#          Do not use this for your own cluster!&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet.config.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CredentialProviderConfig&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;providers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-credential-provider&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchImages&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;*.myregistry.io/*&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;defaultCacheDuration&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;10m&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;credentialprovider.kubelet.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tokenAttributes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;serviceAccountTokenAudience&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;my-registry-audience&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cacheType&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;ServiceAccount&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# New in beta&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requireServiceAccount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requiredServiceAccountAnnotationKeys&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;myregistry.io/identity-id&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;optionalServiceAccountAnnotationKeys&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;myregistry.io/optional-annotation&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;image-pull-flow&#34;&gt;Image pull flow&lt;/h3&gt;
&lt;p&gt;At a high level, &lt;code&gt;kubelet&lt;/code&gt; coordinates with your credential provider
and the container runtime as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When the image is not present locally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt; checks its credential cache using the configured &lt;code&gt;cacheType&lt;/code&gt;
(&lt;code&gt;Token&lt;/code&gt; or &lt;code&gt;ServiceAccount&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;If needed, &lt;code&gt;kubelet&lt;/code&gt; requests a ServiceAccount token for the pod&#39;s ServiceAccount
and passes it, plus any required annotations, to the credential provider&lt;/li&gt;
&lt;li&gt;The provider exchanges that token for registry credentials
and returns them to &lt;code&gt;kubelet&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt; caches credentials per the &lt;code&gt;cacheType&lt;/code&gt; strategy
and pulls the image with those credentials&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt; records the ServiceAccount coordinates (namespace, name, UID)
associated with the pulled image for later authorization checks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the image is already present locally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt; verifies the pod&#39;s ServiceAccount coordinates
match the coordinates recorded for the cached image&lt;/li&gt;
&lt;li&gt;If they match exactly, the cached image can be used
without pulling from the registry&lt;/li&gt;
&lt;li&gt;If they differ, &lt;code&gt;kubelet&lt;/code&gt; performs a fresh pull
using credentials for the new ServiceAccount&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With image pull credential verification enabled:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Authorization is enforced using the recorded ServiceAccount coordinates,
ensuring pods only use images pulled by a ServiceAccount
they are authorized to use&lt;/li&gt;
&lt;li&gt;Administrators can revoke access by deleting and recreating a ServiceAccount;
the UID changes and previously recorded authorization no longer matches&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
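&lt;p&gt;The cache-verification step above can be sketched schematically. This is an illustrative model, not kubelet&#39;s actual implementation; the type and function names are invented for the example:&lt;/p&gt;

```python
# Schematic model of kubelet's cached-image authorization check:
# an image pulled with one ServiceAccount (namespace, name, UID) may only
# be reused by pods whose ServiceAccount coordinates match exactly.
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccountCoordinates:
    namespace: str
    name: str
    uid: str


# image reference -> coordinates recorded at pull time
pull_records: dict[str, ServiceAccountCoordinates] = {}


def record_pull(image: str, sa: ServiceAccountCoordinates) -> None:
    """Remember which ServiceAccount's credentials pulled this image."""
    pull_records[image] = sa


def can_use_cached_image(image: str, pod_sa: ServiceAccountCoordinates) -> bool:
    """True only on an exact match of namespace, name, and UID.

    A recreated ServiceAccount has a new UID, so previously recorded
    authorization no longer matches and a fresh pull is required.
    """
    return pull_records.get(image) == pod_sa
```

&lt;p&gt;In this model, deleting and recreating a ServiceAccount changes its UID, so the exact-match check fails and a fresh pull with fresh credentials is forced.&lt;/p&gt;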
&lt;h3 id=&#34;audience-restriction&#34;&gt;Audience restriction&lt;/h3&gt;
&lt;p&gt;The beta release builds on service account node audience restriction
(beta since v1.33) to ensure &lt;code&gt;kubelet&lt;/code&gt; can only request tokens for authorized audiences.
Administrators configure allowed audiences using RBAC to enable kubelet to request service account tokens for image pulls:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# CAUTION: this is an example configuration.&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#          Do not use this for your own cluster!&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;rbac.authorization.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterRole&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet-credential-provider-audiences&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;verbs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;request-serviceaccounts-token-audience&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroups&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;my-registry-audience&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceNames&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;registry-access-sa&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Optional: specific SA&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;getting-started-with-beta&#34;&gt;Getting started with beta&lt;/h2&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes v1.34 or later&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature gate enabled&lt;/strong&gt;:
&lt;code&gt;KubeletServiceAccountTokenForCredentialProviders=true&lt;/code&gt; (beta, enabled by default)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Credential provider support&lt;/strong&gt;:
Update your credential provider to handle ServiceAccount tokens&lt;/li&gt;
&lt;/ol&gt;
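&lt;p&gt;As a sketch, the feature gate can be set explicitly in the kubelet configuration; it is already enabled by default in v1.34, so this fragment is illustrative only:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;# Illustrative kubelet configuration fragment
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletServiceAccountTokenForCredentialProviders: true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;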
&lt;h3 id=&#34;migration-from-alpha&#34;&gt;Migration from alpha&lt;/h3&gt;
&lt;p&gt;If you&#39;re already using the alpha version,
the migration to beta requires minimal changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Add &lt;code&gt;cacheType&lt;/code&gt; field&lt;/strong&gt;:
Update your credential provider configuration to include the required &lt;code&gt;cacheType&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review caching strategy&lt;/strong&gt;:
Choose between &lt;code&gt;Token&lt;/code&gt; and &lt;code&gt;ServiceAccount&lt;/code&gt; cache types based on your provider&#39;s behavior&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test audience restrictions&lt;/strong&gt;:
Ensure your RBAC configuration, or other cluster authorization rules, will properly restrict token audiences&lt;/li&gt;
&lt;/ol&gt;
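&lt;p&gt;Concretely, an alpha-era &lt;code&gt;tokenAttributes&lt;/code&gt; block only needs the new required field added (values here are illustrative):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;# Illustrative fragment: the alpha configuration plus the new required field
tokenAttributes:
  serviceAccountTokenAudience: &amp;#34;my-registry-audience&amp;#34;
  requireServiceAccount: true
  cacheType: &amp;#34;ServiceAccount&amp;#34;  # new required field in beta; or &amp;#34;Token&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;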
&lt;h3 id=&#34;example-setup&#34;&gt;Example setup&lt;/h3&gt;
&lt;p&gt;Here&#39;s a complete example
for setting up a credential provider with service account tokens
(this example assumes your cluster uses RBAC authorization):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# CAUTION: this is an example configuration.&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#          Do not use this for your own cluster!&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;#&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Service Account with registry annotations&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ServiceAccount&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry-access-sa&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;default&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;myregistry.io/identity-id&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;user123&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# RBAC for audience restriction&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;rbac.authorization.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterRole&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry-audience-access&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;verbs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;request-serviceaccounts-token-audience&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroups&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;my-registry-audience&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceNames&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;registry-access-sa&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Optional: specific ServiceAccount&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;rbac.authorization.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterRoleBinding&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet-registry-audience&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;roleRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;rbac.authorization.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterRole&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry-audience-access&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;subjects&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Group&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;system:nodes&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;rbac.authorization.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Pod using the ServiceAccount&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;serviceAccountName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry-access-sa&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myregistry.example/my-app:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;For Kubernetes v1.35, we (Kubernetes SIG Auth) expect the feature to stay in beta,
and we will continue to solicit feedback.&lt;/p&gt;
&lt;p&gt;You can learn more about this feature
on the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/kubelet-credential-provider/#service-account-token-for-image-pulls&#34;&gt;service account token for image pulls&lt;/a&gt;
page in the Kubernetes documentation.&lt;/p&gt;
&lt;p&gt;You can also follow along on the
&lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;KEP-4412&lt;/a&gt;
to track progress across the coming Kubernetes releases.&lt;/p&gt;
&lt;h2 id=&#34;call-to-action&#34;&gt;Call to action&lt;/h2&gt;
&lt;p&gt;In this blog post,
I have covered the beta graduation of ServiceAccount token integration
for Kubelet Credential Providers in Kubernetes v1.34.
I discussed the key improvements,
including the required &lt;code&gt;cacheType&lt;/code&gt; field
and enhanced integration with image pull credential verification.&lt;/p&gt;
&lt;p&gt;We received positive feedback from the community during the alpha phase
and would love to hear more as we stabilize this feature for GA.
In particular, we would like feedback from credential provider implementors
as they integrate with the new beta API and caching mechanisms.
Please reach out to us on the &lt;a href=&#34;https://kubernetes.slack.com/archives/C04UMAUC4UA&#34;&gt;#sig-auth-authenticators-dev&lt;/a&gt; channel on Kubernetes Slack.&lt;/p&gt;
&lt;h2 id=&#34;how-to-get-involved&#34;&gt;How to get involved&lt;/h2&gt;
&lt;p&gt;If you are interested in getting involved in the development of this feature,
share feedback, or participate in any other ongoing SIG Auth projects,
please reach out on the &lt;a href=&#34;https://kubernetes.slack.com/archives/C0EN96KUY&#34;&gt;#sig-auth&lt;/a&gt; channel on Kubernetes Slack.&lt;/p&gt;
&lt;p&gt;You are also welcome to join the bi-weekly &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings&#34;&gt;SIG Auth meetings&lt;/a&gt;,
held every other Wednesday.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>PSI Metrics for Kubernetes Graduates to Beta</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/08/introducing-psi-metrics-beta/</link>
      <pubDate>Fri, 08 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/08/introducing-psi-metrics-beta/</guid>
      <description>
        
        
        &lt;p&gt;As Kubernetes clusters grow in size and complexity, understanding the health and performance of individual nodes becomes increasingly critical. We are excited to announce that as of Kubernetes v1.34, &lt;strong&gt;Pressure Stall Information (PSI) Metrics&lt;/strong&gt; has graduated to Beta.&lt;/p&gt;
&lt;h2 id=&#34;what-is-pressure-stall-information-psi&#34;&gt;What is Pressure Stall Information (PSI)?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://docs.kernel.org/accounting/psi.html&#34;&gt;Pressure Stall Information (PSI)&lt;/a&gt; is a feature of the Linux kernel (version 4.20 and later)
that provides a canonical way to quantify pressure on infrastructure resources,
in terms of whether demand for a resource exceeds current supply.
It moves beyond simple resource utilization metrics and instead
measures the amount of time that tasks are stalled due to resource contention.
This is a powerful way to identify and diagnose resource bottlenecks that can impact application performance.&lt;/p&gt;
&lt;p&gt;PSI exposes metrics for CPU, memory, and I/O, categorized as either &lt;code&gt;some&lt;/code&gt; or &lt;code&gt;full&lt;/code&gt; pressure:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;some&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The percentage of time that &lt;strong&gt;at least one&lt;/strong&gt; task is stalled on a resource. This indicates some level of resource contention.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;full&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;The percentage of time that &lt;strong&gt;all&lt;/strong&gt; non-idle tasks are stalled on a resource simultaneously. This indicates a more severe resource bottleneck.&lt;/dd&gt;
&lt;/dl&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/images/psi-metrics-some-vs-full.svg&#34;
         alt=&#34;Diagram illustrating the difference between &amp;#39;some&amp;#39; and &amp;#39;full&amp;#39; PSI pressure.&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;PSI: &amp;#39;Some&amp;#39; vs. &amp;#39;Full&amp;#39; Pressure&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;These metrics are aggregated over 10-second, 1-minute, and 5-minute rolling windows, providing a comprehensive view of resource pressure over time.&lt;/p&gt;
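&lt;p&gt;These windows mirror the kernel&#39;s own accounting: on a Linux node with PSI enabled, you can inspect the raw data under &lt;code&gt;/proc/pressure&lt;/code&gt;. For example, &lt;code&gt;/proc/pressure/memory&lt;/code&gt; might contain something like this (the numbers here are purely illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;some avg10=0.25 avg60=0.10 avg300=0.03 total=1234567
full avg10=0.00 avg60=0.01 avg300=0.00 total=98765
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;avg10&lt;/code&gt;, &lt;code&gt;avg60&lt;/code&gt;, and &lt;code&gt;avg300&lt;/code&gt; fields are the pressure percentages over the 10-second, 1-minute, and 5-minute windows, and &lt;code&gt;total&lt;/code&gt; is the cumulative stall time in microseconds.&lt;/p&gt;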
&lt;h2 id=&#34;psi-metrics-in-kubernetes&#34;&gt;PSI metrics in Kubernetes&lt;/h2&gt;
&lt;p&gt;With the &lt;code&gt;KubeletPSI&lt;/code&gt; feature gate enabled, the kubelet can now collect PSI metrics from the Linux kernel and expose them through two channels: the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/instrumentation/node-metrics/#summary-api-source&#34;&gt;Summary API&lt;/a&gt; and the &lt;code&gt;/metrics/cadvisor&lt;/code&gt; Prometheus endpoint. This allows you to monitor and alert on resource pressure at the node, pod, and container level.&lt;/p&gt;
&lt;p&gt;The following new metrics are available in Prometheus exposition format via &lt;code&gt;/metrics/cadvisor&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;container_pressure_cpu_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_cpu_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_memory_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_memory_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_io_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_io_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These metrics, along with the data from the Summary API, provide a granular view of resource pressure, enabling you to pinpoint the source of performance issues and take corrective action. For example, you can use these metrics to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Identify memory leaks:&lt;/strong&gt; A steadily increasing &lt;code&gt;some&lt;/code&gt; pressure for memory can indicate a memory leak in an application.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimize resource requests and limits:&lt;/strong&gt; By understanding the resource pressure of your workloads, you can more accurately tune their resource requests and limits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Autoscale workloads:&lt;/strong&gt; You can use PSI metrics to trigger autoscaling events, ensuring that your workloads have the resources they need to perform optimally.&lt;/li&gt;
&lt;/ul&gt;
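&lt;p&gt;Because these metrics are cumulative counters of stall seconds, the per-second &lt;code&gt;rate()&lt;/code&gt; gives the fraction of time spent under pressure. As a sketch, assuming you scrape &lt;code&gt;/metrics/cadvisor&lt;/code&gt; with Prometheus, an alerting rule for sustained memory pressure could look like this (the rule name and threshold are illustrative, not recommendations):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;groups:
- name: psi-example
  rules:
  - alert: HighMemoryPressure
    # More than 10% of the time, at least one task was stalled on memory
    expr: rate(container_pressure_memory_waiting_seconds_total[5m]) &amp;gt; 0.1
    for: 10m
&lt;/code&gt;&lt;/pre&gt;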
&lt;h2 id=&#34;how-to-enable-psi-metrics&#34;&gt;How to enable PSI metrics&lt;/h2&gt;
&lt;p&gt;To enable PSI metrics in your Kubernetes cluster, you need to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Ensure your nodes are running a Linux kernel version 4.20 or later and are using cgroup v2.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enable the &lt;code&gt;KubeletPSI&lt;/code&gt; feature gate on the kubelet.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
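&lt;p&gt;For example, the feature gate can be turned on through the kubelet configuration file (any other settings you already have stay as they are):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPSI: true
&lt;/code&gt;&lt;/pre&gt;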
&lt;p&gt;Once enabled, you can start scraping the &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint with your Prometheus-compatible monitoring solution or query the Summary API to collect and visualize the new PSI metrics. Note that PSI is a Linux-kernel feature, so these metrics are not available on Windows nodes. Your cluster can contain a mix of Linux and Windows nodes, and on the Windows nodes the kubelet does not expose PSI metrics.&lt;/p&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;We are excited to bring PSI metrics to the Kubernetes community and look forward to your feedback. As a beta feature, we are actively working on improving and extending this functionality towards a stable GA release. We encourage you to try it out and share your experiences with us.&lt;/p&gt;
&lt;p&gt;To learn more about PSI metrics, check out the official &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/instrumentation/understand-psi-metrics/&#34;&gt;Kubernetes documentation&lt;/a&gt;. You can also get involved in the conversation on the &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt; Slack channel.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Introducing Headlamp AI Assistant</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/07/introducing-headlamp-ai-assistant/</link>
      <pubDate>Thu, 07 Aug 2025 20:00:00 +0100</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/08/07/introducing-headlamp-ai-assistant/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;This announcement originally &lt;a href=&#34;https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant&#34;&gt;appeared&lt;/a&gt; on the Headlamp blog.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;To simplify Kubernetes management and troubleshooting, we&#39;re thrilled to
introduce &lt;a href=&#34;https://github.com/headlamp-k8s/plugins/tree/main/ai-assistant#readme&#34;&gt;Headlamp AI Assistant&lt;/a&gt;: a powerful new plugin for Headlamp that helps
you understand and operate your Kubernetes clusters and applications with
greater clarity and ease.&lt;/p&gt;
&lt;p&gt;Whether you&#39;re a seasoned engineer or just getting started, the AI Assistant offers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fast time to value:&lt;/strong&gt; Ask questions like &lt;em&gt;&amp;quot;Is my application healthy?&amp;quot;&lt;/em&gt; or
&lt;em&gt;&amp;quot;How can I fix this?&amp;quot;&lt;/em&gt; without needing deep Kubernetes knowledge.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep insights:&lt;/strong&gt; Start with high-level queries and dig deeper with prompts
like &lt;em&gt;&amp;quot;List all the problematic pods&amp;quot;&lt;/em&gt; or &lt;em&gt;&amp;quot;How can I fix this pod?&amp;quot;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Focused &amp;amp; relevant:&lt;/strong&gt; Ask questions in the context of what you&#39;re viewing
in the UI, such as &lt;em&gt;&amp;quot;What&#39;s wrong here?&amp;quot;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action-oriented:&lt;/strong&gt; Let the AI take action for you, like &lt;em&gt;&amp;quot;Restart that
deployment&amp;quot;&lt;/em&gt;, with your permission.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here is a demo of the AI Assistant in action as it helps troubleshoot an
application running with issues in a Kubernetes cluster:&lt;/p&gt;


    
    &lt;div class=&#34;youtube-quote-sm&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/GzXkUuCTcd4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; title=&#34;Headlamp AI Assistant&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;h2 id=&#34;hopping-on-the-ai-train&#34;&gt;Hopping on the AI train&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have transformed not just how we access data but
also how we interact with it. The rise of tools like ChatGPT opened a world of
possibilities, inspiring a wave of new applications. Asking questions or giving
commands in natural language is intuitive, especially for users who aren&#39;t deeply
technical. Now anyone can quickly ask how to do X or Y without feeling awkward
or having to comb through page after page of documentation.&lt;/p&gt;
&lt;p&gt;To that end, the Headlamp AI Assistant brings a conversational UI to &lt;a href=&#34;https://headlamp.dev&#34;&gt;Headlamp&lt;/a&gt;.
It is available as a Headlamp plugin, making it easy to integrate into your
existing setup. Users enable it by installing the plugin and configuring
it with their own LLM API keys, giving them control over which model powers
the assistant. Once enabled, the assistant becomes part of the Headlamp UI,
ready to respond to contextual queries and perform actions directly from the
interface.&lt;/p&gt;
&lt;h2 id=&#34;context-is-everything&#34;&gt;Context is everything&lt;/h2&gt;
&lt;p&gt;As expected, the AI Assistant is focused on helping users with Kubernetes
concepts. Yet, while there is a lot of value in answering Kubernetes-related
questions from Headlamp&#39;s UI, we believe the greatest benefit of such
an integration comes when it can use the context of what the user is experiencing
in an application. So, the Headlamp AI Assistant knows what you&#39;re currently
viewing in Headlamp, and this makes the interaction feel more like working
with a human assistant.&lt;/p&gt;
&lt;p&gt;For example, if a pod is failing, users can simply ask &lt;em&gt;&amp;quot;What&#39;s wrong here?&amp;quot;&lt;/em&gt;
and the AI Assistant will respond with the root cause, like a missing
environment variable or a typo in the image name. Follow-up prompts like
&lt;em&gt;&amp;quot;How can I fix this?&amp;quot;&lt;/em&gt; allow the AI Assistant to suggest a fix, streamlining
what used to take multiple steps into a quick, conversational flow.&lt;/p&gt;
&lt;p&gt;Sharing context from Headlamp is not a trivial task, though, so it&#39;s
an area we will keep refining.&lt;/p&gt;
&lt;h2 id=&#34;tools&#34;&gt;Tools&lt;/h2&gt;
&lt;p&gt;Context from the UI is helpful, but sometimes additional capabilities are
needed. If the user is viewing the pod list and wants to identify problematic
deployments, switching views should not be necessary. To address this, the AI
Assistant includes support for a Kubernetes tool. This allows asking questions
like &amp;quot;Get me all deployments with problems&amp;quot;, prompting the assistant to fetch
and display relevant data from the current cluster. Likewise, if the user
requests an action like &amp;quot;Restart that deployment&amp;quot; after the AI points out what
deployment needs restarting, it can also do that. For &amp;quot;write&amp;quot;
operations, the AI Assistant asks the user for permission before running them.&lt;/p&gt;
&lt;h2 id=&#34;ai-plugins&#34;&gt;AI Plugins&lt;/h2&gt;
&lt;p&gt;Although the initial version of the AI Assistant is already useful for
Kubernetes users, future iterations will expand its capabilities. Currently,
the assistant supports only the Kubernetes tool, but further integration with
Headlamp plugins is underway. For example, we could gain richer insights for
GitOps via the Flux plugin, monitoring through Prometheus, package management
with Helm, and more.&lt;/p&gt;
&lt;p&gt;And of course, as the popularity of the Model Context Protocol (MCP) grows, we are looking into how to
integrate it as well, in a more plug-and-play fashion.&lt;/p&gt;
&lt;h2 id=&#34;try-it-out&#34;&gt;Try it out!&lt;/h2&gt;
&lt;p&gt;We hope this first version of the AI Assistant helps users manage Kubernetes
clusters more effectively and helps newcomers navigate the learning
curve. We invite you to try out this early version and give us your feedback.
The AI Assistant plugin can be installed from Headlamp&#39;s Plugin Catalog in the
desktop version, or by using the container image when deploying Headlamp.
Stay tuned for the future versions of the Headlamp AI Assistant!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34 Sneak Peek</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</link>
      <pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes v1.34 is coming at the end of August 2025.
This release will not include any removal or deprecation, but it is packed with an impressive number of enhancements.
Here are some of the features we are most excited about in this cycle!&lt;/p&gt;
&lt;p&gt;Please note that this information reflects the current state of v1.34 development and may change before release.&lt;/p&gt;
&lt;h2 id=&#34;featured-enhancements-of-kubernetes-v1-34&#34;&gt;Featured enhancements of Kubernetes v1.34&lt;/h2&gt;
&lt;p&gt;The following list highlights some of the notable enhancements likely to be included in the v1.34 release,
but is not an exhaustive list of all planned changes.
This is not a commitment and the release content is subject to change.&lt;/p&gt;
&lt;h3 id=&#34;the-core-of-dra-targets-stable&#34;&gt;The core of DRA targets stable&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/&#34;&gt;Dynamic Resource Allocation&lt;/a&gt; (DRA) provides a flexible way to categorize,
request, and use devices like GPUs or custom hardware in your Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;Since the v1.30 release, DRA has been based around claiming devices using &lt;em&gt;structured parameters&lt;/em&gt; that are opaque to the core of Kubernetes.
The relevant enhancement proposal, &lt;a href=&#34;https://kep.k8s.io/4381&#34;&gt;KEP-4381&lt;/a&gt;, took inspiration from dynamic provisioning for storage volumes.
DRA with structured parameters relies on a set of supporting API kinds: the ResourceClaim, DeviceClass, ResourceClaimTemplate,
and ResourceSlice types under &lt;code&gt;resource.k8s.io&lt;/code&gt;, while extending the &lt;code&gt;.spec&lt;/code&gt; for Pods with a new &lt;code&gt;resourceClaims&lt;/code&gt; field.
The core of DRA is targeting graduation to stable in Kubernetes v1.34.&lt;/p&gt;
&lt;p&gt;With DRA, device drivers and cluster admins define device classes that are available for use.
Workloads can claim devices from a device class within device requests.
Kubernetes allocates matching devices to specific claims and places the corresponding Pods on nodes that can access the allocated devices.
This framework provides flexible device filtering using CEL, centralized device categorization, and simplified Pod requests, among other benefits.&lt;/p&gt;
&lt;p&gt;Once this feature has graduated, the &lt;code&gt;resource.k8s.io/v1&lt;/code&gt; APIs will be available by default.&lt;/p&gt;
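&lt;p&gt;As a rough sketch of how these pieces fit together, a workload might claim a device like this. The device class name and image are hypothetical, and the exact schema differs between API versions, so consult the DRA documentation for the version you are running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
  containers:
  - name: app
    image: myregistry.example/my-app:latest
    resources:
      claims:
      - name: gpu
&lt;/code&gt;&lt;/pre&gt;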
&lt;h3 id=&#34;serviceaccount-tokens-for-image-pull-authentication&#34;&gt;ServiceAccount tokens for image pull authentication&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/security/service-accounts/&#34;&gt;ServiceAccount&lt;/a&gt; token integration for &lt;code&gt;kubelet&lt;/code&gt; credential providers is likely to reach beta and be enabled by default in Kubernetes v1.34.
This allows the &lt;code&gt;kubelet&lt;/code&gt; to use these tokens when pulling container images from registries that require authentication.&lt;/p&gt;
&lt;p&gt;That support already exists as alpha, and is tracked as part of &lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;KEP-4412&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The existing alpha integration allows the &lt;code&gt;kubelet&lt;/code&gt; to use short-lived, automatically rotated ServiceAccount tokens (that follow OIDC-compliant semantics) to authenticate to a container image registry.
Each token is scoped to one associated Pod; the overall mechanism replaces the need for long-lived image pull Secrets.&lt;/p&gt;
&lt;p&gt;Adopting this new approach reduces security risks, supports workload-level identity, and helps cut operational overhead.
It brings image pull authentication closer to modern, identity-aware good practice.&lt;/p&gt;
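&lt;p&gt;As a sketch, a credential provider opts into ServiceAccount tokens through its entry in the kubelet&#39;s &lt;code&gt;CredentialProviderConfig&lt;/code&gt;. The provider name, image pattern, and audience below are hypothetical, and the exact field names may evolve as the feature moves through beta:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: example-provider
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  matchImages:
  - &amp;quot;*.registry.example&amp;quot;
  defaultCacheDuration: &amp;quot;10m&amp;quot;
  tokenAttributes:
    serviceAccountTokenAudience: &amp;quot;registry.example&amp;quot;
    requireServiceAccount: true
&lt;/code&gt;&lt;/pre&gt;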
&lt;h3 id=&#34;pod-replacement-policy-for-deployments&#34;&gt;Pod replacement policy for Deployments&lt;/h3&gt;
&lt;p&gt;After a change to a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/deployment/&#34;&gt;Deployment&lt;/a&gt;, terminating pods may stay up for a considerable amount of time and may consume additional resources.
As part of &lt;a href=&#34;https://kep.k8s.io/3973&#34;&gt;KEP-3973&lt;/a&gt;, the &lt;code&gt;.spec.podReplacementPolicy&lt;/code&gt; field will be introduced (as alpha) for Deployments.&lt;/p&gt;
&lt;p&gt;If your cluster has the feature enabled, you&#39;ll be able to select one of two policies:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;TerminationStarted&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Creates new pods as soon as old ones start terminating, resulting in faster rollouts at the cost of potentially higher resource consumption.&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;TerminationComplete&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;Waits until old pods fully terminate before creating new ones, resulting in slower rollouts but ensuring controlled resource consumption.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;This feature makes Deployment behavior more predictable by letting you choose when new pods should be created during updates or scaling.
It&#39;s beneficial when working in clusters with tight resource constraints or with workloads with long termination periods.&lt;/p&gt;
&lt;p&gt;It&#39;s expected to be available as an alpha feature and can be enabled using the &lt;code&gt;DeploymentPodReplacementPolicy&lt;/code&gt; and &lt;code&gt;DeploymentReplicaSetTerminatingReplicas&lt;/code&gt; feature gates in the API server and kube-controller-manager.&lt;/p&gt;
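&lt;p&gt;As a sketch, opting a Deployment into the stricter policy would look like this (assuming the feature gates above are enabled in your cluster):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  podReplacementPolicy: TerminationComplete
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myregistry.example/my-app:latest
&lt;/code&gt;&lt;/pre&gt;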
&lt;h3 id=&#34;production-ready-tracing-for-kubelet-and-api-server&#34;&gt;Production-ready tracing for &lt;code&gt;kubelet&lt;/code&gt; and API Server&lt;/h3&gt;
&lt;p&gt;To address the longstanding challenge of debugging node-level issues by correlating disconnected logs,
&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt; provides deep, contextual insights into the &lt;code&gt;kubelet&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This feature instruments critical &lt;code&gt;kubelet&lt;/code&gt; operations, particularly its gRPC calls to the Container Runtime Interface (CRI), using the vendor-agnostic OpenTelemetry standard.
It allows operators to visualize the entire lifecycle of events (for example: a Pod startup) to pinpoint sources of latency and errors.
Its most powerful aspect is the propagation of trace context; the &lt;code&gt;kubelet&lt;/code&gt; passes a trace ID with its requests to the container runtime, enabling runtimes to link their own spans.&lt;/p&gt;
&lt;p&gt;This effort is complemented by a parallel enhancement, &lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt;, which brings the same tracing capabilities to the Kubernetes API server.
Together, these enhancements provide a more unified, end-to-end view of events, simplifying the process of pinpointing latency and errors from the control plane down to the node.
These features have matured through the official Kubernetes release process.
&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt; was introduced as an alpha feature in v1.25, while &lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt; debuted as alpha in v1.22.
Both enhancements were promoted to beta together in the v1.27 release.
Looking forward, Kubelet Tracing (&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt;) and API Server Tracing (&lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt;) are now targeting graduation to stable in the upcoming v1.34 release.&lt;/p&gt;
&lt;h3 id=&#34;prefersamezone-and-prefersamenode-traffic-distribution-for-services&#34;&gt;&lt;code&gt;PreferSameZone&lt;/code&gt; and &lt;code&gt;PreferSameNode&lt;/code&gt; traffic distribution for Services&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;spec.trafficDistribution&lt;/code&gt; field within a Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/service/&#34;&gt;Service&lt;/a&gt; allows users to express preferences for how traffic should be routed to Service endpoints.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/3015&#34;&gt;KEP-3015&lt;/a&gt; deprecates &lt;code&gt;PreferClose&lt;/code&gt; and introduces two additional values: &lt;code&gt;PreferSameZone&lt;/code&gt; and &lt;code&gt;PreferSameNode&lt;/code&gt;.
&lt;code&gt;PreferSameZone&lt;/code&gt; is equivalent to the current &lt;code&gt;PreferClose&lt;/code&gt;.
&lt;code&gt;PreferSameNode&lt;/code&gt; prioritizes sending traffic to endpoints on the same node as the client.&lt;/p&gt;
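&lt;p&gt;For example, a Service that prefers node-local endpoints would set the field like this (the Service name, selector, and ports are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferSameNode
&lt;/code&gt;&lt;/pre&gt;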
&lt;p&gt;This feature was introduced in v1.33 behind the &lt;code&gt;PreferSameTrafficDistribution&lt;/code&gt; feature gate.
It is targeting graduation to beta in v1.34 with its feature gate enabled by default.&lt;/p&gt;
&lt;h3 id=&#34;support-for-kyaml-a-kubernetes-dialect-of-yaml&#34;&gt;Support for KYAML: a Kubernetes dialect of YAML&lt;/h3&gt;
&lt;p&gt;KYAML aims to be a safer and less ambiguous YAML subset, and was designed specifically
for Kubernetes. Whatever version of Kubernetes you use, you&#39;ll be able to use KYAML for writing manifests
and/or Helm charts.
You can write KYAML and pass it as an input to &lt;strong&gt;any&lt;/strong&gt; version of &lt;code&gt;kubectl&lt;/code&gt;,
because all KYAML files are also valid as YAML.
With kubectl v1.34, we expect you&#39;ll also be able to request KYAML output from &lt;code&gt;kubectl&lt;/code&gt; (as in &lt;code&gt;kubectl get -o kyaml …&lt;/code&gt;).
If you prefer, you can still request the output in JSON or YAML format.&lt;/p&gt;
&lt;p&gt;KYAML addresses specific challenges with both YAML and JSON.
YAML&#39;s significant whitespace requires careful attention to indentation and nesting,
while its optional string-quoting can lead to unexpected type coercion (for example: &lt;a href=&#34;https://hitchdev.com/strictyaml/why/implicit-typing-removed/&#34;&gt;&amp;quot;The Norway Bug&amp;quot;&lt;/a&gt;).
Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/5295&#34;&gt;KEP-5295&lt;/a&gt; introduces KYAML, which tries to address the most significant problems by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Always double-quoting value strings&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leaving keys unquoted unless they are potentially ambiguous&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Always using &lt;code&gt;{}&lt;/code&gt; for mappings (associative arrays)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Always using &lt;code&gt;[]&lt;/code&gt; for lists&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This might sound a lot like JSON, because it is! But unlike JSON, KYAML supports comments, allows trailing commas, and doesn&#39;t require quoted keys.&lt;/p&gt;
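&lt;p&gt;Putting those rules together, a small KYAML document might look like this (the exact formatting that &lt;code&gt;kubectl&lt;/code&gt; will emit may differ):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  apiVersion: &amp;quot;v1&amp;quot;,
  kind: &amp;quot;ConfigMap&amp;quot;,
  metadata: {
    name: &amp;quot;example-config&amp;quot;,
  },
  data: {
    # Comments are allowed, and so are trailing commas.
    # This key is quoted because the &amp;quot;.&amp;quot; makes it potentially ambiguous.
    &amp;quot;app.mode&amp;quot;: &amp;quot;production&amp;quot;,
  },
}
&lt;/code&gt;&lt;/pre&gt;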
&lt;p&gt;We&#39;re hoping to see KYAML introduced as a new output format for &lt;code&gt;kubectl&lt;/code&gt; v1.34.
As with all these features, none of these changes are 100% confirmed; watch this space!&lt;/p&gt;
&lt;p&gt;As a format, KYAML is and will remain a &lt;strong&gt;strict subset of YAML&lt;/strong&gt;, ensuring that any compliant YAML parser can parse KYAML documents.
Kubernetes does not require you to provide input specifically formatted as KYAML, and we have no plans to change that.&lt;/p&gt;
&lt;h3 id=&#34;fine-grained-autoscaling-control-with-hpa-configurable-tolerance&#34;&gt;Fine-grained autoscaling control with HPA configurable tolerance&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/4951&#34;&gt;KEP-4951&lt;/a&gt; introduces a new feature that allows users to configure autoscaling tolerance on a per-HPA basis,
overriding the default cluster-wide 10% tolerance setting that often proves too coarse-grained for diverse workloads.
The enhancement adds an optional &lt;code&gt;tolerance&lt;/code&gt; field to the HPA&#39;s &lt;code&gt;spec.behavior.scaleUp&lt;/code&gt; and &lt;code&gt;spec.behavior.scaleDown&lt;/code&gt; sections,
enabling different tolerance values for scale-up and scale-down operations,
which is particularly valuable since scale-up responsiveness is typically more critical than scale-down speed for handling traffic surges.&lt;/p&gt;
&lt;p&gt;Released as alpha in Kubernetes v1.33 behind the &lt;code&gt;HPAConfigurableTolerance&lt;/code&gt; feature gate, this feature is expected to graduate to beta in v1.34.
This improvement helps to address scaling challenges with large deployments, where for scaling in,
a 10% tolerance might mean leaving hundreds of unnecessary Pods running.
Using the new, more flexible approach would enable workload-specific optimization for both
responsive and conservative scaling behaviors.&lt;/p&gt;
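&lt;p&gt;As a sketch, the per-direction tolerances described above would be expressed like this (the target, replica counts, and tolerance values are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 100
  behavior:
    scaleUp:
      tolerance: 0.05  # react to a 5% metric deviation when scaling up
    scaleDown:
      tolerance: 0.2   # be more conservative when scaling down
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
&lt;/code&gt;&lt;/pre&gt;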
&lt;h2 id=&#34;want-to-know-more&#34;&gt;Want to know more?&lt;/h2&gt;
&lt;p&gt;New features and deprecations are also announced in the Kubernetes release notes.
We will formally announce what&#39;s new in &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md&#34;&gt;Kubernetes v1.34&lt;/a&gt; as part of the CHANGELOG for that release.&lt;/p&gt;
&lt;p&gt;The Kubernetes v1.34 release is planned for &lt;strong&gt;Wednesday 27th August 2025&lt;/strong&gt;. Stay tuned for updates!&lt;/p&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;The simplest way to get involved with Kubernetes is to join one of the many &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;Special Interest Groups&lt;/a&gt; (SIGs) that align with your interests.
Have something you&#39;d like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;community meeting&lt;/a&gt;, and through the channels below.
Thank you for your continued feedback and support.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Follow us on Bluesky &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt; for the latest updates&lt;/li&gt;
&lt;li&gt;Join the community discussion on &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Join the community on &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post questions (or answer questions) on &lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; or &lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Share your Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;story&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read more about what&#39;s happening with Kubernetes on the &lt;a href=&#34;https://kubernetes.io/blog/&#34;&gt;blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Learn more about the &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes Release Team&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Post-Quantum Cryptography in Kubernetes</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/18/pqc-in-k8s/</link>
      <pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/18/pqc-in-k8s/</guid>
      <description>
        
        
        &lt;p&gt;The world of cryptography is on the cusp of a major shift with the advent of
quantum computing. While powerful quantum computers are still largely
theoretical for many applications, their potential to break current
cryptographic standards is a serious concern, especially for long-lived
systems. This is where &lt;em&gt;Post-Quantum Cryptography&lt;/em&gt; (PQC) comes in. In this
article, I&#39;ll dive into what PQC means for TLS and, more specifically, for the
Kubernetes ecosystem. I&#39;ll explain what the (surprising) state of PQC in
Kubernetes ecosystem. I&#39;ll explain what the (suprising) state of PQC in
Kubernetes is and what the implications are for current and future clusters.&lt;/p&gt;
&lt;h2 id=&#34;what-is-post-quantum-cryptography&#34;&gt;What is Post-Quantum Cryptography&lt;/h2&gt;
&lt;p&gt;Post-Quantum Cryptography refers to cryptographic algorithms that are thought to
be secure against attacks by both classical and quantum computers. The primary
concern is that quantum computers, using algorithms like &lt;a href=&#34;https://en.wikipedia.org/wiki/Shor%27s_algorithm&#34;&gt;Shor&#39;s Algorithm&lt;/a&gt;,
could efficiently break widely used public-key cryptosystems such as RSA and
Elliptic Curve Cryptography (ECC), which underpin much of today&#39;s secure
communication, including TLS. The industry is actively working on standardizing
and adopting PQC algorithms. One of the first to be standardized by &lt;a href=&#34;https://www.nist.gov/&#34;&gt;NIST&lt;/a&gt; is
the Module-Lattice Key Encapsulation Mechanism (&lt;code&gt;ML-KEM&lt;/code&gt;), formerly known as
Kyber, and now standardized as &lt;a href=&#34;https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf&#34;&gt;FIPS-203&lt;/a&gt; (PDF download).&lt;/p&gt;
&lt;p&gt;It is difficult to predict when quantum computers will be able to break
classical algorithms. However, it is clear that we need to start migrating to
PQC algorithms now, as the next section shows. To get a feeling for the
predicted timeline we can look at a &lt;a href=&#34;https://nvlpubs.nist.gov/nistpubs/ir/2024/NIST.IR.8547.ipd.pdf&#34;&gt;NIST report&lt;/a&gt; covering the transition to
post-quantum cryptography standards. It declares that systems with classical
crypto should be deprecated after 2030 and disallowed after 2035.&lt;/p&gt;
&lt;h2 id=&#34;timelines&#34;&gt;Key exchange vs. digital signatures: different needs, different timelines&lt;/h2&gt;
&lt;p&gt;In TLS, there are two main cryptographic operations we need to secure:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Exchange&lt;/strong&gt;: This is how the client and server agree on a shared secret to
encrypt their communication. If an attacker records encrypted traffic today,
they could decrypt it in the future, if they gain access to a quantum computer
capable of breaking the key exchange. This makes migrating KEMs to PQC an
immediate priority.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Digital Signatures&lt;/strong&gt;: These are primarily used to authenticate the server (and
sometimes the client) via certificates. The authenticity of a server is
verified at the time of connection. While important, the risk of an attack
today is much lower, because the decision of trusting a server cannot be abused
after the fact. Additionally, current PQC signature schemes often come with
significant computational overhead and larger key/signature sizes compared to
their classical counterparts.&lt;/p&gt;
&lt;p&gt;Another significant hurdle in the migration to PQ certificates is the upgrade
of root certificates. These certificates have long validity periods and are
installed in many devices and operating systems as trust anchors.&lt;/p&gt;
&lt;p&gt;Given these differences, the focus for immediate PQC adoption in TLS has been
on hybrid key exchange mechanisms. These combine a classical algorithm (such as
Elliptic Curve Diffie-Hellman Ephemeral (ECDHE)) with a PQC algorithm (such as
&lt;code&gt;ML-KEM&lt;/code&gt;). The resulting shared secret is secure as long as at least one of the
component algorithms remains unbroken. The &lt;code&gt;X25519MLKEM768&lt;/code&gt; hybrid scheme is the
most widely supported one.&lt;/p&gt;
&lt;h2 id=&#34;state-of-kems&#34;&gt;State of PQC key exchange mechanisms (KEMs) today&lt;/h2&gt;
&lt;p&gt;Support for PQC KEMs is rapidly improving across the ecosystem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Go&lt;/strong&gt;: The Go standard library&#39;s &lt;code&gt;crypto/tls&lt;/code&gt; package introduced support for
&lt;code&gt;X25519MLKEM768&lt;/code&gt; in version 1.24 (released February 2025). Crucially, it&#39;s
enabled by default when there is no explicit configuration, i.e.,
&lt;code&gt;Config.CurvePreferences&lt;/code&gt; is &lt;code&gt;nil&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Browsers &amp;amp; OpenSSL&lt;/strong&gt;: Major browsers like Chrome (version 131, November 2024)
and Firefox (version 135, February 2025), as well as OpenSSL (version 3.5.0,
April 2025), have also added support for the &lt;code&gt;ML-KEM&lt;/code&gt; based hybrid scheme.&lt;/p&gt;
&lt;p&gt;Apple is also &lt;a href=&#34;https://support.apple.com/en-lb/122756&#34;&gt;rolling out support&lt;/a&gt; for &lt;code&gt;X25519MLKEM768&lt;/code&gt; in version
26 of their operating systems. Given the proliferation of Apple devices, this
will have a significant impact on the global PQC adoption.&lt;/p&gt;
&lt;p&gt;For a more detailed overview of the state of PQC in the wider industry,
see &lt;a href=&#34;https://blog.cloudflare.com/pq-2024/&#34;&gt;this blog post by Cloudflare&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;post-quantum-kems-in-kubernetes-an-unexpected-arrival&#34;&gt;Post-quantum KEMs in Kubernetes: an unexpected arrival&lt;/h2&gt;
&lt;p&gt;So, what does this mean for Kubernetes? Kubernetes components, including the
API server and kubelet, are built with Go.&lt;/p&gt;
&lt;p&gt;As of Kubernetes v1.33, released in April 2025, the project uses Go 1.24. A
quick check of the Kubernetes codebase reveals that &lt;code&gt;Config.CurvePreferences&lt;/code&gt;
is not explicitly set. This leads to a fascinating conclusion: Kubernetes
v1.33, by virtue of using Go 1.24, supports hybrid post-quantum
&lt;code&gt;X25519MLKEM768&lt;/code&gt; for TLS connections by default!&lt;/p&gt;
&lt;p&gt;You can test this yourself. If you set up a Minikube cluster running Kubernetes
v1.33.0, you can connect to the API server using a recent OpenSSL client:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; minikube start --kubernetes-version&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;v1.33.0
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; kubectl cluster-info
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Kubernetes control plane is running at https://127.0.0.1:&amp;lt;PORT&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; kubectl config view --minify --raw -o &lt;span style=&#34;color:#b8860b&#34;&gt;jsonpath&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;{&lt;/span&gt;.clusters&lt;span style=&#34;color:#666&#34;&gt;[&lt;/span&gt;0&lt;span style=&#34;color:#666&#34;&gt;]&lt;/span&gt;.cluster.certificate-authority-data&lt;span style=&#34;color:#666&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\&amp;#39;&lt;/span&gt; | base64 -d &amp;gt; ca.crt
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; openssl version
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;OpenSSL 3.5.0 8 Apr 2025 (Library: OpenSSL 3.5.0 8 Apr 2025)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;echo&lt;/span&gt; -n &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Q&amp;#34;&lt;/span&gt; | openssl s_client -connect 127.0.0.1:&amp;lt;PORT&amp;gt; -CAfile ca.crt
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;[...]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Negotiated TLS1.3 group: X25519MLKEM768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;[...]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;DONE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Lo and behold, the negotiated group is &lt;code&gt;X25519MLKEM768&lt;/code&gt;! This is a significant
step towards making Kubernetes quantum-safe, seemingly without a major
announcement or dedicated KEP (Kubernetes Enhancement Proposal).&lt;/p&gt;
&lt;h2 id=&#34;the-go-version-mismatch-pitfall&#34;&gt;The Go version mismatch pitfall&lt;/h2&gt;
&lt;p&gt;An interesting wrinkle emerged with Go versions 1.23 and 1.24. Go 1.23
included experimental support for a draft version of &lt;code&gt;ML-KEM&lt;/code&gt;, identified as
&lt;code&gt;X25519Kyber768Draft00&lt;/code&gt;. This was also enabled by default if
&lt;code&gt;Config.CurvePreferences&lt;/code&gt; was &lt;code&gt;nil&lt;/code&gt;. Kubernetes v1.32 used Go 1.23. However,
Go 1.24 removed the draft support and replaced it with the standardized version
&lt;code&gt;X25519MLKEM768&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;What happens if a client and server are using mismatched Go versions (one on
1.23, the other on 1.24)? They won&#39;t have a common PQC KEM to negotiate, and
the handshake will fall back to classical ECC curves (e.g., &lt;code&gt;X25519&lt;/code&gt;). How
could this happen in practice?&lt;/p&gt;
&lt;p&gt;Consider a scenario:&lt;/p&gt;
&lt;p&gt;A Kubernetes cluster is running v1.32 (using Go 1.23 and thus
&lt;code&gt;X25519Kyber768Draft00&lt;/code&gt;). A developer upgrades their &lt;code&gt;kubectl&lt;/code&gt; to v1.33,
compiled with Go 1.24, which only supports &lt;code&gt;X25519MLKEM768&lt;/code&gt;. Now, when &lt;code&gt;kubectl&lt;/code&gt;
communicates with the v1.32 API server, they no longer share a common PQC
algorithm. The connection will downgrade to classical cryptography, silently
losing the PQC protection that has been in place. This highlights the
importance of understanding the implications of Go version upgrades, and the
details of the TLS stack.&lt;/p&gt;
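&lt;p&gt;The downgrade can be modeled with a simplified negotiation function. This is not Go&#39;s actual selection code, and the group lists are only meant to mirror the defaults of the two Go releases discussed above: the intersection of the two PQC offerings is empty, so the handshake quietly lands on classical &lt;code&gt;X25519&lt;/code&gt;.&lt;/p&gt;

```go
package main

import "fmt"

// pickGroup mimics TLS group negotiation: the first group supported by the
// server that the client also offers wins, or "" if there is no overlap.
func pickGroup(clientOffers, serverSupports []string) string {
	offered := map[string]bool{}
	for _, g := range clientOffers {
		offered[g] = true
	}
	for _, g := range serverSupports {
		if offered[g] {
			return g
		}
	}
	return ""
}

func main() {
	// Illustrative default group lists for the two Go releases.
	go123 := []string{"X25519Kyber768Draft00", "X25519", "P-256"} // e.g. a v1.32 API server
	go124 := []string{"X25519MLKEM768", "X25519", "P-256"}        // e.g. a v1.33 kubectl

	fmt.Println(pickGroup(go124, go124)) // X25519MLKEM768: both sides PQC-capable
	fmt.Println(pickGroup(go124, go123)) // X25519: silent fallback to classical crypto
}
```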
&lt;h2 id=&#34;limitation-packet-size&#34;&gt;Limitations: packet size&lt;/h2&gt;
&lt;p&gt;One practical consideration with &lt;code&gt;ML-KEM&lt;/code&gt; is the size of its public keys:
the encoded key is around 1.2 kilobytes for &lt;code&gt;ML-KEM-768&lt;/code&gt;.
This can cause the initial TLS &lt;code&gt;ClientHello&lt;/code&gt; message not to fit inside
a single TCP/IP packet, given the typical networking constraints
(most commonly, the standard Ethernet frame size limit of 1500
bytes). Some TLS libraries or network appliances might not handle this
gracefully, assuming the &lt;code&gt;ClientHello&lt;/code&gt; always fits in one packet. This issue
has been observed in some Kubernetes-related projects and networking
components, potentially leading to connection failures when PQC KEMs are used.
More details can be found at &lt;a href=&#34;https://tldr.fail/&#34;&gt;tldr.fail&lt;/a&gt;.&lt;/p&gt;
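&lt;p&gt;Some back-of-the-envelope arithmetic shows why this happens. The key sizes below come from the &lt;code&gt;ML-KEM-768&lt;/code&gt; specification; the overhead figure for the rest of the &lt;code&gt;ClientHello&lt;/code&gt; is an assumption for illustration, as the real size depends on the offered cipher suites and extensions.&lt;/p&gt;

```go
package main

import "fmt"

func main() {
	const (
		mlkemKeyShare  = 1184 // encoded ML-KEM-768 encapsulation key, bytes
		x25519KeyShare = 32   // classical component of X25519MLKEM768
		helloOverhead  = 300  // cipher suites, extensions, etc. (assumed)

		mtu       = 1500 // standard Ethernet payload size
		ipTCPHdrs = 40   // IPv4 + TCP headers without options
	)

	clientHello := mlkemKeyShare + x25519KeyShare + helloOverhead
	overflow := clientHello > mtu-ipTCPHdrs
	fmt.Printf("ClientHello is roughly %d bytes; needs more than one packet: %v\n",
		clientHello, overflow)
}
```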
&lt;h2 id=&#34;state-of-post-quantum-signatures&#34;&gt;State of Post-Quantum Signatures&lt;/h2&gt;
&lt;p&gt;While KEMs are seeing broader adoption, PQC digital signatures are further
behind in terms of widespread integration into standard toolchains. NIST has
published standards for PQC signatures, such as &lt;code&gt;ML-DSA&lt;/code&gt; (&lt;code&gt;FIPS-204&lt;/code&gt;) and
&lt;code&gt;SLH-DSA&lt;/code&gt; (&lt;code&gt;FIPS-205&lt;/code&gt;). However, implementing these in a way that&#39;s broadly
usable (e.g., for PQC Certificate Authorities) &lt;a href=&#34;https://blog.cloudflare.com/another-look-at-pq-signatures/#the-algorithms&#34;&gt;presents challenges&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Larger Keys and Signatures&lt;/strong&gt;: PQC signature schemes often have significantly
larger public keys and signature sizes compared to classical algorithms like
Ed25519 or RSA. For instance, Dilithium2 keys can be 30 times larger than
Ed25519 keys, and certificates can be 12 times larger.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Signing and verification operations &lt;a href=&#34;https://pqshield.github.io/nist-sigs-zoo/&#34;&gt;can be substantially slower&lt;/a&gt;.
While some algorithms are on par with classical algorithms, others may have a
much higher overhead, sometimes on the order of 10x to 1000x worse performance.
To improve this situation, NIST is running a
&lt;a href=&#34;https://csrc.nist.gov/news/2024/pqc-digital-signature-second-round-announcement&#34;&gt;second round of standardization&lt;/a&gt; for PQC signatures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Toolchain Support&lt;/strong&gt;: Mainstream TLS libraries and CA software do not yet have
mature, built-in support for these new signature algorithms. The Go team, for
example, has indicated that &lt;code&gt;ML-DSA&lt;/code&gt; support is a high priority, but the
soonest it might appear in the standard library is Go 1.26 &lt;a href=&#34;https://github.com/golang/go/issues/64537#issuecomment-2877714729&#34;&gt;(as of May 2025)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/cloudflare/circl&#34;&gt;Cloudflare&#39;s CIRCL&lt;/a&gt; (Cloudflare Interoperable Reusable Cryptographic Library)
library implements some PQC signature schemes like variants of Dilithium, and
they maintain a &lt;a href=&#34;https://github.com/cloudflare/go&#34;&gt;fork of Go (cfgo)&lt;/a&gt; that integrates CIRCL. Using &lt;code&gt;cfgo&lt;/code&gt;, it&#39;s
possible to experiment with generating certificates signed with PQC algorithms
like Ed25519-Dilithium2. However, this requires using a custom Go toolchain and
is not yet part of the mainstream Kubernetes or Go distributions.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The journey to a post-quantum secure Kubernetes is underway, and perhaps
further along than many realize, thanks to the proactive adoption of &lt;code&gt;ML-KEM&lt;/code&gt;
in Go. With Kubernetes v1.33, users are already benefiting from hybrid post-quantum key
exchange in many TLS connections by default.&lt;/p&gt;
&lt;p&gt;However, awareness of potential pitfalls, such as Go version mismatches leading
to downgrades and issues with Client Hello packet sizes, is crucial. While PQC
for KEMs is becoming a reality, PQC for digital signatures and certificate
hierarchies is still in earlier stages of development and adoption for
mainstream use. As Kubernetes maintainers and contributors, staying informed
about these developments will be key to ensuring the long-term security of the
platform.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Navigating Failures in Pods With Devices</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/</link>
      <pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes is the de facto standard for container orchestration, but when it
comes to handling specialized hardware like GPUs and other accelerators, things
get a bit complicated. This blog post dives into the challenges of managing
failure modes when operating pods with devices in Kubernetes, based on insights
from &lt;a href=&#34;https://sched.co/1i7pT&#34;&gt;Sergey Kanzhelev and Mrunal Patel&#39;s talk at KubeCon NA
2024&lt;/a&gt;. You can follow the links to
&lt;a href=&#34;https://static.sched.com/hosted_files/kccncna2024/b9/KubeCon%20NA%202024_%20Navigating%20Failures%20in%20Pods%20With%20Devices_%20Challenges%20and%20Solutions.pptx.pdf?_gl=1*191m4j5*_gcl_au*MTU1MDM0MTM1My4xNzMwOTE4ODY5LjIxNDI4Nzk1NDIuMTczMTY0ODgyMC4xNzMxNjQ4ODIy*FPAU*MTU1MDM0MTM1My4xNzMwOTE4ODY5&#34;&gt;slides&lt;/a&gt;
and
&lt;a href=&#34;https://www.youtube.com/watch?v=-YCnOYTtVO8&amp;list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&amp;index=150&#34;&gt;recording&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-ai-ml-boom-and-its-impact-on-kubernetes&#34;&gt;The AI/ML boom and its impact on Kubernetes&lt;/h2&gt;
&lt;p&gt;The rise of AI/ML workloads has brought new challenges to Kubernetes. These
workloads often rely heavily on specialized hardware, and any device failure can
significantly impact performance and lead to frustrating interruptions. As
highlighted in the 2024 &lt;a href=&#34;https://ai.meta.com/research/publications/the-llama-3-herd-of-models/&#34;&gt;Llama
paper&lt;/a&gt;,
hardware issues, particularly GPU failures, are a major cause of disruption in
AI/ML training. You can also learn how much effort NVIDIA spends on handling
devices failures and maintenance in the KubeCon talk by &lt;a href=&#34;https://kccncna2024.sched.com/event/1i7kJ/all-your-gpus-are-belong-to-us-an-inside-look-at-nvidias-self-healing-geforce-now-infrastructure-ryan-hallisey-piotr-prokop-pl-nvidia&#34;&gt;Ryan Hallisey and Piotr
Prokop All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA&#39;s Self-Healing
GeForce NOW
Infrastructure&lt;/a&gt;
(&lt;a href=&#34;https://www.youtube.com/watch?v=iLnHtKwmu2I&#34;&gt;recording&lt;/a&gt;) as they see 19
remediation requests per 1000 nodes a day!
We also see data centers offering spot consumption models and overcommit on
power, making device failures commonplace and a part of the business model.&lt;/p&gt;
&lt;p&gt;However, Kubernetes’s view on resources is still very static. The resource is
either there or not. And if it is there, the assumption is that it will stay
there fully functional - Kubernetes lacks good support for handling full or partial
hardware failures. These long-existing assumptions combined with the overall complexity of a setup lead
to a variety of failure modes, which we discuss here.&lt;/p&gt;
&lt;h3 id=&#34;understanding-ai-ml-workloads&#34;&gt;Understanding AI/ML workloads&lt;/h3&gt;
&lt;p&gt;Generally, all AI/ML workloads require specialized hardware, have challenging
scheduling requirements, and are expensive when idle. AI/ML workloads typically
fall into two categories - training and inference. Here is an oversimplified
view of those categories’ characteristics, which are different from traditional workloads
like web services:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Training&lt;/dt&gt;
&lt;dd&gt;These workloads are resource-intensive, often consuming entire
machines and running as gangs of pods. Training jobs are usually &amp;quot;run to
completion&amp;quot; - but that could be days, weeks or even months. Any failure in a
single pod can necessitate restarting the entire step across all the pods.&lt;/dd&gt;
&lt;dt&gt;Inference&lt;/dt&gt;
&lt;dd&gt;These workloads are usually long-running or run indefinitely,
and can be small enough to consume a subset of a Node’s devices or large enough to span
multiple nodes. They often require downloading huge files with the model
weights.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;These workload types specifically break many past assumptions:&lt;/p&gt;


 





&lt;table&gt;&lt;caption style=&#34;display: none;&#34;&gt;Workload assumptions before and now&lt;/caption&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;Before&lt;/th&gt;
&lt;th style=&#34;text-align:left&#34;&gt;Now&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Can get a better CPU and the app will work faster.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Require a &lt;strong&gt;specific&lt;/strong&gt; device (or &lt;strong&gt;class of devices&lt;/strong&gt;) to run.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;When something doesn’t work, just recreate it.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Allocation or reallocation is expensive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Any node will work. No need to coordinate between Pods.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Scheduled in a special way - devices often connected in a cross-node topology.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Each Pod can be plug-and-play replaced if failed.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Pods are a part of a larger task. Lifecycle of an entire task depends on each Pod.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Container images are slim and easily available.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Container images may be so big that they require special handling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Long initialization can be offset by slow rollout.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Initialization may be long and should be optimized, sometimes across many Pods together.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Compute nodes are commoditized and relatively inexpensive, so some idle time is acceptable.&lt;/td&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Nodes with specialized hardware can be an order of magnitude more expensive than those without, so idle time is very wasteful.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The existing failure model relies on these old assumptions. It may still work for
the new workload types, but it has limited knowledge about devices and is very
expensive for them - in some cases, prohibitively so. You will see
more examples later in this article.&lt;/p&gt;
&lt;h3 id=&#34;why-kubernetes-still-reigns-supreme&#34;&gt;Why Kubernetes still reigns supreme&lt;/h3&gt;
&lt;p&gt;This article does not go deeper into the question of why not to start fresh for
AI/ML workloads, given how different they are from traditional Kubernetes
workloads. Despite many challenges, Kubernetes remains the platform of choice
for AI/ML workloads. Its maturity, security, and rich ecosystem of tools make it
a compelling option. While alternatives exist, they often lack the years of
development and refinement that Kubernetes offers. And the Kubernetes developers
are actively addressing the gaps identified in this article and beyond.&lt;/p&gt;
&lt;h2 id=&#34;the-current-state-of-device-failure-handling&#34;&gt;The current state of device failure handling&lt;/h2&gt;
&lt;p&gt;This section outlines different failure modes and the best practices and DIY
(Do-It-Yourself) solutions used today. The next section describes a roadmap
for improving things for those failure modes.&lt;/p&gt;
&lt;h3 id=&#34;failure-modes-k8s-infrastructure&#34;&gt;Failure modes: K8s infrastructure&lt;/h3&gt;
&lt;p&gt;In order to understand the failures related to the Kubernetes infrastructure,
you need to understand how many moving parts are involved in scheduling a Pod on
the node. The sequence of events when a Pod is scheduled on a Node is as
follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Device plugin&lt;/em&gt; is scheduled on the Node&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Device plugin&lt;/em&gt; is registered with the &lt;em&gt;kubelet&lt;/em&gt; via local gRPC&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Kubelet&lt;/em&gt; uses &lt;em&gt;device plugin&lt;/em&gt; to watch for devices and updates capacity of
the node&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Scheduler&lt;/em&gt; places a &lt;em&gt;user Pod&lt;/em&gt; on a Node based on the updated capacity&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Kubelet&lt;/em&gt; asks &lt;em&gt;Device plugin&lt;/em&gt; to &lt;strong&gt;Allocate&lt;/strong&gt; devices for a &lt;em&gt;User Pod&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Kubelet&lt;/em&gt; creates a &lt;em&gt;User Pod&lt;/em&gt; with the allocated devices attached to it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This diagram shows some of those actors involved:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/k8s-infra-devices.svg&#34;
         alt=&#34;The diagram shows relationships between the kubelet, Device plugin, and a user Pod. It shows that kubelet connects to the Device plugin named my-device, kubelet reports the node status with the my-device availability, and the user Pod requesting the 2 of my-device.&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;As there are so many actors interconnected, every one of them and every
connection may experience interruptions. This leads to many exceptional
situations that are often considered failures, and may cause serious workload
interruptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pods failing admission at various stages of their lifecycle&lt;/li&gt;
&lt;li&gt;Pods unable to run on perfectly fine hardware&lt;/li&gt;
&lt;li&gt;Scheduling taking an unexpectedly long time&lt;/li&gt;
&lt;/ul&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/k8s-infra-failures.svg&#34;
         alt=&#34;The same diagram as the one above, overlaid with orange bang drawings over individual components, with text indicating what can break in that component. Over the kubelet the text reads: &amp;#39;kubelet restart: loses all device info before re-Watch&amp;#39;. Over the Device plugin the text reads: &amp;#39;device plugin update, eviction, restart: kubelet cannot Allocate devices or loses all device state&amp;#39;. Over the user Pod the text reads: &amp;#39;slow pod termination: devices are unavailable&amp;#39;.&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;The goal for Kubernetes is to make the interaction between these components as
reliable as possible. The kubelet already implements retries, grace periods, and
other techniques to improve it. The roadmap section goes into details on other
edge cases that the Kubernetes project tracks. However, all these improvements
only work when these best practices are followed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure and restart the kubelet and the container runtime (such as containerd or CRI-O)
as quickly as possible so they do not interrupt the workload.&lt;/li&gt;
&lt;li&gt;Monitor device plugin health and carefully plan for upgrades.&lt;/li&gt;
&lt;li&gt;Do not overload the node with less-important workloads to prevent interruption
of device plugin and other components.&lt;/li&gt;
&lt;li&gt;Configure user pods tolerations to handle node readiness flakes.&lt;/li&gt;
&lt;li&gt;Configure and code graceful termination logic carefully to not block devices
for too long.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another class of Kubernetes infra-related issues is driver-related. With
traditional resources like CPU and memory, no compatibility checks between the
application and hardware were needed. With special devices like hardware
accelerators, there are new failure modes. Device drivers installed on the node:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Must match the hardware&lt;/li&gt;
&lt;li&gt;Must be compatible with the app&lt;/li&gt;
&lt;li&gt;Must work with other drivers (like &lt;a href=&#34;https://developer.nvidia.com/nccl&#34;&gt;nccl&lt;/a&gt;,
etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Best practices for handling driver versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitor driver installer health&lt;/li&gt;
&lt;li&gt;Plan upgrades of infrastructure and Pods to match the version&lt;/li&gt;
&lt;li&gt;Have canary deployments whenever possible&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Following the best practices in this section and using device plugins and device
driver installers from trusted and reliable sources generally eliminates this
class of failures. Kubernetes is tracking work to make this space even better.&lt;/p&gt;
&lt;h3 id=&#34;failure-modes-device-failed&#34;&gt;Failure modes: device failed&lt;/h3&gt;
&lt;p&gt;There is very little handling of device failure in Kubernetes today. Device
plugins report the device failure only by changing the count of allocatable
devices. And Kubernetes relies on standard mechanisms like liveness probes or
container failures to allow Pods to communicate the failure condition to the
kubelet. However, Kubernetes does not correlate device failures with container
crashes and does not offer any mitigation beyond restarting the container while
being attached to the same device.&lt;/p&gt;
&lt;p&gt;This is why many plugins and DIY solutions exist to handle device failures based
on various signals.&lt;/p&gt;
&lt;h4 id=&#34;health-controller&#34;&gt;Health controller&lt;/h4&gt;
&lt;p&gt;In many cases a failed device will leave a very expensive, unrecoverable
node doing nothing. A simple DIY solution is a &lt;em&gt;node health controller&lt;/em&gt;. The
controller could compare the device allocatable count with the capacity and if
the capacity is greater, it starts a timer. Once the timer reaches a threshold,
the health controller kills and recreates a node.&lt;/p&gt;
&lt;p&gt;There are problems with the &lt;em&gt;health controller&lt;/em&gt; approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Root cause of the device failure is typically not known&lt;/li&gt;
&lt;li&gt;The controller is not workload aware&lt;/li&gt;
&lt;li&gt;Failed device might not be in use and you want to keep other devices running&lt;/li&gt;
&lt;li&gt;The detection may be too slow as it is very generic&lt;/li&gt;
&lt;li&gt;The node may be part of a bigger set of nodes and simply cannot be deleted in
isolation from the other nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are variations of the health controller solving some of the problems
above. The overall theme here though is that to best handle failed devices, you
need customized handling for the specific workload. Kubernetes doesn’t yet offer
enough abstraction to express how critical the device is for a node, for the
cluster, and for the Pod it is assigned to.&lt;/p&gt;
&lt;h4 id=&#34;pod-failure-policy&#34;&gt;Pod failure policy&lt;/h4&gt;
&lt;p&gt;Another DIY approach for device failure handling is a per-pod reaction to a
failed device. This approach is applicable for &lt;em&gt;training&lt;/em&gt; workloads that are
implemented as Jobs.&lt;/p&gt;
&lt;p&gt;A Pod can define special exit codes for device failures. For example, whenever
unexpected device behavior is encountered, the Pod exits with a special exit code.
Then the Pod failure policy can handle the device failure in a special way. Read
more on &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/#pod-failure-policy&#34;&gt;Handling retriable and non-retriable pod failures with Pod failure
policy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are some problems with the &lt;em&gt;Pod failure policy&lt;/em&gt; approach for Jobs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There is no well-known &lt;em&gt;device failed&lt;/em&gt; condition, so this approach does not work for the
generic Pod case&lt;/li&gt;
&lt;li&gt;Error codes must be coded carefully and in some cases are hard to guarantee.&lt;/li&gt;
&lt;li&gt;Only works with Jobs with &lt;code&gt;restartPolicy: Never&lt;/code&gt;, due to the limitation of a pod
failure policy feature.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, this solution has limited applicability.&lt;/p&gt;
&lt;h4 id=&#34;custom-pod-watcher&#34;&gt;Custom pod watcher&lt;/h4&gt;
&lt;p&gt;A slightly more generic approach is to implement a pod watcher as a DIY solution
or use a third-party tool offering this functionality. The pod watcher is
most often used to handle device failures for inference workloads.&lt;/p&gt;
&lt;p&gt;Since Kubernetes just keeps a pod assigned to a device, even if the device is
reportedly unhealthy, the idea is to detect this situation with the pod watcher
and apply some remediation. It often involves obtaining device health status and
its mapping to the Pod using Pod Resources API on the node. If a device fails,
it can delete the attached Pod as a remediation. The ReplicaSet will
then recreate the Pod on a healthy device.&lt;/p&gt;
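&lt;p&gt;The remediation step of such a watcher boils down to a small mapping exercise. The sketch below uses plain maps with hypothetical names in place of the protobuf messages the Pod Resources API actually returns: given the device-to-pod assignments and the set of devices currently reported unhealthy, it picks the pods to delete so their controller can recreate them elsewhere.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// podsToDelete returns the pods assigned to any unhealthy device, each pod
// listed once. assignments maps device ID to the pods using that device.
func podsToDelete(assignments map[string][]string, unhealthy map[string]bool) []string {
	seen := map[string]bool{}
	var victims []string
	for device, pods := range assignments {
		if !unhealthy[device] {
			continue
		}
		for _, pod := range pods {
			if !seen[pod] {
				seen[pod] = true
				victims = append(victims, pod)
			}
		}
	}
	sort.Strings(victims) // map iteration order is random; sort for stable output
	return victims
}

func main() {
	assignments := map[string][]string{
		"gpu-0": {"inference-a"},
		"gpu-1": {"inference-b", "inference-c"},
	}
	unhealthy := map[string]bool{"gpu-1": true}
	fmt.Println(podsToDelete(assignments, unhealthy)) // [inference-b inference-c]
}
```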
&lt;p&gt;The other reasons to implement this watcher:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Without it, the Pod will keep being assigned to the failed device forever.&lt;/li&gt;
&lt;li&gt;There is no &lt;em&gt;descheduling&lt;/em&gt; for a Pod with &lt;code&gt;restartPolicy: Always&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;There are no built-in controllers that delete Pods in &lt;code&gt;CrashLoopBackOff&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Problems with the &lt;em&gt;custom pod watcher&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The signal for the pod watcher is expensive to get, and involves some
privileged actions.&lt;/li&gt;
&lt;li&gt;It is a custom solution, and it has to assume how important the device is for the Pod.&lt;/li&gt;
&lt;li&gt;The pod watcher relies on external controllers to reschedule a Pod.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are more variations of DIY solutions for handling device failures or
upcoming maintenance. Overall, Kubernetes has enough extension points to
implement these solutions. However, some extension points require higher
privilege than users may be comfortable with or are too disruptive. The roadmap
section goes into more details on specific improvements in handling the device
failures.&lt;/p&gt;
&lt;h3 id=&#34;failure-modes-container-code-failed&#34;&gt;Failure modes: container code failed&lt;/h3&gt;
&lt;p&gt;When container code fails, or something bad happens to it, such as an out-of-memory
condition, Kubernetes knows how to handle it: the container is restarted, or,
if the Pod has &lt;code&gt;restartPolicy: Never&lt;/code&gt;, the Pod fails and is scheduled
on another node. Kubernetes has limited expressiveness on what
constitutes a failure (for example, a non-zero exit code or a liveness probe failure) and how
to react to such a failure (mostly either always restarting or immediately failing the
Pod).&lt;/p&gt;
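&lt;p&gt;Concretely, those built-in knobs boil down to a couple of fields; the image name and probe endpoint in this sketch are illustrative:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  restartPolicy: Always      # the only reactions: Always, OnFailure, Never
  containers:
  - name: main
    image: registry.example/worker:latest
    livenessProbe:           # a failing probe also counts as a failure
      httpGet:
        path: /healthz
        port: 8080
```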
&lt;p&gt;This level of expressiveness is often not enough for the complicated AI/ML
workloads. AI/ML pods are better rescheduled locally or even in-place as that
would save on image pulling time and device allocation. AI/ML pods are often
interconnected and need to be restarted together. This adds another level of
complexity and optimizing it often brings major savings in running AI/ML
workloads.&lt;/p&gt;
&lt;p&gt;There are various DIY solutions for orchestrating Pod failure handling. The most
typical one is to wrap the main executable in a container with some orchestrator,
which can then restart the main executable whenever the
job needs to be restarted because some other Pod has failed.&lt;/p&gt;
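&lt;p&gt;Stripped to its essence, such a wrapper is a restart loop around the main executable. A minimal sketch, with the coordination signal from the rest of the job (which real orchestrators react to) omitted:&lt;/p&gt;

```python
import subprocess

def run_with_restarts(cmd, max_restarts):
    """Run cmd, restarting it on failure up to max_restarts times.

    Returns the final exit code. A real in-container orchestrator would also
    listen for an external "restart now" signal from a job-level coordinator,
    so the whole job can be resynced without recreating Pods.
    """
    attempts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0 or attempts >= max_restarts:
            return code
        attempts += 1
```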
&lt;p&gt;Solutions like this are fragile and elaborate. They are often worth the
money saved compared to a regular JobSet delete/recreate cycle when used in
large training jobs. Making these solutions less fragile and more streamlined
by developing new hooks and extension points in Kubernetes will make them
easy to apply to smaller jobs, benefiting everybody.&lt;/p&gt;
&lt;h3 id=&#34;failure-modes-device-degradation&#34;&gt;Failure modes: device degradation&lt;/h3&gt;
&lt;p&gt;Not all device failures are terminal for the overall workload or batch job.
As the hardware stack gets more and more
complex, misconfiguration on one of the hardware stack layers, or driver
failures, may result in devices that are functional, but lagging on performance.
One device that is lagging behind can slow down the whole training job.&lt;/p&gt;
&lt;p&gt;We see reports of such cases more and more often. Kubernetes has no way to
express this type of failure today, and since it is the newest failure
mode, there is not much best practice offered by hardware vendors for
detection, nor third-party tooling for remediation of these situations.&lt;/p&gt;
&lt;p&gt;Typically, these failures are detected based on observed workload
characteristics, such as the expected speed of AI/ML training steps on
particular hardware. Remediation for these issues depends highly on the needs of the workload.&lt;/p&gt;
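&lt;p&gt;For illustration only, a detector for such degradation can be as simple as comparing each worker&#39;s average step time against the group median; the 1.5x tolerance in this sketch is an arbitrary assumption:&lt;/p&gt;

```python
import statistics

def find_stragglers(step_seconds, tolerance=1.5):
    """Flag workers whose mean step time exceeds tolerance times the group median.

    step_seconds: dict mapping a worker name to a list of observed step durations
    """
    means = {w: statistics.mean(t) for w, t in step_seconds.items()}
    baseline = statistics.median(means.values())
    # A worker lagging far behind the group likely has a degraded device.
    return sorted(w for w, m in means.items() if m > tolerance * baseline)
```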
&lt;h2 id=&#34;roadmap&#34;&gt;Roadmap&lt;/h2&gt;
&lt;p&gt;As outlined in the sections above, Kubernetes offers many extension points
which are used to implement various DIY solutions. The AI/ML space is
developing very fast, with changing requirements and usage patterns. SIG Node is
taking a measured approach: enabling more extension points to implement
workload-specific scenarios, rather than introducing new semantics to support
specific scenarios. This means prioritizing making information about failures
readily available over implementing automatic remediations for those failures
that might only be suitable for a subset of workloads.&lt;/p&gt;
&lt;p&gt;This approach ensures there are no drastic changes for workload handling which
may break existing, well-oiled DIY solutions or experiences with the existing
more traditional workloads.&lt;/p&gt;
&lt;p&gt;Many error handling techniques used today work for AI/ML, but are very
expensive. SIG Node will invest in extension points to make them cheaper, with
the understanding that cutting these costs for AI/ML workloads is critical.&lt;/p&gt;
&lt;p&gt;The following is the set of specific investments we envision for various failure
modes.&lt;/p&gt;
&lt;h3 id=&#34;roadmap-for-failure-modes-k8s-infrastructure&#34;&gt;Roadmap for failure modes: K8s infrastructure&lt;/h3&gt;
&lt;p&gt;The area of Kubernetes infrastructure is the easiest to understand, and very
important to get right for the upcoming transition from Device Plugins to DRA.
SIG Node is tracking many work items in this area, most notably the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/127460&#34;&gt;integrate kubelet with the systemd watchdog · Issue
#127460&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/128696&#34;&gt;DRA: detect stale DRA plugin sockets · Issue
#128696&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/127803&#34;&gt;Support takeover for devicemanager/device-plugin · Issue
#127803&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/127457&#34;&gt;Kubelet plugin registration reliability · Issue
#127457&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/128167&#34;&gt;Recreate the Device Manager gRPC server if failed · Issue
#128167&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/128043&#34;&gt;Retry pod admission on device plugin grpc failures · Issue
#128043&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basically, every interaction between Kubernetes components must be made reliable,
via either kubelet improvements or best practices in plugin development
and deployment.&lt;/p&gt;
&lt;h3 id=&#34;roadmap-for-failure-modes-device-failed&#34;&gt;Roadmap for failure modes: device failed&lt;/h3&gt;
&lt;p&gt;For device failures, some patterns are already emerging in common scenarios
that Kubernetes can support. However, the very first step is to make information
about failed devices more easily available. This is the goal of the work in
&lt;a href=&#34;https://kep.k8s.io/4680&#34;&gt;KEP 4680&lt;/a&gt; (Add Resource Health Status to the Pod Status for
Device Plugin and DRA).&lt;/p&gt;
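&lt;p&gt;With that feature enabled, device health surfaces in the Pod status, roughly like the following sketch; the resource name and device ID are illustrative:&lt;/p&gt;

```yaml
status:
  containerStatuses:
  - name: main
    allocatedResourcesStatus:
    - name: nvidia.com/gpu       # illustrative device plugin resource
      resources:
      - resourceID: GPU-a1b2c3   # illustrative device ID
        health: Unhealthy        # Healthy, Unhealthy, or Unknown
```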
&lt;p&gt;Longer-term ideas that are yet to be tested include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integrate device failures into Pod Failure Policy.&lt;/li&gt;
&lt;li&gt;Node-local retry policies, enabling pod failure policies for Pods with
&lt;code&gt;restartPolicy: OnFailure&lt;/code&gt;, and possibly beyond that.&lt;/li&gt;
&lt;li&gt;The ability to &lt;em&gt;deschedule&lt;/em&gt; a Pod, including one with &lt;code&gt;restartPolicy: Always&lt;/code&gt;, so it can
get a new device allocated.&lt;/li&gt;
&lt;li&gt;Add device health to the ResourceSlice used to represent devices in DRA,
rather than simply withdrawing an unhealthy device from the ResourceSlice.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;roadmap-for-failure-modes-container-code-failed&#34;&gt;Roadmap for failure modes: container code failed&lt;/h3&gt;
&lt;p&gt;The main improvements to handle container code failures for AI/ML workloads
all target cheaper error handling and recovery. The savings mostly
come from reusing pre-allocated resources as much as possible: from reusing
Pods by restarting containers in-place, to node-local restarts of containers
instead of rescheduling whenever possible, to snapshotting support, and to
rescheduling that prioritizes the same node to save on image pulls.&lt;/p&gt;
&lt;p&gt;Consider this scenario: a big training job needs 512 Pods to run, and one of the
Pods fails. This means that all Pods need to be interrupted and synced up to
restart the failed step. The most efficient way to achieve this is generally to
reuse as many Pods as possible by restarting them in-place, while replacing the
failed Pod to clear its error, as demonstrated in this picture:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/inplace-pod-restarts.svg&#34;
         alt=&#34;The picture shows 512 Pods, most of them green with a recycle sign next to them indicating that they can be reused, one Pod drawn in red, and a new green replacement Pod next to it indicating that it needs to be replaced.&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;It is possible to implement this scenario today, but all solutions implementing it are
fragile due to the lack of certain extension points in Kubernetes. Adding the
extension points needed to implement this scenario is on the Kubernetes roadmap.&lt;/p&gt;
&lt;h3 id=&#34;roadmap-for-failure-modes-device-degradation&#34;&gt;Roadmap for failure modes: device degradation&lt;/h3&gt;
&lt;p&gt;Very little has been done in this area: there is no clear detection signal,
very limited troubleshooting tooling, and no built-in semantics to express a
&amp;quot;degraded&amp;quot; device in Kubernetes. There has been discussion of adding data on
device performance or degradation to the ResourceSlice used by DRA to represent
devices, but it is not yet clearly defined. There are also projects like
&lt;a href=&#34;https://github.com/medik8s/node-healthcheck-operator&#34;&gt;node-healthcheck-operator&lt;/a&gt;
that can be used for some scenarios.&lt;/p&gt;
&lt;p&gt;We expect developments in this area from hardware vendors and cloud providers, but mostly DIY
solutions in the near future. As more users are exposed to AI/ML workloads, this
is a space that needs feedback on the patterns used.&lt;/p&gt;
&lt;h2 id=&#34;join-the-conversation&#34;&gt;Join the conversation&lt;/h2&gt;
&lt;p&gt;The Kubernetes community encourages feedback and participation in shaping the
future of device failure handling. Join SIG Node and contribute to the ongoing
discussions!&lt;/p&gt;
&lt;p&gt;This blog post provides a high-level overview of the challenges and future
directions for device failure management in Kubernetes. By addressing these
issues, Kubernetes can solidify its position as the leading platform for AI/ML
workloads, ensuring resilience and reliability for applications that depend on
specialized hardware.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Image Compatibility In Cloud Native Environments</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</link>
      <pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</guid>
      <description>
        
        
        &lt;p&gt;In industries where systems must run very reliably and meet strict performance criteria, such as telecommunications, high-performance computing, or AI, containerized applications often need a specific operating system configuration or specific hardware to be present.
It is common practice to require the use of specific versions of the kernel, its configuration, device drivers, or system components.
Despite the existence of the &lt;a href=&#34;https://opencontainers.org/&#34;&gt;Open Container Initiative (OCI)&lt;/a&gt;, a governing community to define standards and specifications for container images, there has been a gap in expression of such compatibility requirements.
The need to address this issue has led to different proposals and, ultimately, an implementation in Kubernetes&#39; &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html&#34;&gt;Node Feature Discovery (NFD)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html&#34;&gt;NFD&lt;/a&gt; is an open source Kubernetes project that automatically detects and reports &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features&#34;&gt;hardware and system features&lt;/a&gt; of cluster nodes. This information helps users to schedule workloads on nodes that meet specific system requirements, which is especially useful for applications with strict hardware or operating system dependencies.&lt;/p&gt;
&lt;h2 id=&#34;the-need-for-image-compatibility-specification&#34;&gt;The need for image compatibility specification&lt;/h2&gt;
&lt;h3 id=&#34;dependencies-between-containers-and-host-os&#34;&gt;Dependencies between containers and host OS&lt;/h3&gt;
&lt;p&gt;A container image is built on a base image, which provides a minimal runtime environment, often a stripped-down Linux userland, completely empty or distroless. When an application requires certain features from the host OS, compatibility issues arise. These dependencies can manifest in several ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Drivers&lt;/strong&gt;:
Host driver versions must match the supported range of a library version inside the container to avoid compatibility problems. Examples include GPUs and network drivers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Libraries or Software&lt;/strong&gt;:
The container must come with a specific version or range of versions for a library or software to run optimally in the environment. Examples from high performance computing are MPI, EFA, or Infiniband.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kernel Modules or Features&lt;/strong&gt;:
Specific kernel features or modules must be present. Examples include support for write-protected huge page faults, or the presence of VFIO.&lt;/li&gt;
&lt;li&gt;And more…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While containers in Kubernetes are the most likely unit of abstraction for these needs, the definition of compatibility can extend further to include other container technologies such as Singularity and other OCI artifacts such as binaries from a spack binary cache.&lt;/p&gt;
&lt;h3 id=&#34;multi-cloud-and-hybrid-cloud-challenges&#34;&gt;Multi-cloud and hybrid cloud challenges&lt;/h3&gt;
&lt;p&gt;Containerized applications are deployed across various Kubernetes distributions and cloud providers, where different host operating systems introduce compatibility challenges.
Often those have to be pre-configured before workload deployment or are immutable.
For instance, different cloud providers will include different operating systems like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RHCOS/RHEL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Photon OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Linux 2&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container-Optimized OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure Linux OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;And more...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each OS comes with unique kernel versions, configurations, and drivers, making compatibility a non-trivial issue for applications requiring specific features.
It must be possible to quickly assess a container for its suitability to run on any specific environment.&lt;/p&gt;
&lt;h3 id=&#34;image-compatibility-initiative&#34;&gt;Image compatibility initiative&lt;/h3&gt;
&lt;p&gt;An effort was made within the &lt;a href=&#34;https://github.com/opencontainers/wg-image-compatibility&#34;&gt;Open Containers Initiative Image Compatibility&lt;/a&gt; working group to introduce a standard for image compatibility metadata.
A specification for compatibility would allow container authors to declare required host OS features, making compatibility requirements discoverable and programmable.
The specification implemented in Kubernetes Node Feature Discovery is one of the discussed proposals.
It aims to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Define a structured way to express compatibility in OCI image manifests.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support a compatibility specification alongside container images in image registries.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Allow automated validation of compatibility before scheduling containers.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The concept has since been implemented in the Kubernetes Node Feature Discovery project.&lt;/p&gt;
&lt;h3 id=&#34;implementation-in-node-feature-discovery&#34;&gt;Implementation in Node Feature Discovery&lt;/h3&gt;
&lt;p&gt;The solution integrates compatibility metadata into Kubernetes via NFD features and the &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;NodeFeatureGroup&lt;/a&gt; API.
This interface enables the user to match containers to nodes based on exposed hardware and software features, allowing for intelligent scheduling and workload optimization.&lt;/p&gt;
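&lt;p&gt;A NodeFeatureGroup selects nodes by the features NFD has discovered on them. A minimal sketch, in which the group name, rule name, and kernel module are illustrative:&lt;/p&gt;

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureGroup
metadata:
  name: image-compat-example
spec:
  featureGroupRules:
  - name: "vfio nodes"
    matchFeatures:
    - feature: kernel.loadedmodule
      matchExpressions:
        vfio-pci: {op: Exists}
```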
&lt;h3 id=&#34;compatibility-specification&#34;&gt;Compatibility specification&lt;/h3&gt;
&lt;p&gt;The compatibility specification is a structured list of compatibility objects containing &lt;em&gt;&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;Node Feature Groups&lt;/a&gt;&lt;/em&gt;.
These objects define image requirements and facilitate validation against host nodes.
The feature requirements are described by using &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features&#34;&gt;the list of available features&lt;/a&gt; from the NFD project.
The schema has the following structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;version&lt;/strong&gt; (string) - Specifies the API version.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;compatibilities&lt;/strong&gt; (array of objects) - List of compatibility sets.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;rules&lt;/strong&gt; (object) - Specifies &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;NodeFeatureGroup&lt;/a&gt; to define image requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weight&lt;/strong&gt; (int, optional) - Node affinity weight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;tag&lt;/strong&gt; (string, optional) - Categorization tag.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;description&lt;/strong&gt; (string, optional) - Short description.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An example might look like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;version&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;compatibilities&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;description&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;My image requirements&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kernel and cpu&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kernel.loadedmodule&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vfio-pci&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cpu.model&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor_id&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Intel&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;AMD&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;one of available nics&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchAny&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pci.device&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0eee&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;class&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0200&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pci.device&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0fff&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;class&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0200&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;client-implementation-for-node-validation&#34;&gt;Client implementation for node validation&lt;/h3&gt;
&lt;p&gt;To streamline compatibility validation, we implemented a &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html&#34;&gt;client tool&lt;/a&gt; that allows for node validation based on an image&#39;s compatibility artifact.
In this workflow, the image author would generate a compatibility artifact that points to the image it describes in a registry via the referrers API.
When a need arises to assess the fit of an image to a host, the tool can discover the artifact and verify compatibility of an image to a node before deployment.
The client can validate nodes both inside and outside a Kubernetes cluster, extending the utility of the tool beyond the single Kubernetes use case.
In the future, image compatibility could play a crucial role in creating specific workload profiles based on image compatibility requirements, aiding in more efficient scheduling.
Additionally, it could potentially enable automatic node configuration to some extent, further optimizing resource allocation and ensuring seamless deployment of specialized workloads.&lt;/p&gt;
&lt;h3 id=&#34;examples-of-usage&#34;&gt;Examples of usage&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Define image compatibility metadata&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/containers/images/&#34;&gt;container image&lt;/a&gt; can have metadata that describes
its requirements based on features discovered from nodes, like kernel modules or CPU models.
The previous compatibility specification example in this article exemplified this use case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Attach the artifact to the image&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The image compatibility specification is stored as an OCI artifact.
You can attach this metadata to your container image using the &lt;a href=&#34;https://oras.land/&#34;&gt;oras&lt;/a&gt; tool.
The registry only needs to support OCI artifacts; support for arbitrary types is not required.
Keep in mind that the container image and the artifact must be stored in the same registry.
Use the following command to attach the artifact to the image:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;oras attach &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;--artifact-type application/vnd.nfd.image-compatibility.v1alpha1 &amp;lt;image-url&amp;gt; &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;path-to-spec&amp;gt;.yaml:application/vnd.nfd.image-compatibility.spec.v1alpha1+yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Validate image compatibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;After attaching the compatibility specification, you can validate whether a node meets the
image&#39;s requirements. This validation can be done using the
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html&#34;&gt;nfd client&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nfd compat validate-node --image &amp;lt;image-url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Read the output from the client&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Finally, you can read the report generated by the tool, or use your own tooling to act on the generated JSON report.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;validate-node command output&#34; src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/25/image-compatibility-in-cloud-native-environments/validate-node-output.png&#34;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The addition of image compatibility to Kubernetes through Node Feature Discovery underscores the growing importance of addressing compatibility in cloud native environments.
It is only a start, as further work is needed to integrate compatibility into scheduling of workloads within and outside of Kubernetes.
However, by integrating this feature into Kubernetes, mission-critical workloads can now define and validate host OS requirements more efficiently.
Moving forward, the adoption of compatibility metadata within Kubernetes ecosystems will significantly enhance the reliability and performance of specialized containerized applications, ensuring they meet the stringent requirements of industries like telecommunications and high-performance computing, or of any environment that requires special hardware or host OS configuration.&lt;/p&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;Join the &lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/contributing/&#34;&gt;Kubernetes Node Feature Discovery&lt;/a&gt; project if you&#39;re interested in getting involved with the design and development of the Image Compatibility API and tools.
We always welcome new contributors.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Changes to Kubernetes Slack</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/16/changes-to-kubernetes-slack/</link>
      <pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/16/changes-to-kubernetes-slack/</guid>
      <description>
        
        
        &lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;: We’ve received notice from Salesforce that our Slack workspace &lt;strong&gt;WILL NOT BE DOWNGRADED&lt;/strong&gt; on June 20th. Stand by for more details, but for now, there is no urgency to back up private channels or direct messages.&lt;/p&gt;
&lt;p&gt;&lt;del&gt;Kubernetes Slack will lose its special status and will be changing into a standard free Slack on June 20, 2025&lt;/del&gt;. Sometime later this year, our community may move to a new platform. If you are responsible for a channel or private channel, or a member of a User Group, you will need to take some actions as soon as you can.&lt;/p&gt;
&lt;p&gt;For the last decade, Slack has supported our project with a free customized enterprise account. They have let us know that they can no longer do so, particularly since our Slack is one of the largest and most active ones on the platform. As such, they will be downgrading it to a standard free Slack while we decide on, and implement, other options.&lt;/p&gt;
&lt;p&gt;On Friday, June 20, we will be subject to the &lt;a href=&#34;https://slack.com/help/articles/27204752526611-Feature-limitations-on-the-free-version-of-Slack&#34;&gt;feature limitations of free Slack&lt;/a&gt;. The primary ones which will affect us will be only retaining 90 days of history, and having to disable several apps and workflows which we are currently using. The Slack Admin team will do their best to manage these limitations.&lt;/p&gt;
&lt;p&gt;Responsible channel owners, members of private channels, and members of User Groups should &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md#what-actions-do-channel-owners-and-user-group-members-need-to-take-soon&#34;&gt;take some actions&lt;/a&gt; to prepare for the downgrade and preserve information as soon as possible.&lt;/p&gt;
&lt;p&gt;The CNCF Projects Staff have proposed that our community look at migrating to Discord. Because of existing issues where we have been pushing the limits of Slack, they have already explored what a Kubernetes Discord would look like. Discord would allow us to implement new tools and integrations which would help the community, such as GitHub group membership synchronization. The Steering Committee will discuss and decide on our future platform.&lt;/p&gt;
&lt;p&gt;Please see our &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md&#34;&gt;FAQ&lt;/a&gt;, and check the &lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/dev/&#34;&gt;kubernetes-dev mailing list&lt;/a&gt; and the &lt;a href=&#34;https://kubernetes.slack.com/archives/C9T0QMNG4&#34;&gt;#announcements channel&lt;/a&gt; for further news. If you have specific feedback on our Slack status, join the &lt;a href=&#34;https://github.com/kubernetes/community/issues/8490&#34;&gt;discussion on GitHub&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Enhancing Kubernetes Event Management with Custom Aggregation</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/kubernetes-api/cluster-resources/event-v1/&#34;&gt;Events&lt;/a&gt; provide crucial insights into cluster operations, but as clusters grow, managing and analyzing these events becomes increasingly challenging. This blog post explores how to build custom event aggregation systems that help engineering teams better understand cluster behavior and troubleshoot issues more effectively.&lt;/p&gt;
&lt;h2 id=&#34;the-challenge-with-kubernetes-events&#34;&gt;The challenge with Kubernetes events&lt;/h2&gt;
&lt;p&gt;In a Kubernetes cluster, events are generated for various operations, from pod scheduling and container starts to volume mounts and network configurations. While these events are invaluable for debugging and monitoring, several challenges emerge in production environments:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Volume&lt;/strong&gt;: Large clusters can generate thousands of events per minute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retention&lt;/strong&gt;: Default event retention is limited to one hour&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Correlation&lt;/strong&gt;: Related events from different components are not automatically linked&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: Events lack standardized severity or category classifications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Aggregation&lt;/strong&gt;: Similar events are not automatically grouped&lt;/li&gt;
&lt;/ol&gt;
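&lt;p&gt;To see the volume and retention challenges first-hand, you can inspect raw events with &lt;code&gt;kubectl&lt;/code&gt;. On a busy cluster the list scrolls by quickly, and anything older than the retention window is already gone:&lt;/p&gt;

```shell
# List events across all namespaces, oldest first
kubectl get events -A --sort-by=.metadata.creationTimestamp

# Rough per-namespace event counts, to gauge volume
kubectl get events -A --no-headers | awk '{print $1}' | sort | uniq -c | sort -rn
```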
&lt;p&gt;To learn more about Events in Kubernetes, read the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/kubernetes-api/cluster-resources/event-v1/&#34;&gt;Event&lt;/a&gt; API reference.&lt;/p&gt;
&lt;h2 id=&#34;real-world-value&#34;&gt;Real-World value&lt;/h2&gt;
&lt;p&gt;Consider a production environment with dozens of microservices, where users report intermittent transaction failures:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Traditional troubleshooting process:&lt;/strong&gt; Engineers waste hours sifting through thousands of standalone events spread across namespaces. By the time they investigate, the older events have long since been purged, and correlating pod restarts with node-level issues is practically impossible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;With custom event aggregation:&lt;/strong&gt; The system groups related events across resources, instantly surfacing correlation patterns such as volume mount timeouts preceding pod restarts. Historical data shows the same pattern occurred during past traffic spikes, pinpointing a storage scalability issue in minutes rather than hours.&lt;/p&gt;
&lt;p&gt;The benefit of this approach is that organizations that implement it commonly cut troubleshooting time significantly while improving system reliability by detecting patterns early.&lt;/p&gt;
&lt;h2 id=&#34;building-an-event-aggregation-system&#34;&gt;Building an Event aggregation system&lt;/h2&gt;
&lt;p&gt;This post explores how to build a custom event aggregation system that addresses these challenges, aligned to Kubernetes best practices. I&#39;ve picked the Go programming language for my example.&lt;/p&gt;
&lt;h3 id=&#34;architecture-overview&#34;&gt;Architecture overview&lt;/h3&gt;
&lt;p&gt;This event aggregation system consists of three main components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Event Watcher&lt;/strong&gt;: Monitors the Kubernetes API for new events&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event Processor&lt;/strong&gt;: Processes, categorizes, and correlates events&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage Backend&lt;/strong&gt;: Stores processed events for longer retention&lt;/li&gt;
&lt;/ol&gt;
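&lt;p&gt;Before diving into each component, here is a minimal sketch of how the three stages fit together as a pipeline. The types are simplified placeholders, not the real client-go types used in the rest of this post:&lt;/p&gt;

```go
package main

import "fmt"

// Simplified stand-ins for the three components: the real watcher streams
// Kubernetes Events, the processor classifies them, and the storage
// backend persists them.
type rawEvent struct{ Reason string }
type processedEvent struct{ Reason, Category string }

// watch returns a batch of raw events; the real watcher streams them
// from the Kubernetes API instead.
func watch() []rawEvent {
	return []rawEvent{{Reason: "FailedMount"}, {Reason: "Scheduled"}}
}

// process categorizes a single event.
func process(e rawEvent) processedEvent {
	category := "other"
	if e.Reason == "FailedMount" {
		category = "storage"
	}
	return processedEvent{Reason: e.Reason, Category: category}
}

// store is a placeholder for the storage backend.
func store(p processedEvent) {
	fmt.Printf("%s -> %s\n", p.Reason, p.Category)
}

func main() {
	for _, e := range watch() {
		store(process(e))
	}
}
```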
&lt;p&gt;Here&#39;s a sketch for how to implement the event watcher:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;package&lt;/span&gt; main
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;import&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;context&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    metav1 &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/apimachinery/pkg/apis/meta/v1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/client-go/kubernetes&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/client-go/rest&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    eventsv1 &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/api/events/v1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventWatcher &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    clientset &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;kubernetes.Clientset
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;NewEventWatcher&lt;/span&gt;(config &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;rest.Config) (&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventWatcher, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    clientset, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; kubernetes.&lt;span style=&#34;color:#00a000&#34;&gt;NewForConfig&lt;/span&gt;(config)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;, err
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;amp;&lt;/span&gt;EventWatcher{clientset: clientset}, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (w &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventWatcher) &lt;span style=&#34;color:#00a000&#34;&gt;Watch&lt;/span&gt;(ctx context.Context) (&lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;chan&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    events &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;chan&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    watcher, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; w.clientset.&lt;span style=&#34;color:#00a000&#34;&gt;EventsV1&lt;/span&gt;().&lt;span style=&#34;color:#00a000&#34;&gt;Events&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;).&lt;span style=&#34;color:#00a000&#34;&gt;Watch&lt;/span&gt;(ctx, metav1.ListOptions{})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;, err
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;go&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;defer&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;close&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;select&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;case&lt;/span&gt; event &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;watcher.&lt;span style=&#34;color:#00a000&#34;&gt;ResultChan&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; e, ok &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; event.Object.(&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event); ok {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    events &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; e
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;case&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;ctx.&lt;span style=&#34;color:#00a000&#34;&gt;Done&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                watcher.&lt;span style=&#34;color:#00a000&#34;&gt;Stop&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; events, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;event-processing-and-classification&#34;&gt;Event processing and classification&lt;/h3&gt;
&lt;p&gt;The event processor enriches events with additional context and classification:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventProcessor &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    categoryRules []CategoryRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    correlationRules []CorrelationRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; ProcessedEvent &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Event     &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Category  &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Severity  &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    CorrelationID &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Metadata  &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (p &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventProcessor) &lt;span style=&#34;color:#00a000&#34;&gt;Process&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;ProcessedEvent {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    processed &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;amp;&lt;/span&gt;ProcessedEvent{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Event:    event,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Metadata: &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Apply classification rules
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.Category = p.&lt;span style=&#34;color:#00a000&#34;&gt;classifyEvent&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    processed.Severity = p.&lt;span style=&#34;color:#00a000&#34;&gt;determineSeverity&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Generate correlation ID for related events
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.CorrelationID = p.&lt;span style=&#34;color:#00a000&#34;&gt;correlateEvent&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Add useful metadata
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.Metadata = p.&lt;span style=&#34;color:#00a000&#34;&gt;extractMetadata&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; processed
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;implementing-event-correlation&#34;&gt;Implementing Event correlation&lt;/h3&gt;
&lt;p&gt;One of the key features you could implement is a way of correlating related Events.
Here&#39;s an example correlation strategy:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (p &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventProcessor) &lt;span style=&#34;color:#00a000&#34;&gt;correlateEvent&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Correlation strategies:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 1. Time-based: Events within a time window
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 2. Resource-based: Events affecting the same resource
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 3. Causation-based: Events with cause-effect relationships
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    correlationKey &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;generateCorrelationKey&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; correlationKey
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;generateCorrelationKey&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Example: Combine namespace, resource type, and name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; fmt.&lt;span style=&#34;color:#00a000&#34;&gt;Sprintf&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;%s/%s/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Namespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Kind,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Name,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;event-storage-and-retention&#34;&gt;Event storage and retention&lt;/h2&gt;
&lt;p&gt;For long-term storage and analysis, you&#39;ll probably want a backend that supports:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Efficient querying of large event volumes&lt;/li&gt;
&lt;li&gt;Flexible retention policies&lt;/li&gt;
&lt;li&gt;Support for aggregation queries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&#39;s a sample storage interface:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventStorage &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;interface&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Store&lt;/span&gt;(context.Context, &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;ProcessedEvent) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Query&lt;/span&gt;(context.Context, EventQuery) ([]ProcessedEvent, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Aggregate&lt;/span&gt;(context.Context, AggregationParams) ([]EventAggregate, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventQuery &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    TimeRange     TimeRange
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Categories    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Severity      []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    CorrelationID &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Limit         &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; AggregationParams &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    GroupBy    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    TimeWindow &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Metrics    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;good-practices-for-event-management&#34;&gt;Good practices for Event management&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resource Efficiency&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement rate limiting for event processing&lt;/li&gt;
&lt;li&gt;Use efficient filtering at the API server level&lt;/li&gt;
&lt;li&gt;Batch events for storage operations&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distribute event processing across multiple workers&lt;/li&gt;
&lt;li&gt;Use leader election for coordination&lt;/li&gt;
&lt;li&gt;Implement backoff strategies for API rate limits&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Handle API server disconnections gracefully&lt;/li&gt;
&lt;li&gt;Buffer events during storage backend unavailability&lt;/li&gt;
&lt;li&gt;Implement retry mechanisms with exponential backoff&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;advanced-features&#34;&gt;Advanced features&lt;/h2&gt;
&lt;h3 id=&#34;pattern-detection&#34;&gt;Pattern detection&lt;/h3&gt;
&lt;p&gt;Implement pattern detection to identify recurring issues:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; PatternDetector &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    patterns &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;Pattern
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    threshold &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (d &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;PatternDetector) &lt;span style=&#34;color:#00a000&#34;&gt;Detect&lt;/span&gt;(events []ProcessedEvent) []Pattern {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Group similar events
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    groups &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;groupSimilarEvents&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Analyze frequency and timing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    patterns &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;identifyPatterns&lt;/span&gt;(groups)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; patterns
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;groupSimilarEvents&lt;/span&gt;(events []ProcessedEvent) &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    groups &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; _, event &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; events {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Create similarity key based on event characteristics
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        similarityKey &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; fmt.&lt;span style=&#34;color:#00a000&#34;&gt;Sprintf&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;%s:%s:%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.Reason,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.InvolvedObject.Kind,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.InvolvedObject.Namespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Group events with the same key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        groups[similarityKey] = &lt;span style=&#34;color:#a2f&#34;&gt;append&lt;/span&gt;(groups[similarityKey], event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; groups
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;identifyPatterns&lt;/span&gt;(groups &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent) []Pattern {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;var&lt;/span&gt; patterns []Pattern
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; key, events &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; groups {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Only consider groups with enough events to form a pattern
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events) &amp;lt; &lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;continue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Sort events by time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        sort.&lt;span style=&#34;color:#00a000&#34;&gt;Slice&lt;/span&gt;(events, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt;(i, j &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;bool&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; events[i].Event.LastTimestamp.Time.&lt;span style=&#34;color:#00a000&#34;&gt;Before&lt;/span&gt;(events[j].Event.LastTimestamp.Time)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Calculate time range and frequency
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        firstSeen &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; events[&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;].Event.FirstTimestamp.Time
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        lastSeen &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; events[&lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events)&lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;].Event.LastTimestamp.Time
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        duration &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; lastSeen.&lt;span style=&#34;color:#00a000&#34;&gt;Sub&lt;/span&gt;(firstSeen).&lt;span style=&#34;color:#00a000&#34;&gt;Minutes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;var&lt;/span&gt; frequency &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;float64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; duration &amp;gt; &lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            frequency = &lt;span style=&#34;color:#a2f&#34;&gt;float64&lt;/span&gt;(&lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events)) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; duration
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Create a pattern if it meets threshold criteria
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; frequency &amp;gt; &lt;span style=&#34;color:#666&#34;&gt;0.5&lt;/span&gt; { &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// More than 1 event per 2 minutes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;            pattern &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; Pattern{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Type:         key,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Count:        &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                FirstSeen:    firstSeen,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                LastSeen:     lastSeen,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Frequency:    frequency,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                EventSamples: events[:&lt;span style=&#34;color:#a2f&#34;&gt;min&lt;/span&gt;(&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;, &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events))], &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Keep up to 3 samples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            patterns = &lt;span style=&#34;color:#a2f&#34;&gt;append&lt;/span&gt;(patterns, pattern)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; patterns
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this implementation, the system can identify recurring patterns such as node pressure events, pod scheduling failures, or networking issues that occur with a specific frequency.&lt;/p&gt;
&lt;h3 id=&#34;real-time-alerts&#34;&gt;Real-time alerts&lt;/h3&gt;
&lt;p&gt;The following example provides a starting point for building an alerting system based on event patterns. It is not a complete solution but a conceptual sketch to illustrate the approach.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; AlertManager &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    rules     []AlertRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    notifiers []Notifier
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (a &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;AlertManager) &lt;span style=&#34;color:#00a000&#34;&gt;EvaluateEvents&lt;/span&gt;(events []ProcessedEvent) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; _, rule &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; a.rules {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; rule.&lt;span style=&#34;color:#00a000&#34;&gt;Matches&lt;/span&gt;(events) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            alert &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; rule.&lt;span style=&#34;color:#00a000&#34;&gt;GenerateAlert&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            a.&lt;span style=&#34;color:#00a000&#34;&gt;notify&lt;/span&gt;(alert)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;A well-designed event aggregation system can significantly improve cluster observability and troubleshooting capabilities. By implementing custom event processing, correlation, and storage, operators can better understand cluster behavior and respond to issues more effectively.&lt;/p&gt;
&lt;p&gt;The solutions presented here can be extended and customized based on specific requirements while maintaining compatibility with the Kubernetes API and following best practices for scalability and reliability.&lt;/p&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next steps&lt;/h2&gt;
&lt;p&gt;Future enhancements could include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Machine learning for anomaly detection&lt;/li&gt;
&lt;li&gt;Integration with popular observability platforms&lt;/li&gt;
&lt;li&gt;Custom event APIs for application-specific events&lt;/li&gt;
&lt;li&gt;Enhanced visualization and reporting capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information on Kubernetes events and custom &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/architecture/controller/&#34;&gt;controllers&lt;/a&gt;,
refer to the official Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Introducing Gateway API Inference Extension</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/</link>
      <pubDate>Thu, 05 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/</guid>
      <description>
        
        
        &lt;p&gt;Modern generative AI and large language model (LLM) services create unique traffic-routing challenges
on Kubernetes. Unlike typical short-lived, stateless web requests, LLM inference sessions are often
long-running, resource-intensive, and partially stateful. For example, a single GPU-backed model server
may keep multiple inference sessions active and maintain in-memory token caches.&lt;/p&gt;
&lt;p&gt;Traditional load balancers that route on HTTP paths or distribute requests round-robin lack the specialized
capabilities these workloads need. They also don’t account for model identity or request criticality (e.g., interactive
chat vs. batch jobs). Organizations often patch together ad-hoc solutions, but a standardized approach
is missing.&lt;/p&gt;
&lt;h2 id=&#34;gateway-api-inference-extension&#34;&gt;Gateway API Inference Extension&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/&#34;&gt;Gateway API Inference Extension&lt;/a&gt; was created to address
this gap by building on the existing &lt;a href=&#34;https://gateway-api.sigs.k8s.io/&#34;&gt;Gateway API&lt;/a&gt;, adding inference-specific
routing capabilities while retaining the familiar model of Gateways and HTTPRoutes. By adding an inference
extension to your existing gateway, you effectively transform it into an &lt;strong&gt;Inference Gateway&lt;/strong&gt;, enabling you to
self-host GenAI/LLMs with a “model-as-a-service” mindset.&lt;/p&gt;
&lt;p&gt;The project’s goal is to improve and standardize routing to inference workloads across the ecosystem. Key
objectives include enabling model-aware routing, supporting per-request criticalities, facilitating safe model
roll-outs, and optimizing load balancing based on real-time model metrics. By achieving these, the project aims
to reduce latency and improve accelerator (GPU) utilization for AI workloads.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;The design introduces two new custom resources, defined via CustomResourceDefinitions (CRDs), with distinct responsibilities, each aligning with a
specific user persona in the AI/ML serving workflow:&lt;/p&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-resource-model.png&#34;
         alt=&#34;Resource Model&#34;/&gt; 
&lt;/figure&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/&#34;&gt;InferencePool&lt;/a&gt;
Defines a pool of pods (model servers) running on shared compute (e.g., GPU nodes). The platform admin can
configure how these pods are deployed, scaled, and balanced. An InferencePool ensures consistent resource
usage and enforces platform-wide policies. An InferencePool is similar to a Service but specialized for AI/ML
serving needs and aware of the model-serving protocol.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencemodel/&#34;&gt;InferenceModel&lt;/a&gt;
A user-facing model endpoint managed by AI/ML owners. It maps a public name (e.g., &amp;quot;gpt-4-chat&amp;quot;) to the actual
model within an InferencePool. This lets workload owners specify which models (and optional fine-tuning) they
want served, plus a traffic-splitting or prioritization policy.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In summary, the InferenceModel API lets AI/ML owners manage what is served, while the InferencePool lets platform
operators manage where and how it’s served.&lt;/p&gt;
&lt;h2 id=&#34;request-flow&#34;&gt;Request flow&lt;/h2&gt;
&lt;p&gt;The flow of a request builds on the Gateway API model (Gateways and HTTPRoutes) with one or more extra inference-aware
steps (extensions) in the middle. Here’s a high-level example of the request flow with the
&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension&#34;&gt;Endpoint Selection Extension (ESE)&lt;/a&gt;:&lt;/p&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-request-flow.png&#34;
         alt=&#34;Request Flow&#34;/&gt; 
&lt;/figure&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gateway Routing&lt;/strong&gt;&lt;br&gt;
A client sends a request (e.g., an HTTP POST to /completions). The Gateway (like Envoy) examines the HTTPRoute
and identifies the matching InferencePool backend.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Endpoint Selection&lt;/strong&gt;&lt;br&gt;
Instead of simply forwarding to any available pod, the Gateway consults an inference-specific routing extension—
the Endpoint Selection Extension—to pick the best of the available pods. This extension examines live pod metrics
(queue lengths, memory usage, loaded adapters) to choose the ideal pod for the request.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inference-Aware Scheduling&lt;/strong&gt;&lt;br&gt;
The chosen pod is the one that can handle the request with the lowest latency or highest efficiency, given the
user’s criticality or resource needs. The Gateway then forwards traffic to that specific pod.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-epp-scheduling.png&#34;
         alt=&#34;Endpoint Extension Scheduling&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;This extra step provides a smarter, model-aware routing mechanism that still feels like a normal single request to
the client. Additionally, the design is extensible—any Inference Gateway can be enhanced with additional inference-specific
extensions to handle new routing strategies, advanced scheduling logic, or specialized hardware needs. As the project
continues to grow, contributors are encouraged to develop new extensions that are fully compatible with the same underlying
Gateway API model, further expanding the possibilities for efficient and intelligent GenAI/LLM routing.&lt;/p&gt;
&lt;h2 id=&#34;benchmarks&#34;&gt;Benchmarks&lt;/h2&gt;
&lt;p&gt;We evaluated this extension against a standard Kubernetes Service for a &lt;a href=&#34;https://docs.vllm.ai/en/latest/&#34;&gt;vLLM&lt;/a&gt;‐based model
serving deployment. The test environment consisted of multiple H100 (80 GB) GPU pods running vLLM (&lt;a href=&#34;https://blog.vllm.ai/2025/01/27/v1-alpha-release.html&#34;&gt;version 1&lt;/a&gt;)
on a Kubernetes cluster, with 10 Llama2 model replicas. The &lt;a href=&#34;https://github.com/AI-Hypercomputer/inference-benchmark&#34;&gt;Latency Profile Generator (LPG)&lt;/a&gt;
tool was used to generate traffic and measure throughput, latency, and other metrics. The
&lt;a href=&#34;https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json&#34;&gt;ShareGPT&lt;/a&gt;
dataset served as the workload, and traffic was ramped from 100 Queries per Second (QPS) up to 1000 QPS.&lt;/p&gt;
&lt;h3 id=&#34;key-results&#34;&gt;Key results&lt;/h3&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-benchmark.png&#34;
         alt=&#34;Endpoint Extension Scheduling&#34;/&gt; 
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Comparable Throughput&lt;/strong&gt;: Throughout the tested QPS range, the ESE delivered throughput roughly on par with a standard
Kubernetes Service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lower Latency&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Per‐Output‐Token Latency&lt;/strong&gt;: The ESE showed significantly lower p90 latency at higher QPS (500+), indicating that
its model-aware routing decisions reduce queueing and resource contention as GPU memory approaches saturation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Overall p90 Latency&lt;/strong&gt;: Similar trends emerged, with the ESE reducing end‐to‐end tail latencies compared to the
baseline, particularly as traffic increased beyond 400–500 QPS.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These results suggest that this extension&#39;s model‐aware routing significantly reduced latency for GPU‐backed LLM
workloads. By dynamically selecting the least‐loaded or best‐performing model server, it avoids hotspots that can
appear when using traditional load balancing methods for large, long‐running inference requests.&lt;/p&gt;
&lt;h2 id=&#34;roadmap&#34;&gt;Roadmap&lt;/h2&gt;
&lt;p&gt;As the Gateway API Inference Extension heads toward GA, planned features include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Prefix-cache aware load balancing&lt;/strong&gt; for remote caches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LoRA adapter pipelines&lt;/strong&gt; for automated rollout&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fairness and priority&lt;/strong&gt; between workloads in the same criticality band&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HPA support&lt;/strong&gt; for scaling based on aggregate, per-model metrics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support for large multi-modal inputs/outputs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Additional model types&lt;/strong&gt; (e.g., diffusion models)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Heterogeneous accelerators&lt;/strong&gt; (serving on multiple accelerator types with latency- and cost-aware load balancing)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Disaggregated serving&lt;/strong&gt; for independently scaling pools&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;By aligning model serving with Kubernetes-native tooling, Gateway API Inference Extension aims to simplify
and standardize how AI/ML traffic is routed. With model-aware routing, criticality-based prioritization, and
more, it helps ops teams deliver the right LLM services to the right users—smoothly and efficiently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to learn more?&lt;/strong&gt; Visit the &lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/&#34;&gt;project docs&lt;/a&gt; to dive deeper,
give an Inference Gateway extension a try with a few &lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/guides/&#34;&gt;simple steps&lt;/a&gt;,
and &lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/contributing/&#34;&gt;get involved&lt;/a&gt; if you’re interested in
contributing to the project!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Start Sidecar First: How To Avoid Snags</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/03/start-sidecar-first/</link>
      <pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/03/start-sidecar-first/</guid>
      <description>
        
        
&lt;p&gt;From the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/22/multi-container-pods-overview/&#34;&gt;Kubernetes Multicontainer Pods: An Overview blog post&lt;/a&gt; you know what their job is, what the main architectural patterns are, and how they are implemented in Kubernetes. The main thing I’ll cover in this article is how to ensure that your sidecar containers start before the main app. It’s more complicated than you might think!&lt;/p&gt;
&lt;h2 id=&#34;a-gentle-refresher&#34;&gt;A gentle refresher&lt;/h2&gt;
&lt;p&gt;I&#39;d just like to remind readers that the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2023/12/13/kubernetes-v1-29-release/&#34;&gt;v1.29.0 release of Kubernetes&lt;/a&gt; added native support for
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/sidecar-containers/&#34;&gt;sidecar containers&lt;/a&gt;, which can now be defined within the &lt;code&gt;.spec.initContainers&lt;/code&gt; field,
but with &lt;code&gt;restartPolicy: Always&lt;/code&gt;. You can see that illustrated in the following example Pod manifest snippet:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;logshipper&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# this is what makes it a sidecar container&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;tail -F /opt/logs.txt&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/opt&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What are the specifics of defining sidecars with a &lt;code&gt;.spec.initContainers&lt;/code&gt; block, rather than as a legacy multi-container pod with multiple &lt;code&gt;.spec.containers&lt;/code&gt;?
Well, all &lt;code&gt;.spec.initContainers&lt;/code&gt; are always launched &lt;strong&gt;before&lt;/strong&gt; the main application. If you define Kubernetes-native sidecars, those are terminated &lt;strong&gt;after&lt;/strong&gt; the main application. Furthermore, when used with &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/&#34;&gt;Jobs&lt;/a&gt;, a sidecar container should still be alive and could potentially even restart after the owning Job is complete; Kubernetes-native sidecar containers do not block pod completion.&lt;/p&gt;
&lt;p&gt;To learn more, you can also read the official &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tutorials/configuration/pod-sidecar-containers/&#34;&gt;Pod sidecar containers tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-problem&#34;&gt;The problem&lt;/h2&gt;
&lt;p&gt;Now you know that defining a sidecar with this native approach will always start it before the main application. From the &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/537a602195efdc04cdf2cb0368792afad082d9fd/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L827-L830&#34;&gt;kubelet source code&lt;/a&gt;, it&#39;s clear that &amp;quot;started before&amp;quot; often means starting almost in parallel, and that is not always what an engineer wants to achieve. What I&#39;m really interested in is whether I can delay the start of the main application until the sidecar is not just started, but fully running and ready to serve.
This is a bit tricky, because sidecars offer no obvious success signal, unlike init containers, which are designed to run to completion and then exit. With an init container, exit status 0 unambiguously means &amp;quot;I succeeded&amp;quot;. With a sidecar, there are lots of points at which you could say &amp;quot;it is running&amp;quot;.
Starting one container only after the previous one is ready is part of a graceful deployment strategy, ensuring proper sequencing and stability during startup. It’s also how I’d expect sidecar containers to work in the scenario where the main application depends on the sidecar. For example, an app may error out if the sidecar isn’t available to serve requests (e.g., logging with Datadog). Sure, you could change the application code (and that would actually be the “best practice” solution), but sometimes you can’t, and this post focuses on that use case.&lt;/p&gt;
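&lt;p&gt;To make the contrast concrete, here is a minimal sketch (the &lt;code&gt;wait-for-db&lt;/code&gt; container and the &lt;code&gt;db&lt;/code&gt; service it probes are hypothetical): a regular init container signals success by exiting with status 0, while a native sidecar keeps running indefinitely and never produces such a signal.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;initContainers:
  - name: wait-for-db          # regular init container: succeeds by exiting 0
    image: busybox:1.37
    command: [&amp;#39;sh&amp;#39;, &amp;#39;-c&amp;#39;, &amp;#39;until nc -z db 5432; do sleep 1; done&amp;#39;]
  - name: logshipper           # native sidecar: runs forever, no exit-based success signal
    image: alpine:latest
    restartPolicy: Always
    command: [&amp;#39;sh&amp;#39;, &amp;#39;-c&amp;#39;, &amp;#39;tail -F /opt/logs.txt&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;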
&lt;p&gt;I&#39;ll explain some ways that you might try, and show you what approaches will really work.&lt;/p&gt;
&lt;h2 id=&#34;readiness-probe&#34;&gt;Readiness probe&lt;/h2&gt;
&lt;p&gt;To check whether a Kubernetes-native sidecar delays the start of the main application until the sidecar is ready, let’s run a short experiment. First, I’ll create a sidecar container that will never become ready, by giving it a readiness probe that never succeeds. As a reminder, a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;readiness probe&lt;/a&gt; checks whether the container is ready to start accepting traffic and, therefore, whether the pod can be used as a backend for services.&lt;/p&gt;
&lt;p&gt;(Unlike standard init containers, sidecar containers can have &lt;a href=&#34;https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;probes&lt;/a&gt; so that the kubelet can supervise the sidecar and intervene if there are problems. For example, restarting a sidecar container if it fails a health check.)&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;apps/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Deployment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 3600&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;readinessProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- exit 1&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# this command always fails, keeping the container &amp;#34;Not Ready&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;emptyDir&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;controlplane $ kubectl get pods -w
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME                    READY   STATUS    RESTARTS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;myapp-db5474f45-htgw5   1/2     Running   0          9m28s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#888&#34;&gt;controlplane $ kubectl describe pod myapp-db5474f45-htgw5 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Name:             myapp-db5474f45-htgw5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Namespace:        default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;(...)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Events:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Type     Reason     Age               From               Message
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  ----     ------     ----              ----               -------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Scheduled  17s               default-scheduler  Successfully assigned default/myapp-db5474f45-htgw5 to node01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulling    16s               kubelet            Pulling image &amp;#34;nginx:latest&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulled     16s               kubelet            Successfully pulled image &amp;#34;nginx:latest&amp;#34; in 163ms (163ms including waiting). Image size: 72080558 bytes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Created    16s               kubelet            Created container nginx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Started    16s               kubelet            Started container nginx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulling    15s               kubelet            Pulling image &amp;#34;alpine:latest&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulled     15s               kubelet            Successfully pulled image &amp;#34;alpine:latest&amp;#34; in 159ms (160ms including waiting). Image size: 3652536 bytes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Created    15s               kubelet            Created container myapp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Started    15s               kubelet            Started container myapp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Warning  Unhealthy  1s (x6 over 15s)  kubelet            Readiness probe failed:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;From these logs it’s evident that only one container is ready - and I know it can’t be the sidecar, because I’ve defined it so it’ll never be ready (you can also check container statuses in &lt;code&gt;kubectl get pod -o json&lt;/code&gt;). I can also see that the myapp container was started before the sidecar became ready. That was not the result I wanted to achieve; in this case, the main app container has a hard dependency on its sidecar.&lt;/p&gt;
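&lt;p&gt;If you want to double-check this from the command line, one quick way (the pod name below is from my run; yours will differ) is to print the ready status of each init container via JSONPath:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-console&#34;&gt;controlplane $ kubectl get pod myapp-db5474f45-htgw5 \
  -o jsonpath=&amp;#39;{range .status.initContainerStatuses[*]}{.name}{&amp;#34;\t&amp;#34;}{.ready}{&amp;#34;\n&amp;#34;}{end}&amp;#39;
nginx	false
&lt;/code&gt;&lt;/pre&gt;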
&lt;h2 id=&#34;maybe-a-startup-probe&#34;&gt;Maybe a startup probe?&lt;/h2&gt;
&lt;p&gt;To ensure that the sidecar is ready before the main app container starts, I can define a &lt;code&gt;startupProbe&lt;/code&gt;. It will delay the start of the main container until the probe command succeeds (returns a &lt;code&gt;0&lt;/code&gt; exit status). If you’re wondering why I’ve added it to my &lt;code&gt;initContainer&lt;/code&gt;, let’s analyse what would happen if I’d added it to the myapp container instead: there would be no guarantee that the probe would run before the main application code - and that code can potentially error out if the sidecar isn’t up and running.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;apps/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Deployment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 3600&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;startupProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;httpGet&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;path&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initialDelaySeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;failureThreshold&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;timeoutSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;emptyDir&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This results in 2/2 containers being ready and running, and the Pod’s events show that the main application started only after nginx had started. But to confirm that it actually waited for the sidecar to become ready, let’s change the &lt;code&gt;startupProbe&lt;/code&gt; to an exec command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;startupProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- sleep 15&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and run &lt;code&gt;kubectl get pods -w&lt;/code&gt; to watch in real time: the readiness of both containers changes only after the 15-second delay. Again, the events confirm that the main application starts after the sidecar.
That means that a &lt;code&gt;startupProbe&lt;/code&gt; with an appropriate check, such as the &lt;code&gt;startupProbe.httpGet&lt;/code&gt; request shown earlier, delays the main application start until the sidecar is ready. It’s not optimal, but it works.&lt;/p&gt;
&lt;h2 id=&#34;what-about-the-poststart-lifecycle-hook&#34;&gt;What about the postStart lifecycle hook?&lt;/h2&gt;
&lt;p&gt;Fun fact: the &lt;code&gt;postStart&lt;/code&gt; lifecycle hook will also do the job, but I’d have to write my own mini shell script, which is even less efficient.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;lifecycle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;postStart&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- |&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            echo &amp;#34;Waiting for readiness at http://localhost:80&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            until curl -sf http://localhost:80; do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;              echo &amp;#34;Still waiting for http://localhost:80...&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;              sleep 5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            echo &amp;#34;Service is ready at http://localhost:80&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;liveness-probe&#34;&gt;Liveness probe&lt;/h2&gt;
&lt;p&gt;An interesting exercise would be to check the sidecar container behavior with a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;liveness probe&lt;/a&gt;.
A liveness probe is configured much like a readiness probe, with one key difference: instead of affecting the container’s readiness, it restarts the container when the probe fails.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;livenessProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- exit 1&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# this command always fails, so the kubelet keeps restarting the container&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After adding a liveness probe configured just like the earlier readiness probe, checking the Pod’s events with &lt;code&gt;kubectl describe pod&lt;/code&gt; shows that the sidecar’s restart count is above 0. Nevertheless, the main application is neither restarted nor influenced at all, even though (in our imaginary worst-case scenario) it can error out when the sidecar is not there to serve requests.
What if I used a &lt;code&gt;livenessProbe&lt;/code&gt; without the lifecycle &lt;code&gt;postStart&lt;/code&gt; hook? Both containers would be ready immediately: at first, this behavior is no different from having no additional probes, since a liveness probe doesn’t affect readiness at all. After a while, the failing probe starts restarting the sidecar, but that doesn’t influence the main container.&lt;/p&gt;
&lt;h2 id=&#34;findings-summary&#34;&gt;Findings summary&lt;/h2&gt;
&lt;p&gt;I’ll summarize the startup behavior in the table below:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Probe/Hook&lt;/th&gt;
&lt;th&gt;Sidecar starts before the main app?&lt;/th&gt;
&lt;th&gt;Main app waits for the sidecar to be ready?&lt;/th&gt;
&lt;th&gt;What if the check doesn’t pass?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;readinessProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;, but it’s almost in parallel (effectively &lt;strong&gt;no&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sidecar is not ready; main app continues running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;livenessProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes, but it’s almost in parallel (effectively &lt;strong&gt;no&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sidecar is restarted, main app continues running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;startupProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Main app is not started&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;postStart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;, main app container starts after &lt;code&gt;postStart&lt;/code&gt; completes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;, but you have to provide custom logic for that&lt;/td&gt;
&lt;td&gt;Main app is not started&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;To summarize: since a sidecar is often a dependency of the main application, you may want to delay the start of the main application until the sidecar is healthy.
The ideal pattern is to start both containers simultaneously and have the app container’s own logic handle delays and retries at every level, but that’s not always possible. If you need to enforce the ordering, you have to apply the right customization to the Pod definition. Thankfully, it’s nice and quick, and you have the recipe ready above.&lt;/p&gt;
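&lt;p&gt;To make the recipe concrete, here is a minimal sketch of the sidecar definition that combines a &lt;code&gt;startupProbe&lt;/code&gt; (to gate the main application start) with a &lt;code&gt;livenessProbe&lt;/code&gt; (to restart the sidecar if it becomes unhealthy later). The probe timings here are illustrative, not prescriptive:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;initContainers:
  - name: nginx
    image: nginx:latest
    restartPolicy: Always   # makes this init container a sidecar
    ports:
      - containerPort: 80
        protocol: TCP
    startupProbe:           # main containers start only after this succeeds
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 10
    livenessProbe:          # restarts the sidecar if it stops responding later
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;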
&lt;p&gt;Happy deploying!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Gateway API v1.3.0: Advancements in Request Mirroring, CORS, Gateway Merging, and Retry Budgets</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/02/gateway-api-v1-3/</link>
      <pubDate>Mon, 02 Jun 2025 09:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/02/gateway-api-v1-3/</guid>
      <description>
        
        
        &lt;p&gt;&lt;img alt=&#34;Gateway API logo&#34; src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/06/02/gateway-api-v1-3/gateway-api-logo.svg&#34;&gt;&lt;/p&gt;
&lt;p&gt;Join us in the Kubernetes SIG Network community in celebrating the general
availability of &lt;a href=&#34;https://gateway-api.sigs.k8s.io/&#34;&gt;Gateway API&lt;/a&gt; v1.3.0! We are
also pleased to announce that there are already a number of conformant
implementations to try, made possible by postponing this blog
announcement. Version 1.3.0 of the API was released about a month ago on
April 24, 2025.&lt;/p&gt;
&lt;p&gt;Gateway API v1.3.0 brings a new feature to the &lt;em&gt;Standard&lt;/em&gt; channel
(Gateway API&#39;s GA release channel): &lt;em&gt;percentage-based request mirroring&lt;/em&gt;, and
introduces three new experimental features: cross-origin resource sharing (CORS)
filters, a standardized mechanism for listener and gateway merging, and retry
budgets.&lt;/p&gt;
&lt;p&gt;Also see the full
&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-CHANGELOG.md&#34;&gt;release notes&lt;/a&gt;
and applaud the
&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-TEAM.md&#34;&gt;v1.3.0 release team&lt;/a&gt;
next time you see them.&lt;/p&gt;
&lt;h2 id=&#34;graduation-to-standard-channel&#34;&gt;Graduation to Standard channel&lt;/h2&gt;
&lt;p&gt;Graduation to the Standard channel is a notable achievement for Gateway API
features, as inclusion in the Standard release channel denotes a high level of
confidence in the API surface and provides guarantees of backward compatibility.
Of course, as with any other Kubernetes API, Standard channel features can continue
to evolve with backward-compatible additions over time, and we (SIG Network)
certainly expect
further refinements and improvements in the future. For more information on how
all of this works, refer to the &lt;a href=&#34;https://gateway-api.sigs.k8s.io/concepts/versioning/&#34;&gt;Gateway API Versioning Policy&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;percentage-based-request-mirroring&#34;&gt;Percentage-based request mirroring&lt;/h3&gt;
&lt;p&gt;Leads: &lt;a href=&#34;https://github.com/LiorLieberman&#34;&gt;Lior Lieberman&lt;/a&gt;, &lt;a href=&#34;https://github.com/jakebennert&#34;&gt;Jake Bennert&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-3171: &lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-3171/index.md&#34;&gt;Percentage-Based Request Mirroring&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Percentage-based request mirroring&lt;/em&gt; is an enhancement to the
existing support for &lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/http-request-mirroring/&#34;&gt;HTTP request mirroring&lt;/a&gt;, which allows HTTP requests to be duplicated to another backend using the
RequestMirror filter type. Request mirroring is particularly useful in
blue-green deployments. It can be used to assess the impact of request scaling on
application performance without affecting responses to clients.&lt;/p&gt;
&lt;p&gt;The previous mirroring capability worked on all the requests to a &lt;code&gt;backendRef&lt;/code&gt;.
Percentage-based request mirroring allows users to specify a subset of requests
they want to be mirrored, either by percentage or fraction. This can be
particularly useful when services are receiving a large volume of requests.
Instead of mirroring all of those requests, this new feature can be used to
mirror a smaller subset of them.&lt;/p&gt;
&lt;p&gt;Here&#39;s an example with 42% of the requests to &amp;quot;foo-v1&amp;quot; being mirrored to &amp;quot;foo-v2&amp;quot;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-filter-mirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;gateway&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;mirror-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;mirror-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostnames&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- mirror.example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;RequestMirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requestMirror&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;percent&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;42&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# This value must be an integer between 0 and 100.&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also configure partial mirroring using a fraction. In the following example,
5 out of every 1000 requests to &amp;quot;foo-v1&amp;quot; are mirrored to &amp;quot;foo-v2&amp;quot;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;RequestMirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requestMirror&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;fraction&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;numerator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;denominator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;additions-to-experimental-channel&#34;&gt;Additions to Experimental channel&lt;/h2&gt;
&lt;p&gt;The Experimental channel is where Gateway API features are trialed so the
community can gain confidence in them before they graduate to the Standard
channel.  Please note: features in the Experimental channel may be changed or
removed in later releases.&lt;/p&gt;
&lt;p&gt;Starting in release v1.3.0, in an effort to distinguish Experimental channel
resources from Standard channel resources, any new experimental API kinds have the
prefix &amp;quot;&lt;strong&gt;X&lt;/strong&gt;&amp;quot;.  For the same reason, experimental resources are now added to the
API group &lt;code&gt;gateway.networking.x-k8s.io&lt;/code&gt; instead of &lt;code&gt;gateway.networking.k8s.io&lt;/code&gt;.
Bear in mind that while these experimental resources can coexist with Standard
channel resources, migrating them to the Standard channel will require
recreating them with the Standard channel kind names and API group (which drop
the &amp;quot;X&amp;quot; prefix and the &amp;quot;x-k8s&amp;quot; designator, respectively).&lt;/p&gt;
&lt;p&gt;The v1.3 release introduces two new experimental API kinds: XBackendTrafficPolicy
and XListenerSet.  To be able to use experimental API kinds, you need to install
the Experimental channel Gateway API YAMLs from the locations listed below.&lt;/p&gt;
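&lt;p&gt;As a sketch, installing the Experimental channel CRDs can be done with a single
&lt;code&gt;kubectl apply&lt;/code&gt;; the exact artifact URL below follows the project&#39;s usual release
naming and should be confirmed against the Gateway API v1.3 release page:&lt;/p&gt;

```shell
# Install the Experimental channel CRDs for Gateway API v1.3.
# Note: the release artifact URL is an assumption based on the project's
# usual naming; confirm it against the v1.3 release notes before use.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/experimental-install.yaml
```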
&lt;h3 id=&#34;cors-filtering&#34;&gt;CORS filtering&lt;/h3&gt;
&lt;p&gt;Leads: &lt;a href=&#34;https://github.com/liangli&#34;&gt;Liang Li&lt;/a&gt;, &lt;a href=&#34;https://github.com/EyalPazz&#34;&gt;Eyal Pazz&lt;/a&gt;, &lt;a href=&#34;https://github.com/robscott&#34;&gt;Rob Scott&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-1767: &lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-1767/index.md&#34;&gt;CORS Filter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cross-origin resource sharing (CORS) is an HTTP-header-based mechanism that allows
a web page to request restricted resources from a server on an origin (scheme,
domain, or port) different from the one that served the page. This feature
adds a new HTTPRoute &lt;code&gt;filter&lt;/code&gt; type, called &amp;quot;CORS&amp;quot;, to configure how
cross-origin requests are handled before the response is sent back to the client.&lt;/p&gt;
&lt;p&gt;To be able to use experimental CORS filtering, you need to install the
&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.k8s.io_httproutes.yaml&#34;&gt;Experimental channel Gateway API HTTPRoute yaml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&#39;s an example of a simple cross-origin configuration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-route-cors&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matches&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;path&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PathPrefix&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/resource/foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CORS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cors&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowOrigins&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;*&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowMethods&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- GET&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- HEAD&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- POST&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowHeaders&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Accept&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Accept-Language&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Content-Language&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Content-Type&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Range&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Service&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-route-cors&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this case, the Gateway returns an &lt;em&gt;origin header&lt;/em&gt;
(&lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt;) of &amp;quot;*&amp;quot;, which means that the
requested resource can be referenced from any origin, a &lt;em&gt;methods header&lt;/em&gt;
(&lt;code&gt;Access-Control-Allow-Methods&lt;/code&gt;) that permits the &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;HEAD&lt;/code&gt;, and &lt;code&gt;POST&lt;/code&gt;
verbs, and a &lt;em&gt;headers header&lt;/em&gt; (&lt;code&gt;Access-Control-Allow-Headers&lt;/code&gt;) allowing &lt;code&gt;Accept&lt;/code&gt;, &lt;code&gt;Accept-Language&lt;/code&gt;,
&lt;code&gt;Content-Language&lt;/code&gt;, &lt;code&gt;Content-Type&lt;/code&gt;, and &lt;code&gt;Range&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;HTTP/1.1 200 OK
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Origin: *
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Methods: GET, HEAD, POST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Headers: Accept,Accept-Language,Content-Language,Content-Type,Range
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The complete list of fields in the new CORS filter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;allowOrigins&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowMethods&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowHeaders&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowCredentials&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;exposeHeaders&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;maxAge&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See &lt;a href=&#34;https://fetch.spec.whatwg.org/#http-cors-protocol&#34;&gt;CORS protocol&lt;/a&gt; for details.&lt;/p&gt;
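The fields beyond those shown in the example above follow the same pattern. As an illustrative sketch (the field names come from the list above; the values here are examples, not taken from the release notes):

```yaml
# Hedged sketch of a CORS filter using the remaining fields.
# Field names are from the filter's field list; values are illustrative only.
filters:
- type: CORS
  cors:
    allowOrigins:
    - https://example.com
    allowMethods:
    - GET
    allowCredentials: true    # permit credentials (e.g. cookies) on cross-origin requests
    exposeHeaders:            # response headers the browser is allowed to read
    - Content-Length
    maxAge: 3600              # seconds a preflight response may be cached
```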
&lt;h3 id=&#34;XListenerSet&#34;&gt;XListenerSets (standardized mechanism for Listener and Gateway merging)&lt;/h3&gt;
&lt;p&gt;Lead: &lt;a href=&#34;https://github.com/dprotaso&#34;&gt;Dave Protasowski&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-1713: &lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/pull/3213&#34;&gt;ListenerSets - Standard Mechanism to Merge Multiple Gateways&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This release adds a new experimental API kind, XListenerSet, that allows a
shared list of &lt;em&gt;listeners&lt;/em&gt; to be attached to one or more parent Gateway(s).  In
addition, it expands upon the existing suggestion that Gateway API implementations
may merge configuration from multiple Gateway objects.  It also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;adds a new field &lt;code&gt;allowedListeners&lt;/code&gt; to the &lt;code&gt;.spec&lt;/code&gt; of a Gateway. The
&lt;code&gt;allowedListeners&lt;/code&gt; field defines from which Namespaces to select XListenerSets
that are allowed to attach to that Gateway: Same, All, None, or Selector based.&lt;/li&gt;
&lt;li&gt;allows the total number of listeners attached to a Gateway to grow beyond the
previous per-Gateway maximum of 64, because each attached XListenerSet
contributes its own listeners.&lt;/li&gt;
&lt;li&gt;allows the delegation of listener configuration, such as TLS, to applications in
other namespaces.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To be able to use experimental XListenerSet, you need to install the
&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xlistenersets.yaml&#34;&gt;Experimental channel Gateway API XListenerSet yaml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following example shows a Gateway with an HTTP listener and two child HTTPS
XListenerSets with unique hostnames and certificates.  The combined set of listeners
attached to the Gateway includes the two additional HTTPS listeners in the
XListenerSets that attach to the Gateway.  This example illustrates the
delegation of listener TLS config to application owners in different namespaces
(&amp;quot;store&amp;quot; and &amp;quot;app&amp;quot;).  The HTTPRoute has both the Gateway listener named &amp;quot;foo&amp;quot; and
one XListenerSet listener named &amp;quot;second&amp;quot; as &lt;code&gt;parentRefs&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;infra&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;gatewayClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowedListeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;from&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;All&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;store&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;store&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first.foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tls&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Terminate&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;certificateRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Secret&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;group&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first-workload-cert&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second.foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tls&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Terminate&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;certificateRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Secret&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;group&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second-workload-cert&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;httproute-example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;sectionName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;parent-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;sectionName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each listener in a Gateway must have a unique combination of &lt;code&gt;port&lt;/code&gt;, &lt;code&gt;protocol&lt;/code&gt;,
and (where the protocol supports it) &lt;code&gt;hostname&lt;/code&gt; so that all listeners are
&lt;strong&gt;compatible&lt;/strong&gt; and do not conflict over which traffic they should receive.&lt;/p&gt;
&lt;p&gt;Furthermore, implementations can &lt;em&gt;merge&lt;/em&gt; separate Gateways into a single set of
listener addresses if all listeners across those Gateways are compatible.  The
management of merged listeners was under-specified in releases prior to v1.3.0.&lt;/p&gt;
&lt;p&gt;With the new feature, the specification for merging is expanded.  Implementations
must treat a parent Gateway as having the merged list of all listeners from the
Gateway itself and from its attached XListenerSets, and must validate that merged
list exactly as if it were defined in a single Gateway. Within a single
Gateway, listeners are ordered using the following precedence:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Single Listeners (not part of an XListenerSet) first,&lt;/li&gt;
&lt;li&gt;Remaining listeners ordered by:
&lt;ul&gt;
&lt;li&gt;object creation time (oldest first), and, if two listeners are defined in
objects that share the same timestamp, then&lt;/li&gt;
&lt;li&gt;alphabetically by &amp;quot;{namespace}/{name of listener}&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
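&lt;p&gt;As a sketch of that precedence (the listener names and hostnames below are illustrative, not taken from the example above), an implementation would validate the merged list in this order:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Hypothetical merged view of one Gateway and one attached XListenerSet.
# 1. Listeners defined inline on the Gateway sort first.
- name: foo                 # defined on the Gateway itself
  protocol: HTTPS
  port: 443
  hostname: foo.example.com
# 2. XListenerSet listeners follow, ordered by object creation time
#    (oldest first), then alphabetically by &amp;quot;{namespace}/{listener name}&amp;quot;.
- name: second              # from an XListenerSet in another namespace
  protocol: HTTPS
  port: 443
  hostname: second.example.com
&lt;/code&gt;&lt;/pre&gt;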
&lt;h3 id=&#34;XBackendTrafficPolicy&#34;&gt;Retry budgets (XBackendTrafficPolicy)&lt;/h3&gt;
&lt;p&gt;Leads: &lt;a href=&#34;https://github.com/ericdbishop&#34;&gt;Eric Bishop&lt;/a&gt;, &lt;a href=&#34;https://github.com/mikemorris&#34;&gt;Mike Morris&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-3388: &lt;a href=&#34;https://gateway-api.sigs.k8s.io/geps/gep-3388&#34;&gt;Retry Budgets&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This feature allows you to configure a &lt;em&gt;retry budget&lt;/em&gt; across all endpoints
of a destination Service, limiting additional client-side retries once a configured
threshold is reached. When configuring the budget, you can specify the maximum
percentage of active requests that may be retries, as well as the interval over
which requests are considered when calculating the retry threshold. During the
development of this specification, the existing experimental API kind
BackendLBPolicy was replaced by a new experimental API kind, XBackendTrafficPolicy,
in the interest of reducing the proliferation of policy resources that shared
common traits.&lt;/p&gt;
&lt;p&gt;To be able to use experimental retry budgets, you need to install the
&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xbackendtrafficpolicies.yaml&#34;&gt;Experimental channel Gateway API XBackendTrafficPolicy yaml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following example shows an XBackendTrafficPolicy whose
&lt;code&gt;retryConstraint&lt;/code&gt; defines a budget that limits retries to at most 20%
of requests over a 10-second interval, while always permitting at least 3 retries
per second.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XBackendTrafficPolicy&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;traffic-policy-example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;retryConstraint&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;budget&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;percent&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;interval&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;minRetryRate&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;count&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;interval&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;1s&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;try-it-out&#34;&gt;Try it out&lt;/h2&gt;
&lt;p&gt;Unlike other Kubernetes APIs, you don&#39;t need to upgrade to the latest version of
Kubernetes to get the latest version of Gateway API. As long as you&#39;re running
Kubernetes 1.26 or later, you&#39;ll be able to get up and running with this version
of Gateway API.&lt;/p&gt;
&lt;p&gt;To try out the API, follow the &lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/&#34;&gt;Getting Started Guide&lt;/a&gt;.
As of this writing, four implementations are already conformant with Gateway API
v1.3 experimental channel features. In alphabetical order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/airlock/microgateway/releases/tag/4.6.0&#34;&gt;Airlock Microgateway 4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/cilium/cilium&#34;&gt;Cilium main&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/envoyproxy/gateway/releases/tag/v1.4.0&#34;&gt;Envoy Gateway v1.4.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://istio.io&#34;&gt;Istio 1.27-dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;Wondering when a feature will be added?  There are lots of opportunities to get
involved and help define the future of Kubernetes routing APIs for both ingress
and service mesh.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check out the &lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides&#34;&gt;user guides&lt;/a&gt; to see what use-cases can be addressed.&lt;/li&gt;
&lt;li&gt;Try out one of the &lt;a href=&#34;https://gateway-api.sigs.k8s.io/implementations/&#34;&gt;existing Gateway controllers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Or &lt;a href=&#34;https://gateway-api.sigs.k8s.io/contributing/&#34;&gt;join us in the community&lt;/a&gt;
and help us build the future of Gateway API together!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The maintainers would like to thank &lt;em&gt;everyone&lt;/em&gt; who&#39;s contributed to Gateway
API, whether in the form of commits to the repo, discussion, ideas, or general
support. We could never have made this kind of progress without the support of
this dedicated and active community.&lt;/p&gt;
&lt;h2 id=&#34;related-kubernetes-blog-articles&#34;&gt;Related Kubernetes blog articles&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/11/21/gateway-api-v1-2/&#34;&gt;Gateway API v1.2: WebSockets, Timeouts, Retries, and More&lt;/a&gt;
(November 2024)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/05/09/gateway-api-v1-1/&#34;&gt;Gateway API v1.1: Service mesh, GRPCRoute, and a whole lot more&lt;/a&gt;
(May 2024)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2023/11/28/gateway-api-ga/&#34;&gt;New Experimental Features in Gateway API v1.0&lt;/a&gt;
(November 2023)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2023/10/31/gateway-api-ga/&#34;&gt;Gateway API v1.0: GA Release&lt;/a&gt;
(October 2023)&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Spotlight on Policy Working Group</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/22/wg-policy-spotlight-2025/</link>
      <pubDate>Thu, 22 May 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/22/wg-policy-spotlight-2025/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let&#39;s take a look back at the work of the Policy Working Group.&lt;/p&gt;
&lt;p&gt;The Policy Working Group was dedicated to a critical mission: providing an overall architecture that encompasses both current policy-related implementations and future policy proposals in Kubernetes. Their goal was both ambitious and essential: to develop a universal policy architecture that benefits developers and end-users alike.&lt;/p&gt;
&lt;p&gt;Through collaborative methods, this working group strove to bring clarity and consistency to the often complex world of Kubernetes policies. By focusing on both existing implementations and future proposals, they ensured that the policy landscape in Kubernetes remains coherent and accessible as the technology evolves.&lt;/p&gt;
&lt;p&gt;This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://twitter.com/JimBugwadia&#34;&gt;Jim Bugwadia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://twitter.com/poonam-lamba&#34;&gt;Poonam Lamba&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://twitter.com/sudermanjr&#34;&gt;Andy Suderman&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Interviewed by &lt;a href=&#34;https://twitter.com/arujjval&#34;&gt;Arujjwal Negi&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;These co-chairs explained what the Policy Working Group was all about.&lt;/p&gt;
&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Hello, thank you for the time! Let’s start with some introductions, could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Jim Bugwadia&lt;/strong&gt;: My name is Jim Bugwadia, and I am a co-founder and the CEO at Nirmata which provides solutions that automate security and compliance for cloud-native workloads. At Nirmata, we have been working with Kubernetes since it started in 2014. We initially built a Kubernetes policy engine in our commercial platform and later donated it to CNCF as the Kyverno project. I joined the CNCF Kubernetes Policy Working Group to help build and standardize various aspects of policy management for Kubernetes and later became a co-chair.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Andy Suderman&lt;/strong&gt;: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds&#39; journey into the policy space and my involvement in the Policy Working Group.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Poonam Lamba&lt;/strong&gt;: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I&#39;ve had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am involved with the Gatekeeper project, contributed to the Policy WG for over 2 years, and served as a co-chair for the group.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Responses to the following questions represent an amalgamation of insights from the former co-chairs.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;about-working-groups&#34;&gt;About Working Groups&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they&#39;ve achieved their objective. Generally, working groups don&#39;t own code or have long-term responsibility for managing a particular area of the Kubernetes project.&lt;/p&gt;
&lt;p&gt;(To know more about SIGs, visit the &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;list of Special Interest Groups&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You mentioned that Working Groups involve multiple SIGS. What SIGS was the Policy WG closely involved with, and how did you coordinate with them?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The group collaborated closely with Kubernetes SIG Auth throughout its existence, and more recently it also worked with SIG Security after that SIG&#39;s formation. Our collaboration occurred in a few ways. We provided periodic updates during the SIG meetings to keep them informed of our progress and activities. Additionally, we utilized other community forums to maintain open lines of communication and ensure our work aligned with the broader Kubernetes ecosystem. This collaborative approach helped the group stay coordinated with related efforts across the Kubernetes community.&lt;/p&gt;
&lt;h2 id=&#34;policy-wg&#34;&gt;Policy WG&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Why was the Policy Working Group created?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We recognized that Kubernetes enables a broad set of use cases because it is powered by a highly declarative, fine-grained, and extensible configuration management system. We&#39;ve observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.&lt;/p&gt;
&lt;p&gt;Our Policy Working Group was created specifically to research the standardization of policy definitions and related artifacts. We saw a need to bring consistency and clarity to how policies are defined and implemented across the Kubernetes ecosystem, given the diverse requirements and stakeholders involved in Kubernetes deployments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can you give me an idea of the work you did in the group?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We worked on several Kubernetes policy-related projects. Our initiatives included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.&lt;/li&gt;
&lt;li&gt;We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.&lt;/li&gt;
&lt;li&gt;We wrote a paper that will guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.&lt;/li&gt;
&lt;li&gt;We also worked on a paper highlighting how shifting security down can benefit organizations. This focuses on the advantages of implementing security measures earlier in the development and deployment process.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Can you tell us what were the main objectives of the Policy Working Group and some of your key accomplishments?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.&lt;/p&gt;
&lt;p&gt;To accomplish this, we updated the Kubernetes documentation (&lt;a href=&#34;https://kubernetes.io/docs/concepts/policy&#34;&gt;Policies | Kubernetes&lt;/a&gt;), produced several whitepapers (&lt;a href=&#34;https://github.com/kubernetes/sig-security/blob/main/sig-security-docs/papers/policy/CNCF_Kubernetes_Policy_Management_WhitePaper_v1.pdf&#34;&gt;Kubernetes Policy Management&lt;/a&gt;, &lt;a href=&#34;https://github.com/kubernetes/sig-security/blob/main/sig-security-docs/papers/policy_grc/Kubernetes_Policy_WG_Paper_v1_101123.pdf&#34;&gt;Kubernetes GRC&lt;/a&gt;), and created the Policy Reports API (&lt;a href=&#34;https://htmlpreview.github.io/?https://github.com/kubernetes-sigs/wg-policy-prototypes/blob/master/policy-report/docs/index.html&#34;&gt;API reference&lt;/a&gt;), which standardizes reporting across various tools. Several popular tools such as Falco, Trivy, Kyverno, kube-bench, and others support the Policy Report API. A major remaining milestone for the Policy WG was to promote the Policy Reports API to a SIG-level API or to find it another stable home.&lt;/p&gt;
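&lt;p&gt;To give a concrete sense of what that standardization looks like, here is a minimal PolicyReport object of the kind such tools emit (the policy names and messages below are illustrative, and the schema shown is the v1alpha2 version from the wg-policy-prototypes repository):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: example-report          # illustrative name
  namespace: default
summary:
  pass: 1
  fail: 1
results:
- policy: require-labels        # illustrative policy
  result: pass
  message: all required labels are present
- policy: disallow-latest-tag   # illustrative policy
  result: fail
  message: container image uses a mutable tag
&lt;/code&gt;&lt;/pre&gt;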
&lt;p&gt;Beyond that, as &lt;a href=&#34;https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/&#34;&gt;ValidatingAdmissionPolicy&lt;/a&gt; and &lt;a href=&#34;https://kubernetes.io/docs/reference/access-authn-authz/mutating-admission-policy/&#34;&gt;MutatingAdmissionPolicy&lt;/a&gt; approached GA in Kubernetes, a key goal of the WG was to guide and educate the community on the tradeoffs and appropriate usage patterns for these built-in API objects and other CNCF policy management solutions like OPA/Gatekeeper and Kyverno.&lt;/p&gt;
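&lt;p&gt;As a brief illustration of the built-in approach (the resource name and replica limit below are invented for this example), a ValidatingAdmissionPolicy expresses its rule directly as a CEL expression and requires no extra controller, though it only takes effect once referenced by a ValidatingAdmissionPolicyBinding:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: demo-replica-limit      # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [&amp;quot;apps&amp;quot;]
      apiVersions: [&amp;quot;v1&amp;quot;]
      operations: [&amp;quot;CREATE&amp;quot;, &amp;quot;UPDATE&amp;quot;]
      resources: [&amp;quot;deployments&amp;quot;]
  validations:
  - expression: &amp;quot;object.spec.replicas &amp;lt;= 5&amp;quot;   # illustrative limit
    message: &amp;quot;replica count must not exceed 5&amp;quot;
&lt;/code&gt;&lt;/pre&gt;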
&lt;h2 id=&#34;challenges&#34;&gt;Challenges&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What were some of the major challenges that the Policy Working Group worked on?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;During our work in the Policy Working Group, we encountered several challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;One of the main issues we faced was finding time to consistently contribute. Given that many of us have other professional commitments, it can be difficult to dedicate regular time to the working group&#39;s initiatives.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Another challenge we experienced was related to our consensus-driven model. While this approach ensures that all voices are heard, it can sometimes lead to slower decision-making processes. We valued thorough discussion and agreement, but this can occasionally delay progress on our projects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&#39;ve also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lastly, we&#39;ve noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren&#39;t able to participate regularly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are no easy answers, but having more contributors and maintainers greatly helps! Overall, the CNCF community is great to work with and very welcoming to beginners. So, if folks out there are hesitating to get involved, I highly encourage them to attend a WG or SIG meeting and just listen in.&lt;/p&gt;
&lt;p&gt;It often takes a few meetings to fully understand the discussions, so don&#39;t feel discouraged if you don&#39;t grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.&lt;/p&gt;
&lt;p&gt;Additionally, differences of opinion were valued and encouraged within the Policy WG. We adhered to the CNCF core values and resolved disagreements by maintaining respect for one another. We also strove to timebox our decisions and assign clear responsibilities to keep things moving forward.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;This is where our discussion about the Policy Working Group ends. The working group, and especially the people who took part in this article, hope this gave you some insights into the group&#39;s aims and workings. You can get more info about Working Groups &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: In-Place Pod Resize Graduated to Beta</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/</link>
      <pubDate>Fri, 16 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/</guid>
      <description>
        
        
        &lt;p&gt;On behalf of the Kubernetes project, I am excited to announce that the &lt;strong&gt;in-place Pod resize&lt;/strong&gt; feature (also known as In-Place Pod Vertical Scaling), first introduced as alpha in Kubernetes v1.27, has graduated to &lt;strong&gt;Beta&lt;/strong&gt; and will be enabled by default in the Kubernetes v1.33 release! This marks a significant milestone in making resource management for Kubernetes workloads more flexible and less disruptive.&lt;/p&gt;
&lt;h2 id=&#34;what-is-in-place-pod-resize&#34;&gt;What is in-place Pod resize?&lt;/h2&gt;
&lt;p&gt;Traditionally, changing the CPU or memory resources allocated to a container required restarting the Pod. While acceptable for many stateless applications, this could be disruptive for stateful services, batch jobs, or any workloads sensitive to restarts.&lt;/p&gt;
&lt;p&gt;In-place Pod resizing allows you to change the CPU and memory requests and limits assigned to containers within a &lt;em&gt;running&lt;/em&gt; Pod, often without requiring a container restart.&lt;/p&gt;
&lt;p&gt;Here&#39;s the core idea:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;spec.containers[*].resources&lt;/code&gt; field in a Pod specification now represents the &lt;em&gt;desired&lt;/em&gt; resources and is mutable for CPU and memory.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;status.containerStatuses[*].resources&lt;/code&gt; field reflects the &lt;em&gt;actual&lt;/em&gt; resources currently configured on a running container.&lt;/li&gt;
&lt;li&gt;You can trigger a resize by updating the desired resources in the Pod spec via the new &lt;code&gt;resize&lt;/code&gt; subresource.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can try it out on a v1.33 Kubernetes cluster by using kubectl to edit a Pod (requires &lt;code&gt;kubectl&lt;/code&gt; v1.32+):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl edit pod &amp;lt;pod-name&amp;gt; --subresource resize
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For detailed usage instructions and examples, please refer to the official Kubernetes documentation:
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/resize-container-resources/&#34;&gt;Resize CPU and Memory Resources assigned to Containers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-does-in-place-pod-resize-matter&#34;&gt;Why does in-place Pod resize matter?&lt;/h2&gt;
&lt;p&gt;Kubernetes still excels at scaling workloads horizontally (adding or removing replicas), but in-place Pod resizing unlocks several key benefits for vertical scaling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduced Disruption:&lt;/strong&gt; Stateful applications, long-running batch jobs, and sensitive workloads can have their resources adjusted without suffering the downtime or state loss associated with a Pod restart.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Resource Utilization:&lt;/strong&gt; Scale down over-provisioned Pods without disruption, freeing up resources in the cluster. Conversely, provide more resources to Pods under heavy load without needing a restart.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster Scaling:&lt;/strong&gt; Address transient resource needs more quickly. For example, Java applications often need more CPU during startup than during steady-state operation. Start with a higher CPU allocation and resize down later.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-s-changed-between-alpha-and-beta&#34;&gt;What&#39;s changed between Alpha and Beta?&lt;/h2&gt;
&lt;p&gt;Since the alpha release in v1.27, significant work has gone into maturing the feature, improving its stability, and refining the user experience based on feedback and further development. Here are the key changes:&lt;/p&gt;
&lt;h3 id=&#34;notable-user-facing-changes&#34;&gt;Notable user-facing changes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;resize&lt;/code&gt; Subresource:&lt;/strong&gt; Modifying Pod resources must now be done via the Pod&#39;s &lt;code&gt;resize&lt;/code&gt; subresource (&lt;code&gt;kubectl patch pod &amp;lt;name&amp;gt; --subresource resize ...&lt;/code&gt;). &lt;code&gt;kubectl&lt;/code&gt; versions v1.32+ support this argument.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resize Status via Conditions:&lt;/strong&gt; The old &lt;code&gt;status.resize&lt;/code&gt; field is deprecated. The status of a resize operation is now exposed via two Pod conditions:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PodResizePending&lt;/code&gt;: Indicates the Kubelet cannot grant the resize immediately (e.g., &lt;code&gt;reason: Deferred&lt;/code&gt; if temporarily unable, &lt;code&gt;reason: Infeasible&lt;/code&gt; if impossible on the node).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PodResizeInProgress&lt;/code&gt;: Indicates the resize is accepted and being applied. Errors encountered during this phase are now reported in this condition&#39;s message with &lt;code&gt;reason: Error&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sidecar Support:&lt;/strong&gt; Resizing &lt;a class=&#39;glossary-tooltip&#39; title=&#39;An auxiliary container that stays running throughout the lifecycle of a Pod.&#39; data-toggle=&#39;tooltip&#39; data-placement=&#39;top&#39; href=&#39;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/sidecar-containers/&#39; target=&#39;_blank&#39; aria-label=&#39;sidecar containers&#39;&gt;sidecar containers&lt;/a&gt; in-place is now supported.&lt;/li&gt;
&lt;/ul&gt;
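&lt;p&gt;Tying these changes together, a resize can be requested and tracked entirely from the command line. The sketch below is illustrative only: the Pod name &lt;code&gt;my-app&lt;/code&gt; and container name &lt;code&gt;app&lt;/code&gt; are placeholders, and it assumes a running v1.33 cluster with &lt;code&gt;kubectl&lt;/code&gt; v1.32+:&lt;/p&gt;

```shell
# Request more CPU through the resize subresource.
# "my-app" and "app" are placeholder names for illustration.
kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

# Follow the resize status via the new Pod conditions.
kubectl get pod my-app -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")]}'
kubectl get pod my-app -o jsonpath='{.status.conditions[?(@.type=="PodResizeInProgress")]}'

# Compare desired (spec) vs. actual (status) resources once the resize settles.
kubectl get pod my-app -o jsonpath='{.status.containerStatuses[0].resources}'
```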
&lt;h3 id=&#34;stability-and-reliability-enhancements&#34;&gt;Stability and reliability enhancements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Refined Allocated Resources Management:&lt;/strong&gt; The allocation management logic within the Kubelet was significantly reworked, making it more consistent and robust. The changes eliminated whole classes of bugs, and greatly improved the reliability of in-place Pod resize.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Checkpointing &amp;amp; State Tracking:&lt;/strong&gt; A more robust system for tracking &amp;quot;allocated&amp;quot; and &amp;quot;actuated&amp;quot; resources was implemented, using new checkpoint files (&lt;code&gt;allocated_pods_state&lt;/code&gt;, &lt;code&gt;actuated_pods_state&lt;/code&gt;) to reliably manage resize state across Kubelet restarts and handle edge cases where runtime-reported resources differ from requested ones. Several bugs related to checkpointing and state restoration were fixed. Checkpointing efficiency was also improved.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster Resize Detection:&lt;/strong&gt; Enhancements to the Kubelet&#39;s Pod Lifecycle Event Generator (PLEG) allow the Kubelet to respond to and complete resizes much more quickly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhanced CRI Integration:&lt;/strong&gt; A new &lt;code&gt;UpdatePodSandboxResources&lt;/code&gt; CRI call was added to better inform runtimes and plugins (like NRI) about Pod-level resource changes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Numerous Bug Fixes:&lt;/strong&gt; Addressed issues related to systemd cgroup drivers, handling of containers without limits, CPU minimum share calculations, container restart backoffs, error propagation, test stability, and more.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;Graduating to Beta means the feature is ready for broader adoption, but development doesn&#39;t stop here! Here&#39;s what the community is focusing on next:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stability and Productionization:&lt;/strong&gt; Continued focus on hardening the feature, improving performance, and ensuring it is robust for production environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Addressing Limitations:&lt;/strong&gt; Working towards relaxing some of the current limitations noted in the documentation, such as allowing memory limit decreases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/autoscaling/#scaling-workloads-vertically&#34;&gt;VerticalPodAutoscaler&lt;/a&gt; (VPA) Integration:&lt;/strong&gt; Work to enable VPA to leverage in-place Pod resize is already underway. A new &lt;code&gt;InPlaceOrRecreate&lt;/code&gt; update mode will allow it to attempt non-disruptive resizes first, or fall back to recreation if needed. This will allow users to benefit from VPA&#39;s recommendations with significantly less disruption.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Feedback:&lt;/strong&gt; Gathering feedback from users adopting the beta feature is crucial for prioritizing further enhancements and addressing any uncovered issues or bugs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;getting-started-and-providing-feedback&#34;&gt;Getting started and providing feedback&lt;/h2&gt;
&lt;p&gt;With the &lt;code&gt;InPlacePodVerticalScaling&lt;/code&gt; feature gate enabled by default in v1.33, you can start experimenting with in-place Pod resizing right away!&lt;/p&gt;
&lt;p&gt;Refer to the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/resize-container-resources/&#34;&gt;documentation&lt;/a&gt; for detailed guides and examples.&lt;/p&gt;
&lt;p&gt;As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels (GitHub issues, mailing lists, Slack). You can also review the &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources&#34;&gt;KEP-1287: In-place Update of Pod Resources&lt;/a&gt; for the full in-depth design details.&lt;/p&gt;
&lt;p&gt;We look forward to seeing how the community leverages in-place Pod resize to build more efficient and resilient applications on Kubernetes!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing etcd v3.6.0</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/</link>
      <pubDate>Thu, 15 May 2025 16:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;This announcement originally &lt;a href=&#34;https://etcd.io/blog/2025/announcing-etcd-3.6/&#34;&gt;appeared&lt;/a&gt; on the etcd blog.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Today, we are releasing &lt;a href=&#34;https://github.com/etcd-io/etcd/releases/tag/v3.6.0&#34;&gt;etcd v3.6.0&lt;/a&gt;, the first minor release since etcd v3.5.0 on June 15, 2021. This release
introduces several new features, makes significant progress on long-standing efforts like downgrade support and
migration to v3store, and addresses numerous critical &amp;amp; major issues. It also includes major optimizations in
memory usage, improving efficiency and performance.&lt;/p&gt;
&lt;p&gt;In addition to the features of v3.6.0, etcd has joined Kubernetes as a SIG (sig-etcd), enabling us to improve
project sustainability. We&#39;ve introduced systematic robustness testing to ensure correctness and reliability.
Through the etcd-operator Working Group, we plan to improve usability as well.&lt;/p&gt;
&lt;p&gt;What follows are the most significant changes introduced in etcd v3.6.0, along with a discussion of the
roadmap for future development. For a detailed list of changes, please refer to the &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.6.md&#34;&gt;CHANGELOG-3.6&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A heartfelt thank you to all the contributors who made this release possible!&lt;/p&gt;
&lt;h2 id=&#34;security&#34;&gt;Security&lt;/h2&gt;
&lt;p&gt;etcd takes security seriously. To enhance software security in v3.6.0, we have improved our workflow checks by
integrating &lt;code&gt;govulncheck&lt;/code&gt; to scan the source code and &lt;code&gt;trivy&lt;/code&gt; to scan container images. These improvements
have also been backported to supported stable releases.&lt;/p&gt;
&lt;p&gt;etcd continues to follow the &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/main/security/security-release-process.md&#34;&gt;Security Release Process&lt;/a&gt; to ensure vulnerabilities are properly managed and addressed.&lt;/p&gt;
&lt;h2 id=&#34;features&#34;&gt;Features&lt;/h2&gt;
&lt;h3 id=&#34;migration-to-v3store&#34;&gt;Migration to v3store&lt;/h3&gt;
&lt;p&gt;The v2store has been deprecated since etcd v3.4 but could still be enabled via &lt;code&gt;--enable-v2&lt;/code&gt;. It remained the source of
truth for membership data. In etcd v3.6.0, v2store can no longer be enabled as the &lt;code&gt;--enable-v2&lt;/code&gt; flag has been removed,
and v3store has become the sole source of truth for membership data.&lt;/p&gt;
&lt;p&gt;While v2store still exists in v3.6.0, etcd will fail to start if it contains any data other than membership information.
To assist with migration, etcd v3.5.18+ provides the &lt;code&gt;etcdutl check v2store&lt;/code&gt; command, which verifies that v2store
contains only membership data (see &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/19113&#34;&gt;PR 19113&lt;/a&gt;).&lt;/p&gt;
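&lt;p&gt;As a pre-upgrade sketch (the data directory path below is a placeholder, and the exact flag spelling should be confirmed against your &lt;code&gt;etcdutl&lt;/code&gt; version):&lt;/p&gt;

```shell
# Offline check with etcdutl from v3.5.18 or later; the path is a placeholder.
etcdutl check v2store --data-dir /var/lib/etcd

# A clean result means v2store holds only membership data,
# so the member can safely be started with v3.6 binaries.
```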
&lt;p&gt;Compared to v2store, v3store offers better performance and transactional support. It is also the actively maintained
storage engine moving forward.&lt;/p&gt;
&lt;p&gt;The removal of v2store is still ongoing and is tracked in &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/12913&#34;&gt;issues/12913&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;downgrade&#34;&gt;Downgrade&lt;/h3&gt;
&lt;p&gt;etcd v3.6.0 is the first version to fully support downgrade. The effort for this downgrade task spans
both versions 3.5 and 3.6, and all related work is tracked in &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/11716&#34;&gt;issues/11716&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At a high level, the process involves migrating the data schema to the target version (e.g., v3.5),
followed by a rolling downgrade.&lt;/p&gt;
&lt;p&gt;First, ensure the cluster is healthy and take a snapshot backup. Then verify that the downgrade target is valid:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ etcdctl downgrade validate 3.5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Downgrade validate success, cluster version 3.6
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the downgrade is valid, enable downgrade mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ etcdctl downgrade &lt;span style=&#34;color:#a2f&#34;&gt;enable&lt;/span&gt; 3.5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Downgrade &lt;span style=&#34;color:#a2f&#34;&gt;enable&lt;/span&gt; success, cluster version 3.6
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;etcd will then migrate the data schema in the background. Once complete, proceed with the rolling downgrade.&lt;/p&gt;
&lt;p&gt;For details, refer to the &lt;a href=&#34;https://etcd.io/docs/v3.6/downgrades/downgrade_3_6/&#34;&gt;Downgrade-3.6&lt;/a&gt; guide.&lt;/p&gt;
&lt;h3 id=&#34;feature-gates&#34;&gt;Feature gates&lt;/h3&gt;
&lt;p&gt;In etcd v3.6.0, we introduced Kubernetes-style feature gates for managing new features. Previously, we
indicated unstable features through the &lt;code&gt;--experimental&lt;/code&gt; prefix in feature flag names. The prefix was removed
once the feature was stable, causing a breaking change. Now, features start in Alpha, progress
to Beta, then GA, or are deprecated along the way. This ensures a much smoother upgrade and downgrade experience for users.&lt;/p&gt;
&lt;p&gt;See &lt;a href=&#34;https://etcd.io/docs/v3.6/feature-gates/&#34;&gt;feature-gates&lt;/a&gt; for details.&lt;/p&gt;
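&lt;p&gt;As an illustrative sketch, a gate can be toggled with the &lt;code&gt;--feature-gates&lt;/code&gt; flag, mirroring the Kubernetes convention. The feature name below is only an example; consult the feature-gates documentation for the current list of gates and their maturity levels:&lt;/p&gt;

```shell
# Enable a single feature gate by name; multiple gates are comma-separated.
# The gate name here is an example, not a recommendation.
etcd --feature-gates=StopGRPCServiceOnDefrag=true
```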
&lt;h3 id=&#34;livezreadyz-checks&#34;&gt;livez / readyz checks&lt;/h3&gt;
&lt;p&gt;etcd now supports &lt;code&gt;/livez&lt;/code&gt; and &lt;code&gt;/readyz&lt;/code&gt; endpoints, aligning with Kubernetes&#39; Liveness and Readiness probes.
&lt;code&gt;/livez&lt;/code&gt; indicates whether the etcd instance is alive, while &lt;code&gt;/readyz&lt;/code&gt; indicates when it is ready to serve requests.
This feature has also been backported to release-3.5 (starting from v3.5.11) and release-3.4 (starting from v3.4.29).
See &lt;a href=&#34;https://etcd.io/docs/v3.6/op-guide/monitoring/&#34;&gt;livez/readyz&lt;/a&gt; for details.&lt;/p&gt;
&lt;p&gt;The existing &lt;code&gt;/health&lt;/code&gt; endpoint remains functional. &lt;code&gt;/livez&lt;/code&gt; is similar to &lt;code&gt;/health?serializable=true&lt;/code&gt;, while
&lt;code&gt;/readyz&lt;/code&gt; is similar to &lt;code&gt;/health&lt;/code&gt; or &lt;code&gt;/health?serializable=false&lt;/code&gt;. However, the &lt;code&gt;/livez&lt;/code&gt; and &lt;code&gt;/readyz&lt;/code&gt;
endpoints provide clearer semantics and are easier to understand.&lt;/p&gt;
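&lt;p&gt;A quick way to try the new endpoints against a local member (the address and default client port 2379 are assumed):&lt;/p&gt;

```shell
# Liveness: is the etcd process alive?
curl -i http://localhost:2379/livez

# Readiness: is the member ready to serve requests?
curl -i http://localhost:2379/readyz

# The endpoints are assumed to accept ?verbose for per-check details,
# following the Kubernetes-style health check convention.
curl http://localhost:2379/readyz?verbose
```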
&lt;h3 id=&#34;v3discovery&#34;&gt;v3discovery&lt;/h3&gt;
&lt;p&gt;In etcd v3.6.0, the new discovery protocol &lt;a href=&#34;https://etcd.io/docs/v3.6/dev-internal/discovery_protocol/&#34;&gt;v3discovery&lt;/a&gt; was introduced, based on clientv3.
It facilitates the discovery of all cluster members during the bootstrap phase.&lt;/p&gt;
&lt;p&gt;The previous &lt;a href=&#34;https://etcd.io/docs/v3.5/dev-internal/discovery_protocol/&#34;&gt;v2discovery&lt;/a&gt; protocol, based on clientv2, has been deprecated. Additionally,
the public discovery service at &lt;a href=&#34;https://discovery.etcd.io/&#34;&gt;https://discovery.etcd.io/&lt;/a&gt;, which relied on v2discovery, is no longer maintained.&lt;/p&gt;
&lt;h2 id=&#34;performance&#34;&gt;Performance&lt;/h2&gt;
&lt;h3 id=&#34;memory&#34;&gt;Memory&lt;/h3&gt;
&lt;p&gt;In this release, we reduced average memory consumption by at least 50% (see Figure 1). This improvement is primarily due to two changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The default value of &lt;code&gt;--snapshot-count&lt;/code&gt; has been reduced from 100,000 in v3.5 to 10,000 in v3.6. As a result, etcd v3.6 now retains only about 10% of the history records compared to v3.5.&lt;/li&gt;
&lt;li&gt;Raft history is compacted more frequently, as introduced in &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/18825&#34;&gt;PR/18825&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/figure-1.png&#34;
         alt=&#34;Diagram of memory usage&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; Memory usage comparison between etcd v3.5.20 and v3.6.0-rc.2 under different read/write ratios.
Each subplot shows the memory usage over time with a specific read/write ratio. The red line represents etcd
v3.5.20, while the teal line represents v3.6.0-rc.2. Across all tested ratios, v3.6.0-rc.2 exhibits lower and
more stable memory usage.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;throughput&#34;&gt;Throughput&lt;/h3&gt;
&lt;p&gt;Compared to v3.5, etcd v3.6 delivers an average performance improvement of approximately 10%
in both read and write throughput (see Figures 2, 3, 4, and 5). This improvement is not attributed to
any single major change, but rather the cumulative effect of multiple minor enhancements. One such
example is the optimization of the free page queries introduced in &lt;a href=&#34;https://github.com/etcd-io/bbolt/pull/419&#34;&gt;PR/419&lt;/a&gt;.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/figure-2.png&#34;
         alt=&#34;etcd read transaction performance with a high write ratio&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Figure 2:&lt;/strong&gt; Read throughput comparison between etcd v3.5.20 and v3.6.0-rc.2 under a high write ratio. The
read/write ratio is 0.0078, meaning 1 read per 128 writes. The right bar shows the percentage improvement
in read throughput of v3.6.0-rc.2 over v3.5.20, ranging from 3.21% to 25.59%.&lt;/em&gt;&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/figure-3.png&#34;
         alt=&#34;etcd read transaction performance with a high read ratio&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Figure 3:&lt;/strong&gt; Read throughput comparison between etcd v3.5.20 and v3.6.0-rc.2 under a high read ratio.
The read/write ratio is 8, meaning 8 reads per write. The right bar shows the percentage improvement in
read throughput of v3.6.0-rc.2 over v3.5.20, ranging from 4.38% to 27.20%.&lt;/em&gt;&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/figure-4.png&#34;
         alt=&#34;etcd write transaction performance with a high write ratio&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Figure 4:&lt;/strong&gt; Write throughput comparison between etcd v3.5.20 and v3.6.0-rc.2 under a high write ratio. The
read/write ratio is 0.0078, meaning 1 read per 128 writes. The right bar shows the percentage improvement
in write throughput of v3.6.0-rc.2 over v3.5.20, ranging from 2.95% to 24.24%.&lt;/em&gt;&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/announcing-etcd-3.6/figure-5.png&#34;
         alt=&#34;etcd write transaction performance with a high read ratio&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Figure 5:&lt;/strong&gt; Write throughput comparison between etcd v3.5.20 and v3.6.0-rc.2 under a high read ratio.
The read/write ratio is 8, meaning 8 reads per write. The right bar shows the percentage improvement in
write throughput of v3.6.0-rc.2 over v3.5.20, ranging from 3.86% to 28.37%.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;breaking-changes&#34;&gt;Breaking changes&lt;/h2&gt;
&lt;p&gt;This section highlights a few notable breaking changes. For a complete list, please refer to
the &lt;a href=&#34;https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/&#34;&gt;Upgrade etcd from v3.5 to v3.6&lt;/a&gt; and the &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.6.md&#34;&gt;CHANGELOG-3.6&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;old-binaries-are-incompatible-with-new-schema-versions&#34;&gt;Old binaries are incompatible with new schema versions&lt;/h3&gt;
&lt;p&gt;Old etcd binaries are not compatible with newer data schema versions. For example, etcd 3.5 cannot start with
data created by etcd 3.6, and etcd 3.4 cannot start with data created by either 3.5 or 3.6.&lt;/p&gt;
&lt;p&gt;When downgrading etcd, it&#39;s important to follow the documented downgrade procedure. Simply replacing
the binary or image will result in compatibility issues.&lt;/p&gt;
&lt;h3 id=&#34;peer-endpoints-no-longer-serve-client-requests&#34;&gt;Peer endpoints no longer serve client requests&lt;/h3&gt;
&lt;p&gt;Client endpoints (&lt;code&gt;--advertise-client-urls&lt;/code&gt;) are intended to serve client requests only, while peer
endpoints (&lt;code&gt;--initial-advertise-peer-urls&lt;/code&gt;) are intended solely for peer communication. However, due to an implementation
oversight, the peer endpoints were also able to handle client requests in etcd 3.4 and 3.5. This behavior was misleading and
encouraged incorrect usage patterns. In etcd 3.6, this misleading behavior was corrected via &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/13565&#34;&gt;PR/13565&lt;/a&gt;; peer endpoints
no longer serve client requests.&lt;/p&gt;
&lt;h3 id=&#34;clear-boundary-between-etcdctl-and-etcdutl&#34;&gt;Clear boundary between etcdctl and etcdutl&lt;/h3&gt;
&lt;p&gt;Both &lt;code&gt;etcdctl&lt;/code&gt; and &lt;code&gt;etcdutl&lt;/code&gt; are command line tools. &lt;code&gt;etcdutl&lt;/code&gt; is an offline utility designed to operate directly on
etcd data files, while &lt;code&gt;etcdctl&lt;/code&gt; is an online tool that interacts with etcd over a network. Previously, there were some
overlapping functionalities between the two, but these overlaps were removed in 3.6.0.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Removed &lt;code&gt;etcdctl defrag --data-dir&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;etcdctl defrag&lt;/code&gt; command now supports only online defragmentation and no longer supports offline defragmentation.
To perform offline defragmentation, use the &lt;code&gt;etcdutl defrag --data-dir&lt;/code&gt; command instead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Removed &lt;code&gt;etcdctl snapshot status&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;etcdctl&lt;/code&gt; no longer supports retrieving the status of a snapshot. Use the &lt;code&gt;etcdutl snapshot status&lt;/code&gt; command instead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Removed &lt;code&gt;etcdctl snapshot restore&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;etcdctl&lt;/code&gt; no longer supports restoring from a snapshot. Use the &lt;code&gt;etcdutl snapshot restore&lt;/code&gt; command instead.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
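&lt;p&gt;In practice, the split looks like this (the endpoints and file paths below are placeholders):&lt;/p&gt;

```shell
# Offline operations: etcdutl works directly on data and snapshot files.
etcdutl defrag --data-dir /var/lib/etcd
etcdutl snapshot status backup.db
etcdutl snapshot restore backup.db --data-dir /var/lib/etcd-restored

# Online operation: etcdctl talks to a live member over the network.
etcdctl defrag --endpoints=http://localhost:2379
```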
&lt;h2 id=&#34;critical-bug-fixes&#34;&gt;Critical bug fixes&lt;/h2&gt;
&lt;p&gt;Correctness has always been a top priority for the etcd project. In the process of developing 3.6.0, we found and
fixed a few notable bugs that could lead to data inconsistency in specific cases. These fixes have been backported
to previous releases, but we believe they deserve special mention here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Inconsistency when Crashing Under Load&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Previously, when etcd was applying data, it would update the consistent-index first, followed by committing the
data. However, these operations were not atomic. If etcd crashed in between, it could lead to data inconsistency
(see &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/13766&#34;&gt;issue/13766&lt;/a&gt;). The issue was introduced in v3.5.0, and fixed in v3.5.3 with &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/13854&#34;&gt;PR/13854&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Durability API guarantee broken in single node cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When a client writes data and receives a success response, the data is expected to be persisted. However, the data might
be lost if etcd crashes immediately after sending the success response to the client. This was a legacy issue (see &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/14370&#34;&gt;issue/14370&lt;/a&gt;)
affecting all previous releases. It was addressed in v3.4.21 and v3.5.5 with &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/14400&#34;&gt;PR/14400&lt;/a&gt;, and fixed on the raft side in the
main branch (now release-3.6) with &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/14413&#34;&gt;PR/14413&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Revision Inconsistency when Crashing During Defragmentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If etcd crashed during a defragmentation operation, then upon restart it might reapply
some entries which had already been applied, leading to a revision inconsistency
(see the discussions in &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/14685&#34;&gt;PR/14685&lt;/a&gt;). The issue was introduced in v3.5.0, and fixed in v3.5.6 with &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/14730&#34;&gt;PR/14730&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;upgrade-issue&#34;&gt;Upgrade issue&lt;/h2&gt;
&lt;p&gt;This section highlights a common issue (&lt;a href=&#34;https://github.com/etcd-io/etcd/issues/19557&#34;&gt;issues/19557&lt;/a&gt;) in the etcd v3.5 to v3.6 upgrade that may cause the upgrade
process to fail. For a complete upgrade guide, refer to &lt;a href=&#34;https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/&#34;&gt;Upgrade etcd from v3.5 to v3.6&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The issue was introduced in etcd v3.5.1, and resolved in v3.5.20.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway&lt;/strong&gt;: users are required to first upgrade to etcd v3.5.20 (or a higher patch version) before upgrading
to etcd v3.6.0; otherwise, the upgrade may fail.&lt;/p&gt;
&lt;p&gt;For more background and technical context, see &lt;a href=&#34;https://etcd.io/blog/2025/upgrade_from_3.5_to_3.6_issue/&#34;&gt;upgrade_from_3.5_to_3.6_issue&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;testing&#34;&gt;Testing&lt;/h2&gt;
&lt;p&gt;We introduced &lt;a href=&#34;https://github.com/etcd-io/etcd/tree/main/tests/robustness&#34;&gt;robustness testing&lt;/a&gt; to verify correctness, which has always been our top priority.
It plays traffic of various types and volumes against an etcd cluster, concurrently injects a random
failpoint, records all operations (including both requests and responses), and finally performs a
linearizability check. It also verifies that the guarantees of the &lt;a href=&#34;https://etcd.io/docs/v3.5/learning/api_guarantees/#watch-apis&#34;&gt;Watch APIs&lt;/a&gt; have not been violated.
The robustness tests increase our confidence in the quality of each etcd release.&lt;/p&gt;
&lt;p&gt;We have migrated most of the etcd workflow tests to Kubernetes&#39; Prow testing infrastructure to
take advantage of its benefits, such as dashboards for viewing test results and the ability
for contributors to rerun failed tests themselves.&lt;/p&gt;
&lt;h2 id=&#34;platforms&#34;&gt;Platforms&lt;/h2&gt;
&lt;p&gt;While retaining all existing supported platforms, we have promoted Linux/ARM64 to Tier 1 support.
For more details, please refer to &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/15951&#34;&gt;issues/15951&lt;/a&gt;. For the complete list of supported platforms,
see &lt;a href=&#34;https://etcd.io/docs/v3.6/op-guide/supported-platform/&#34;&gt;supported-platform&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;dependencies&#34;&gt;Dependencies&lt;/h2&gt;
&lt;h3 id=&#34;dependency-bumping-guide&#34;&gt;Dependency bumping guide&lt;/h3&gt;
&lt;p&gt;We have published an official guide on how to bump dependencies for etcd’s main branch and stable releases.
It also covers how to update the Go version. For more details, please refer to &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/main/Documentation/contributor-guide/dependency_management.md&#34;&gt;dependency_management&lt;/a&gt;.
With this guide available, any contributor can now help with dependency upgrades.&lt;/p&gt;
&lt;h3 id=&#34;core-dependency-updates&#34;&gt;Core Dependency Updates&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/etcd-io/bbolt&#34;&gt;bbolt&lt;/a&gt; and &lt;a href=&#34;https://github.com/etcd-io/raft&#34;&gt;raft&lt;/a&gt; are two core dependencies of etcd.&lt;/p&gt;
&lt;p&gt;Both etcd v3.4 and v3.5 depend on bbolt v1.3, while etcd v3.6 depends on bbolt v1.4.&lt;/p&gt;
&lt;p&gt;For the release-3.4 and release-3.5 branches, raft is included in the etcd repository itself, so etcd v3.4 and v3.5
do not depend on an external raft module. Starting from etcd v3.6, raft was moved to a separate repository (&lt;a href=&#34;https://github.com/etcd-io/raft&#34;&gt;raft&lt;/a&gt;),
and the first standalone raft release is v3.6.0. As a result, etcd v3.6.0 depends on raft v3.6.0.&lt;/p&gt;
&lt;p&gt;Please see the table below for a summary:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;etcd versions&lt;/th&gt;
&lt;th&gt;bbolt versions&lt;/th&gt;
&lt;th&gt;raft versions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3.4.x&lt;/td&gt;
&lt;td&gt;v1.3.x&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.5.x&lt;/td&gt;
&lt;td&gt;v1.3.x&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.6.x&lt;/td&gt;
&lt;td&gt;v1.4.x&lt;/td&gt;
&lt;td&gt;v3.6.x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;grpc-gateway-v2&#34;&gt;grpc-gateway@v2&lt;/h3&gt;
&lt;p&gt;We upgraded &lt;a href=&#34;https://github.com/grpc-ecosystem/grpc-gateway&#34;&gt;grpc-gateway&lt;/a&gt; from v1 to v2 via &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/16595&#34;&gt;PR/16595&lt;/a&gt; in etcd v3.6.0. This is a major step toward
migrating to &lt;a href=&#34;https://github.com/protocolbuffers/protobuf-go&#34;&gt;protobuf-go&lt;/a&gt;, the second major version of the Go protocol buffer API implementation.&lt;/p&gt;
&lt;p&gt;grpc-gateway@v2 is designed to work with &lt;a href=&#34;https://github.com/protocolbuffers/protobuf-go&#34;&gt;protobuf-go&lt;/a&gt;. However, etcd v3.6 still depends on the deprecated
&lt;a href=&#34;https://github.com/gogo/protobuf&#34;&gt;gogo/protobuf&lt;/a&gt;, which is a protocol buffer v1 implementation. To resolve this incompatibility,
we applied a &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/158b9e0d468d310c3edf4cf13f2458c51b0406fa/scripts/genproto.sh#L151-L184&#34;&gt;patch&lt;/a&gt; to the generated *.pb.gw.go files to convert v1 messages to v2 messages.&lt;/p&gt;
&lt;h3 id=&#34;grpc-ecosystem-go-grpc-middleware-providers-prometheus&#34;&gt;grpc-ecosystem/go-grpc-middleware/providers/prometheus&lt;/h3&gt;
&lt;p&gt;We switched from the deprecated (and archived) &lt;a href=&#34;https://github.com/grpc-ecosystem/go-grpc-prometheus&#34;&gt;grpc-ecosystem/go-grpc-prometheus&lt;/a&gt; to
&lt;a href=&#34;https://github.com/grpc-ecosystem/go-grpc-middleware/tree/main/providers/prometheus&#34;&gt;grpc-ecosystem/go-grpc-middleware/providers/prometheus&lt;/a&gt; via &lt;a href=&#34;https://github.com/etcd-io/etcd/pull/19195&#34;&gt;PR/19195&lt;/a&gt;. This change ensures continued
support and access to the latest features and improvements in the gRPC Prometheus integration.&lt;/p&gt;
&lt;h2 id=&#34;community&#34;&gt;Community&lt;/h2&gt;
&lt;p&gt;There are exciting developments in the etcd community that reflect our ongoing commitment
to strengthening collaboration, improving maintainability, and evolving the project’s governance.&lt;/p&gt;
&lt;h3 id=&#34;etcd-becomes-a-kubernetes-sig&#34;&gt;etcd Becomes a Kubernetes SIG&lt;/h3&gt;
&lt;p&gt;etcd has officially become a Kubernetes Special Interest Group: SIG-etcd. This change reflects
etcd’s critical role as the primary datastore for Kubernetes and establishes a more structured
and transparent home for long-term stewardship and cross-project collaboration. The new SIG
designation will help streamline decision-making, align roadmaps with Kubernetes needs,
and attract broader community involvement.&lt;/p&gt;
&lt;h3 id=&#34;new-contributors-maintainers-and-reviewers&#34;&gt;New contributors, maintainers, and reviewers&lt;/h3&gt;
&lt;p&gt;We’ve seen increasing engagement from contributors, which has resulted in the addition of three new maintainers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/fuweid&#34;&gt;fuweid&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/jmhbnz&#34;&gt;jmhbnz&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wenjiaswe&#34;&gt;wenjiaswe&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Their continued contributions have been instrumental in driving the project forward.&lt;/p&gt;
&lt;p&gt;We also welcome two new reviewers to the project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/ivanvc&#34;&gt;ivanvc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/siyuanfoundation&#34;&gt;siyuanfoundation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We appreciate their dedication to code quality and their willingness to take on broader review responsibilities
within the community.&lt;/p&gt;
&lt;h3 id=&#34;new-release-team&#34;&gt;New release team&lt;/h3&gt;
&lt;p&gt;We&#39;ve formed a new release team led by &lt;a href=&#34;https://github.com/ivanvc&#34;&gt;ivanvc&lt;/a&gt; and &lt;a href=&#34;https://github.com/jmhbnz&#34;&gt;jmhbnz&lt;/a&gt;, streamlining the release process by automating
many previously manual steps. Inspired by Kubernetes SIG Release, we&#39;ve adopted several best practices, including
clearly defined release team roles and the introduction of release shadows to support knowledge sharing and team
sustainability. These changes have made our releases smoother and more reliable, allowing us to approach each
release with greater confidence and consistency.&lt;/p&gt;
&lt;h3 id=&#34;introducing-the-etcd-operator-working-group&#34;&gt;Introducing the etcd Operator Working Group&lt;/h3&gt;
&lt;p&gt;To further advance etcd’s operational excellence, we have formed a new working group: &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-etcd-operator&#34;&gt;WG-etcd-operator&lt;/a&gt;.
The working group is dedicated to enabling the automatic and efficient operation of etcd clusters that run in
the Kubernetes environment using an etcd-operator.&lt;/p&gt;
&lt;h2 id=&#34;future-development&#34;&gt;Future Development&lt;/h2&gt;
&lt;p&gt;The legacy v2store has been deprecated since etcd v3.4, and the flag &lt;code&gt;--enable-v2&lt;/code&gt; was removed entirely in v3.6.
This means that starting from v3.6, there is no longer a way to enable or use the v2store. However, etcd still
bootstraps internally from the legacy v2 snapshots. To address this inconsistency, we plan to change etcd to
bootstrap from the v3store and replay the WAL entries based on the &lt;code&gt;consistent-index&lt;/code&gt;. The work is being tracked
in &lt;a href=&#34;https://github.com/etcd-io/etcd/issues/12913&#34;&gt;issues/12913&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One of the most persistent challenges remains large range queries from the kube-apiserver, which can
lead to process crashes due to their unpredictable nature. The range stream feature, originally outlined in
the &lt;a href=&#34;https://etcd.io/blog/2021/announcing-etcd-3.5/#future-roadmaps&#34;&gt;v3.5 release blog/Future roadmaps&lt;/a&gt;, remains an idea worth revisiting to address the challenges of large
range queries.&lt;/p&gt;
&lt;p&gt;For more details and upcoming plans, please refer to the &lt;a href=&#34;https://github.com/etcd-io/etcd/blob/main/Documentation/contributor-guide/roadmap.md&#34;&gt;etcd roadmap&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.33: Job&#39;s SuccessPolicy Goes GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/kubernetes-1-33-jobs-success-policy-goes-ga/</link>
      <pubDate>Thu, 15 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/15/kubernetes-1-33-jobs-success-policy-goes-ga/</guid>
      <description>
        
        
        &lt;p&gt;On behalf of the Kubernetes project, I&#39;m pleased to announce that Job &lt;em&gt;success policy&lt;/em&gt; has graduated to General Availability (GA) as part of the v1.33 release.&lt;/p&gt;
&lt;h2 id=&#34;about-job-s-success-policy&#34;&gt;About Job&#39;s Success Policy&lt;/h2&gt;
&lt;p&gt;In batch workloads, you might want to use leader-follower patterns like &lt;a href=&#34;https://en.wikipedia.org/wiki/Message_Passing_Interface&#34;&gt;MPI&lt;/a&gt;,
in which the leader controls the execution, including the followers&#39; lifecycle.&lt;/p&gt;
&lt;p&gt;In this case, you might want to mark the Job as succeeded
even if some of the indexes failed. Unfortunately, without a success policy, a leader-follower Kubernetes Job would, in most cases, require &lt;strong&gt;all&lt;/strong&gt; Pods to finish successfully
for the Job to reach an overall succeeded state.&lt;/p&gt;
&lt;p&gt;For Kubernetes Jobs, the API allows you to specify early exit criteria using the &lt;code&gt;.spec.successPolicy&lt;/code&gt;
field (you can only use the &lt;code&gt;.spec.successPolicy&lt;/code&gt; field for an &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concept/workloads/controllers/job/#completion-mode&#34;&gt;indexed Job&lt;/a&gt;).
This field describes a set of rules, either using a list of succeeded indexes for the Job, or defining a minimal required count of succeeded indexes.&lt;/p&gt;
&lt;p&gt;This newly stable field is especially valuable for scientific simulation, AI/ML and High-Performance Computing (HPC) batch workloads.
Users in these areas often run numerous experiments and may only need a specific number to complete successfully, rather than requiring all of them to succeed.
In the leader-follower case, the leader index&#39;s failure is the only relevant Job exit criterion, and the outcomes of individual follower Pods are reflected
only indirectly via the status of the leader index.
Moreover, followers do not know when they can terminate themselves.&lt;/p&gt;
&lt;p&gt;After a Job meets any &lt;strong&gt;success policy&lt;/strong&gt; rule, the Job is marked as succeeded, and all Pods are terminated, including the running ones.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;The following excerpt from a Job manifest, using &lt;code&gt;.successPolicy.rules[0].succeededCount&lt;/code&gt;, shows an example of
using a custom success policy:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;successPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededCount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here, the Job is marked as succeeded when any single index succeeds, regardless of which one.
Additionally, you can combine &lt;code&gt;succeededCount&lt;/code&gt; with &lt;code&gt;succeededIndexes&lt;/code&gt; to constrain which index numbers count toward success,
as shown below:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;successPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededIndexes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# index of the leader Pod&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededCount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This example shows that the Job will be marked as succeeded once a Pod with a specific index (Pod index 0) has succeeded.&lt;/p&gt;
&lt;p&gt;Once the Job either satisfies one of the &lt;code&gt;successPolicy&lt;/code&gt; rules, or achieves its &lt;code&gt;Complete&lt;/code&gt; criteria based on &lt;code&gt;.spec.completions&lt;/code&gt;,
the Job controller within kube-controller-manager adds the &lt;code&gt;SuccessCriteriaMet&lt;/code&gt; condition to the Job status.
After that, the job-controller initiates cleanup and termination of Pods for Jobs with the &lt;code&gt;SuccessCriteriaMet&lt;/code&gt; condition.
Eventually, the Job obtains the &lt;code&gt;Complete&lt;/code&gt; condition once the job-controller has finished cleanup and termination.&lt;/p&gt;
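&lt;p&gt;As an illustrative sketch of that sequence (an abridged excerpt, with timestamps, reasons, and messages omitted), the Job status may end up looking roughly like this:&lt;/p&gt;

```yaml
# Illustrative, abridged Job status after a success policy rule is met.
# Real statuses carry additional fields (timestamps, reasons, messages).
status:
  conditions:
  - type: SuccessCriteriaMet   # added once a successPolicy rule is satisfied
    status: "True"
  - type: Complete             # added after Pod cleanup and termination finish
    status: "True"
```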
&lt;h2 id=&#34;learn-more&#34;&gt;Learn more&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Read the documentation for
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/#success-policy&#34;&gt;success policy&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Read the KEP for the &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3998-job-success-completion-policy&#34;&gt;Job success/completion policy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;This work was led by the Kubernetes
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;batch working group&lt;/a&gt;
in close collaboration with the
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG Apps&lt;/a&gt; community.&lt;/p&gt;
&lt;p&gt;If you are interested in working on new features in this space, I recommend
subscribing to our &lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack&lt;/a&gt;
channel and attending the regular community meetings.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Updates to Container Lifecycle</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/14/kubernetes-v1-33-updates-to-container-lifecycle/</link>
      <pubDate>Wed, 14 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/14/kubernetes-v1-33-updates-to-container-lifecycle/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes v1.33 introduces a few updates to the lifecycle of containers. The Sleep action for container lifecycle hooks now supports a zero sleep duration (feature enabled by default).
There is also alpha support for customizing the stop signal sent to containers when they are being terminated.&lt;/p&gt;
&lt;p&gt;This blog post goes into the details of these new aspects of the container lifecycle, and how you can use them.&lt;/p&gt;
&lt;h2 id=&#34;zero-value-for-sleep-action&#34;&gt;Zero value for Sleep action&lt;/h2&gt;
&lt;p&gt;Kubernetes v1.29 introduced the &lt;code&gt;Sleep&lt;/code&gt; action for container PreStop and PostStart Lifecycle hooks. The Sleep action lets your containers pause for a specified duration after the container is started or before it is terminated. This was needed to provide a straightforward way to manage graceful shutdowns. Before the Sleep action, folks used to run the &lt;code&gt;sleep&lt;/code&gt; command using the exec action in their container lifecycle hooks. If you wanted to do this you&#39;d need to have the binary for the &lt;code&gt;sleep&lt;/code&gt; command in your container image. This is difficult if you&#39;re using third party images.&lt;/p&gt;
&lt;p&gt;When the Sleep action was initially added, it didn&#39;t support a sleep duration of zero seconds. The &lt;code&gt;time.Sleep&lt;/code&gt; function, which the Sleep action uses under the hood, does: passing a negative or zero value makes it return immediately, resulting in a no-op. We wanted the same behaviour for the Sleep action. Support for a zero duration was later added in v1.32, behind the &lt;code&gt;PodLifecycleSleepActionAllowZero&lt;/code&gt; feature gate.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;PodLifecycleSleepActionAllowZero&lt;/code&gt; feature gate has graduated to beta in v1.33, and is now enabled by default.
The original Sleep action for &lt;code&gt;preStop&lt;/code&gt; and &lt;code&gt;postStart&lt;/code&gt; hooks has been enabled by default since Kubernetes v1.30.
With a cluster running Kubernetes v1.33, you are able to set a
zero duration for sleep lifecycle hooks. For a cluster with default configuration, you don&#39;t need
to enable any feature gate to make that possible.&lt;/p&gt;
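&lt;p&gt;As a minimal sketch (the Pod and container names here are placeholders), a zero-second &lt;code&gt;preStop&lt;/code&gt; sleep that acts as a no-op hook could look like this:&lt;/p&gt;

```yaml
# Minimal sketch: a preStop Sleep hook with a zero duration (a no-op)
# on a cluster running Kubernetes v1.33 with default configuration.
apiVersion: v1
kind: Pod
metadata:
  name: zero-sleep-demo      # hypothetical name
spec:
  containers:
  - name: app                # hypothetical name
    image: nginx:latest
    lifecycle:
      preStop:
        sleep:
          seconds: 0         # returns immediately; zero allowed since v1.32
```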
&lt;h2 id=&#34;container-stop-signals&#34;&gt;Container stop signals&lt;/h2&gt;
&lt;p&gt;Container runtimes such as containerd and CRI-O honor a &lt;code&gt;StopSignal&lt;/code&gt; instruction in the container image definition. This can be used to specify a custom stop signal
that the runtime will use to terminate containers based on that image.
Stop signal configuration was not originally part of the Pod API in Kubernetes.
Until Kubernetes v1.33, the only way to override the stop signal for containers was by rebuilding your container image with the new custom stop signal
(for example, specifying &lt;code&gt;STOPSIGNAL&lt;/code&gt; in a &lt;code&gt;Containerfile&lt;/code&gt; or &lt;code&gt;Dockerfile&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;ContainerStopSignals&lt;/code&gt; feature gate which is newly added in Kubernetes v1.33 adds stop signals to the Kubernetes API. This allows users to specify a custom stop signal in the container spec. Stop signals are added to the API as a new lifecycle along with the existing PreStop and PostStart lifecycle handlers. In order to use this feature, we expect the Pod to have the operating system specified with &lt;code&gt;spec.os.name&lt;/code&gt;. This is enforced so that we can cross-validate the stop signal against the operating system and make sure that the containers in the Pod are created with a valid stop signal for the operating system the Pod is being scheduled to. For Pods scheduled on Windows nodes, only &lt;code&gt;SIGTERM&lt;/code&gt; and &lt;code&gt;SIGKILL&lt;/code&gt; are allowed as valid stop signals. Find the full list of signals supported in Linux nodes &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2985-L3053&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;default-behaviour&#34;&gt;Default behaviour&lt;/h3&gt;
&lt;p&gt;If a container has a custom stop signal defined in its lifecycle, the container runtime uses that signal to kill the container, provided the runtime also supports custom stop signals. If no custom stop signal is defined in the container lifecycle, the runtime falls back to the stop signal defined in the container image. If no stop signal is defined in the container image either, the runtime&#39;s default stop signal is used. The default signal is &lt;code&gt;SIGTERM&lt;/code&gt; for both containerd and CRI-O.&lt;/p&gt;
&lt;h3 id=&#34;version-skew&#34;&gt;Version skew&lt;/h3&gt;
&lt;p&gt;For the feature to work as intended, both the Kubernetes version and the container runtime must support container stop signals. The changes to the Kubernetes API and kubelet are available in alpha stage from v1.33, and can be enabled with the &lt;code&gt;ContainerStopSignals&lt;/code&gt; feature gate. The container runtime implementations for containerd and CRI-O are still a work in progress and will be rolled out soon.&lt;/p&gt;
&lt;h3 id=&#34;using-container-stop-signals&#34;&gt;Using container stop signals&lt;/h3&gt;
&lt;p&gt;To enable this feature, you need to turn on the &lt;code&gt;ContainerStopSignals&lt;/code&gt; feature gate in both the kube-apiserver and the kubelet. Once you have nodes where the feature gate is turned on, you can create Pods with a StopSignal lifecycle and a valid OS name like so:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;os&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;linux&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;lifecycle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;stopSignal&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;SIGUSR1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Do note that the &lt;code&gt;SIGUSR1&lt;/code&gt; signal in this example can only be used if the container&#39;s Pod is scheduled to a Linux node. Hence we need to specify &lt;code&gt;spec.os.name&lt;/code&gt; as &lt;code&gt;linux&lt;/code&gt; to be able to use the signal. You will only be able to configure &lt;code&gt;SIGTERM&lt;/code&gt; and &lt;code&gt;SIGKILL&lt;/code&gt; signals if the Pod is being scheduled to a Windows node. You cannot specify a &lt;code&gt;containers[*].lifecycle.stopSignal&lt;/code&gt; if the &lt;code&gt;spec.os.name&lt;/code&gt; field is nil or unset either.&lt;/p&gt;
&lt;h2 id=&#34;how-do-i-get-involved&#34;&gt;How do I get involved?&lt;/h2&gt;
&lt;p&gt;This feature is driven by the &lt;a href=&#34;https://github.com/Kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt;. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please reach out to us!&lt;/p&gt;
&lt;p&gt;You can reach SIG Node by several means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slack: &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/forum/#!forum/kubernetes-sig-node&#34;&gt;Mailing list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/labels/sig%2Fnode&#34;&gt;Open Community Issues/PRs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can also contact me directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: @sreeram-venkitesh&lt;/li&gt;
&lt;li&gt;Slack: @sreeram.venkitesh&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Job&#39;s Backoff Limit Per Index Goes GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/13/kubernetes-v1-33-jobs-backoff-limit-per-index-goes-ga/</link>
      <pubDate>Tue, 13 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/13/kubernetes-v1-33-jobs-backoff-limit-per-index-goes-ga/</guid>
      <description>
        
        
        &lt;p&gt;In Kubernetes v1.33, the &lt;em&gt;Backoff Limit Per Index&lt;/em&gt; feature reaches general
availability (GA). This blog describes the Backoff Limit Per Index feature and
its benefits.&lt;/p&gt;
&lt;h2 id=&#34;about-backoff-limit-per-index&#34;&gt;About backoff limit per index&lt;/h2&gt;
&lt;p&gt;When you run workloads on Kubernetes, you must consider scenarios where Pod
failures can affect the completion of your workloads. Ideally, your workload
should tolerate transient failures and continue running.&lt;/p&gt;
&lt;p&gt;To achieve failure tolerance in a Kubernetes Job, you can set the
&lt;code&gt;spec.backoffLimit&lt;/code&gt; field. This field specifies the total number of tolerated
failures.&lt;/p&gt;
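For instance, a Job that tolerates up to three Pod failures in total, across all of its Pods, might be declared as follows (a minimal sketch; the name and image are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job                 # hypothetical name
spec:
  backoffLimit: 3                   # tolerate at most 3 Pod failures in total
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/worker:latest   # hypothetical image
```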
&lt;p&gt;However, for workloads where every index is considered independent, like
&lt;a href=&#34;https://en.wikipedia.org/wiki/Embarrassingly_parallel&#34;&gt;embarrassingly parallel&lt;/a&gt;
workloads, the &lt;code&gt;spec.backoffLimit&lt;/code&gt; field is often not flexible enough.
For example, you may choose to run multiple suites of integration tests by
representing each suite as an index within an &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/job/indexed-parallel-processing-static/&#34;&gt;Indexed Job&lt;/a&gt;.
In that setup, a fast-failing index (test suite) is likely to consume your
entire budget for tolerating Pod failures, and you might not be able to run the
other indexes.&lt;/p&gt;
&lt;p&gt;To address this limitation, Kubernetes introduced &lt;em&gt;backoff limit per index&lt;/em&gt;,
which allows you to control the number of retries per index.&lt;/p&gt;
&lt;h2 id=&#34;how-backoff-limit-per-index-works&#34;&gt;How backoff limit per index works&lt;/h2&gt;
&lt;p&gt;To use Backoff Limit Per Index for Indexed Jobs, specify the number of tolerated
Pod failures per index with the &lt;code&gt;spec.backoffLimitPerIndex&lt;/code&gt; field. When you set
this field, the Job executes all indexes by default.&lt;/p&gt;
&lt;p&gt;Additionally, to fine-tune the error handling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specify the cap on the total number of failed indexes by setting the
&lt;code&gt;spec.maxFailedIndexes&lt;/code&gt; field. When the limit is exceeded, the entire Job is
terminated.&lt;/li&gt;
&lt;li&gt;Define a short-circuit to detect a failed index by using the &lt;code&gt;FailIndex&lt;/code&gt; action in the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/#pod-failure-policy&#34;&gt;Pod Failure Policy&lt;/a&gt;
mechanism.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the number of tolerated failures is exceeded, the Job marks that index as
failed and lists it in the Job&#39;s &lt;code&gt;status.failedIndexes&lt;/code&gt; field.&lt;/p&gt;
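When that happens, the failed indexes are recorded using the same interval notation as completed indexes. An illustrative status fragment (all index values here are hypothetical):

```yaml
# Illustrative Job status fragment; the values are hypothetical.
status:
  completedIndexes: 0,2,6-9
  failedIndexes: 1,3-5    # indexes that exhausted backoffLimitPerIndex
  succeeded: 6
  failed: 6
```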
&lt;h3 id=&#34;example&#34;&gt;Example&lt;/h3&gt;
&lt;p&gt;The following Job spec snippet is an example of how to combine backoff limit per
index with the &lt;em&gt;Pod Failure Policy&lt;/em&gt; feature:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backoffLimitPerIndex&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;maxFailedIndexes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;podFailurePolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;action&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Ignore&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;onPodConditions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;DisruptionTarget&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;action&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;FailIndex&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;onExitCodes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;In&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;values&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;42&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this example, the Job handles Pod failures as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ignores any failed Pods that have the built-in
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions&#34;&gt;disruption condition&lt;/a&gt;,
called &lt;code&gt;DisruptionTarget&lt;/code&gt;. These Pods don&#39;t count towards Job backoff limits.&lt;/li&gt;
&lt;li&gt;Fails the index corresponding to the failed Pod if any of the failed Pod&#39;s
containers finished with exit code 42, based on the matching &lt;code&gt;FailIndex&lt;/code&gt;
rule.&lt;/li&gt;
&lt;li&gt;Retries the first failure of any index, unless the index failed due to the
matching &lt;code&gt;FailIndex&lt;/code&gt; rule.&lt;/li&gt;
&lt;li&gt;Fails the entire Job if the number of failed indexes exceeds 5 (set by the
&lt;code&gt;spec.maxFailedIndexes&lt;/code&gt; field).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;learn-more&#34;&gt;Learn more&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Read the blog post on the closely related feature of Pod Failure Policy &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/&#34;&gt;Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;For a hands-on guide to using Pod failure policy, including the use of FailIndex, see
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/job/pod-failure-policy/&#34;&gt;Handling retriable and non-retriable pod failures with Pod failure policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read the documentation for
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/#backoff-limit-per-index&#34;&gt;Backoff limit per index&lt;/a&gt; and
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/controllers/job/#pod-failure-policy&#34;&gt;Pod failure policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read the KEP for the &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs&#34;&gt;Backoff Limits Per Index For Indexed Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;This work was sponsored by the Kubernetes
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;batch working group&lt;/a&gt;
in close collaboration with the
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG Apps&lt;/a&gt; community.&lt;/p&gt;
&lt;p&gt;If you are interested in working on new features in the space we recommend
subscribing to our &lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack&lt;/a&gt;
channel and attending the regular community meetings.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Image Pull Policy the way you always thought it worked!</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/</link>
      <pubDate>Mon, 12 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/</guid>
      <description>
        
        
        &lt;h2 id=&#34;image-pull-policy-the-way-you-always-thought-it-worked&#34;&gt;Image Pull Policy the way you always thought it worked!&lt;/h2&gt;
&lt;p&gt;Some things in Kubernetes are surprising, and the way &lt;code&gt;imagePullPolicy&lt;/code&gt; behaves might
be one of them. Given that Kubernetes is all about running pods, it may be peculiar
to learn that there has been a caveat to restricting pod access to authenticated images for
over 10 years, in the form of &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/18787&#34;&gt;issue 18787&lt;/a&gt;!
It is an exciting release when you can resolve a ten-year-old issue.&lt;/p&gt;

&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;Note:&lt;/h4&gt;Throughout this blog post, the term &amp;quot;pod credentials&amp;quot; will be used often. In this context,
the term generally encapsulates the authentication material that is available to a pod
to authenticate a container image pull.&lt;/div&gt;

&lt;h2 id=&#34;ifnotpresent-even-if-i-m-not-supposed-to-have-it&#34;&gt;IfNotPresent, even if I&#39;m not supposed to have it&lt;/h2&gt;
&lt;p&gt;The gist of the problem is that the &lt;code&gt;imagePullPolicy: IfNotPresent&lt;/code&gt; strategy has done
precisely what it says, and nothing more. Let&#39;s set up a scenario. To begin, &lt;em&gt;Pod A&lt;/em&gt; in &lt;em&gt;Namespace X&lt;/em&gt; is scheduled to &lt;em&gt;Node 1&lt;/em&gt; and requires &lt;em&gt;image Foo&lt;/em&gt; from a private repository.
For its image pull authentication material, the pod references &lt;em&gt;Secret 1&lt;/em&gt; in its &lt;code&gt;imagePullSecrets&lt;/code&gt;. &lt;em&gt;Secret 1&lt;/em&gt; contains the necessary credentials to pull from the private repository. The Kubelet will utilize the credentials from &lt;em&gt;Secret 1&lt;/em&gt; as supplied by &lt;em&gt;Pod A&lt;/em&gt;
and pull &lt;em&gt;container image Foo&lt;/em&gt; from the registry. This is the intended (and secure)
behavior.&lt;/p&gt;
&lt;p&gt;But now things get curious. If &lt;em&gt;Pod B&lt;/em&gt; in &lt;em&gt;Namespace Y&lt;/em&gt; happens to also be scheduled to &lt;em&gt;Node 1&lt;/em&gt;, unexpected (and potentially insecure) things happen. &lt;em&gt;Pod B&lt;/em&gt; may reference the same private image, specifying the &lt;code&gt;IfNotPresent&lt;/code&gt; image pull policy. &lt;em&gt;Pod B&lt;/em&gt; does not reference &lt;em&gt;Secret 1&lt;/em&gt;
(or in our case, any secret) in its &lt;code&gt;imagePullSecrets&lt;/code&gt;. When the Kubelet tries to run the pod, it honors the &lt;code&gt;IfNotPresent&lt;/code&gt; policy. The Kubelet sees that the &lt;em&gt;image Foo&lt;/em&gt; is already present locally, and will provide &lt;em&gt;image Foo&lt;/em&gt; to &lt;em&gt;Pod B&lt;/em&gt;. &lt;em&gt;Pod B&lt;/em&gt; gets to run the image even though it did not provide credentials authorizing it to pull the image in the first place.&lt;/p&gt;
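The scenario above can be sketched as two manifests (the names, namespaces, and image reference are hypothetical):

```yaml
# Pod A supplies credentials for the private registry.
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
  namespace: namespace-x
spec:
  imagePullSecrets:
  - name: secret-1                        # holds the registry credentials
  containers:
  - name: app
    image: registry.example.com/foo:1.0   # private image Foo
    imagePullPolicy: IfNotPresent
---
# Pod B references the same private image with the same policy, but
# supplies no imagePullSecrets. Historically, it could still run the
# image if it was already cached on the node.
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
  namespace: namespace-y
spec:
  containers:
  - name: app
    image: registry.example.com/foo:1.0
    imagePullPolicy: IfNotPresent
```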


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/ensure_secret_image_pulls.svg&#34;
         alt=&#34;Illustration of the process of two pods trying to access a private image, the first one with a pull secret, the second one without it&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Using a private image pulled by a different pod&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;While &lt;code&gt;IfNotPresent&lt;/code&gt; should not pull &lt;em&gt;image Foo&lt;/em&gt; if it is already present
on the node, it is an incorrect security posture to allow all pods scheduled
to a node to have access to a previously pulled private image. These pods were never
authorized to pull the image in the first place.&lt;/p&gt;
&lt;h2 id=&#34;ifnotpresent-but-only-if-i-am-supposed-to-have-it&#34;&gt;IfNotPresent, but only if I am supposed to have it&lt;/h2&gt;
&lt;p&gt;In Kubernetes v1.33, we - SIG Auth and SIG Node - have finally started to address this (really old) problem and to get the verification right! The basic expected behavior is unchanged. If
an image is not present, the Kubelet will attempt to pull the image. The credentials each pod supplies will be utilized for this task. This matches the behavior prior to 1.33.&lt;/p&gt;
&lt;p&gt;If the image is present, then the behavior of the Kubelet changes. The Kubelet will now
verify the pod&#39;s credentials before allowing the pod to use the image.&lt;/p&gt;
&lt;p&gt;Performance and service stability have been key considerations while revising the feature.
Pods utilizing the same credential will not be required to re-authenticate. This is
also true when pods source credentials from the same Kubernetes Secret object, even
when the credentials are rotated.&lt;/p&gt;
&lt;h2 id=&#34;never-pull-but-use-if-authorized&#34;&gt;Never pull, but use if authorized&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;imagePullPolicy: Never&lt;/code&gt; option does not fetch images. However, if the
container image is already present on the node, any pod attempting to use the private
image will be required to provide credentials, and those credentials require verification.&lt;/p&gt;
&lt;p&gt;Pods utilizing the same credential will not be required to re-authenticate.
Pods that do not supply credentials previously used to successfully pull an
image will not be allowed to use the private image.&lt;/p&gt;
&lt;h2 id=&#34;always-pull-if-authorized&#34;&gt;Always pull, if authorized&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;imagePullPolicy: Always&lt;/code&gt; option has always worked as intended. Each time an image
is requested, the request goes to the registry, and the registry performs an authentication
check.&lt;/p&gt;
&lt;p&gt;In the past, forcing the &lt;code&gt;Always&lt;/code&gt; image pull policy via pod admission was the only way to ensure
that your private container images didn&#39;t get reused by other pods on nodes which already pulled the images.&lt;/p&gt;
&lt;p&gt;Fortunately, this was somewhat performant. Only the image manifest was pulled, not the image. However, there was still a cost and a risk. During a new rollout, scale up, or pod restart, the image registry that provided the image MUST be available for the auth check, putting the image registry in the critical path for stability of services running inside of the cluster.&lt;/p&gt;
&lt;h2 id=&#34;how-it-all-works&#34;&gt;How it all works&lt;/h2&gt;
&lt;p&gt;The feature is based on persistent, file-based caches that are present on each of
the nodes. The following is a simplified description of how the feature works.
For the complete version, please see &lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;KEP-2535&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The process of requesting an image for the first time goes like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A pod requesting an image from a private registry is scheduled to a node.&lt;/li&gt;
&lt;li&gt;The image is not present on the node.&lt;/li&gt;
&lt;li&gt;The Kubelet makes a record of the intention to pull the image.&lt;/li&gt;
&lt;li&gt;The Kubelet extracts credentials from the Kubernetes Secret referenced by the pod
as an image pull secret, and uses them to pull the image from the private registry.&lt;/li&gt;
&lt;li&gt;After the image has been successfully pulled, the Kubelet makes a record of
the successful pull. This record includes details about credentials used
(in the form of a hash) as well as the Secret from which they originated.&lt;/li&gt;
&lt;li&gt;The Kubelet removes the original record of intent.&lt;/li&gt;
&lt;li&gt;The Kubelet retains the record of successful pull for later use.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When future pods scheduled to the same node request the previously pulled private image:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Kubelet checks the credentials that the new pod provides for the pull.&lt;/li&gt;
&lt;li&gt;If the hash of these credentials, or their source Secret, matches
the hash or source Secret recorded for a previous successful pull,
the pod is allowed to use the previously pulled image.&lt;/li&gt;
&lt;li&gt;If the credentials or their source Secret are not found in the records of
successful pulls for that image, the Kubelet will attempt to use
these new credentials to request a pull from the remote registry, triggering
the authorization flow.&lt;/li&gt;
&lt;/ol&gt;
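To make the record-keeping concrete, a successful-pull record conceptually carries information like the following. This sketch is purely illustrative - it is not the Kubelet's actual on-disk format, which is defined in KEP-2535:

```yaml
# Hypothetical illustration only; the real Kubelet file format differs.
image: registry.example.com/foo:1.0
pulls:
- credentialHash: sha256:9f2c0d1a        # hash of the credentials that succeeded
  sourceSecret:
    namespace: namespace-x
    name: secret-1
```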
&lt;h2 id=&#34;try-it-out&#34;&gt;Try it out&lt;/h2&gt;
&lt;p&gt;In Kubernetes v1.33 we shipped the alpha version of this feature. To give it a spin,
enable the &lt;code&gt;KubeletEnsureSecretPulledImages&lt;/code&gt; feature gate for your 1.33 Kubelets.&lt;/p&gt;
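If you manage your Kubelets through a configuration file, one common way to enable the gate is via the `featureGates` field of `KubeletConfiguration`:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletEnsureSecretPulledImages: true   # alpha in v1.33
```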
&lt;p&gt;You can learn more about the feature and additional optional configuration on the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/containers/images/#ensureimagepullcredentialverification&#34;&gt;concept page for Images&lt;/a&gt;
in the official Kubernetes documentation.&lt;/p&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;In future releases we are going to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make this feature work together with &lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;Projected service account tokens for Kubelet image credential providers&lt;/a&gt; which adds a new, workload-specific source of image pull credentials.&lt;/li&gt;
&lt;li&gt;Write a benchmarking suite to measure the performance of this feature and assess the impact of
any future changes.&lt;/li&gt;
&lt;li&gt;Implement an in-memory caching layer so that we don&#39;t need to read files for each image
pull request.&lt;/li&gt;
&lt;li&gt;Add support for credential expirations, thus forcing previously validated credentials to
be re-authenticated.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;how-to-get-involved&#34;&gt;How to get involved&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;Reading KEP-2535&lt;/a&gt; is a great way to understand these changes in depth.&lt;/p&gt;
&lt;p&gt;If you are interested in further involvement, reach out to us on the &lt;a href=&#34;https://kubernetes.slack.com/archives/C04UMAUC4UA&#34;&gt;#sig-auth-authenticators-dev&lt;/a&gt; channel
on Kubernetes Slack (for an invitation, visit &lt;a href=&#34;https://slack.k8s.io/&#34;&gt;https://slack.k8s.io/&lt;/a&gt;).
You are also welcome to join the bi-weekly &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings&#34;&gt;SIG Auth meetings&lt;/a&gt;,
held every other Wednesday.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Streaming List responses</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/</link>
      <pubDate>Fri, 09 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/</guid>
      <description>
        
        
        &lt;p&gt;Managing Kubernetes cluster stability becomes increasingly critical as your infrastructure grows. One of the most challenging aspects of operating large-scale clusters has been handling List requests that fetch substantial datasets - a common operation that could unexpectedly impact your cluster&#39;s stability.&lt;/p&gt;
&lt;p&gt;Today, the Kubernetes community is excited to announce a significant architectural improvement: streaming encoding for List responses.&lt;/p&gt;
&lt;h2 id=&#34;the-problem-unnecessary-memory-consumption-with-large-resources&#34;&gt;The problem: unnecessary memory consumption with large resources&lt;/h2&gt;
&lt;p&gt;Current API response encoders just serialize an entire response into a single contiguous memory block and perform one &lt;a href=&#34;https://pkg.go.dev/net/http#ResponseWriter.Write&#34;&gt;ResponseWriter.Write&lt;/a&gt; call to transmit data to the client. Despite HTTP/2&#39;s capability to split responses into smaller frames for transmission, the underlying HTTP server continues to hold the complete response data as a single buffer. Even as individual frames are transmitted to the client, the memory associated with these frames cannot be freed incrementally.&lt;/p&gt;
&lt;p&gt;As cluster size grows, the single response body can be substantial - hundreds of megabytes in size. At large scale, the current approach becomes particularly inefficient, as it prevents incremental memory release during transmission. Imagine that network congestion occurs: that large response body&#39;s memory block stays active for tens of seconds or even minutes. This limitation leads to unnecessarily high and prolonged memory consumption in the kube-apiserver process. If multiple large List requests occur simultaneously, the cumulative memory consumption can escalate rapidly, potentially leading to an Out-of-Memory (OOM) situation that compromises cluster stability.&lt;/p&gt;
&lt;p&gt;The encoding/json package uses sync.Pool to reuse memory buffers during serialization. While efficient for consistent workloads, this mechanism creates challenges with sporadic large List responses. When processing these large responses, memory pools expand significantly. But due to sync.Pool&#39;s design, these oversized buffers remain reserved after use. Subsequent small List requests continue utilizing these large memory allocations, preventing garbage collection and maintaining persistently high memory consumption in the kube-apiserver even after the initial large responses complete.&lt;/p&gt;
&lt;p&gt;Additionally, &lt;a href=&#34;https://github.com/protocolbuffers/protocolbuffers.github.io/blob/c14731f55296f8c6367faa4f2e55a3d3594544c6/content/programming-guides/techniques.md?plain=1#L39&#34;&gt;Protocol Buffers&lt;/a&gt; are not designed to handle large datasets, but they are great for handling &lt;strong&gt;individual&lt;/strong&gt; messages within a large data set. This highlights the need for streaming-based approaches that can process and transmit large collections incrementally rather than as monolithic blocks.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;From &lt;a href=&#34;https://protobuf.dev/programming-guides/techniques/&#34;&gt;https://protobuf.dev/programming-guides/techniques/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;streaming-encoder-for-list-responses&#34;&gt;Streaming encoder for List responses&lt;/h2&gt;
&lt;p&gt;The streaming encoding mechanism is specifically designed for List responses, leveraging their common well-defined collection structures. The core idea focuses exclusively on the &lt;strong&gt;Items&lt;/strong&gt; field within collection structures, which represents the bulk of memory consumption in large responses. Rather than encoding the entire &lt;strong&gt;Items&lt;/strong&gt; array as one contiguous memory block, the new streaming encoder processes and transmits each item individually, allowing memory to be freed progressively as each frame or chunk is transmitted. As a result, encoding items one by one significantly reduces the memory footprint required by the API server.&lt;/p&gt;
&lt;p&gt;With Kubernetes objects typically limited to 1.5 MiB (a limit inherited from etcd), streaming encoding keeps memory consumption predictable and manageable regardless of how many objects are in a List response. The result is significantly improved API server stability, reduced memory spikes, and better overall cluster performance - especially in environments where multiple large List operations might occur simultaneously.&lt;/p&gt;
&lt;p&gt;To ensure perfect backward compatibility, the streaming encoder validates Go struct tags rigorously before activation, guaranteeing byte-for-byte consistency with the original encoder. Standard encoding mechanisms process all fields except &lt;strong&gt;Items&lt;/strong&gt;, maintaining identical output formatting throughout. This approach seamlessly supports all Kubernetes List types, from built-in &lt;strong&gt;*List&lt;/strong&gt; objects to Custom Resource &lt;strong&gt;UnstructuredList&lt;/strong&gt; objects, requiring zero client-side modifications or awareness that the underlying encoding method has changed.&lt;/p&gt;
&lt;h2 id=&#34;performance-gains-you-ll-notice&#34;&gt;Performance gains you&#39;ll notice&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduced Memory Consumption:&lt;/strong&gt; Significantly lowers the memory footprint of the API server when handling large &lt;strong&gt;list&lt;/strong&gt; requests,
especially when dealing with &lt;strong&gt;large resources&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Scalability:&lt;/strong&gt; Enables the API server to handle more concurrent requests and larger datasets without running out of memory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Increased Stability:&lt;/strong&gt; Reduces the risk of OOM kills and service disruptions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient Resource Utilization:&lt;/strong&gt; Optimizes memory usage and improves overall resource efficiency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;benchmark-results&#34;&gt;Benchmark results&lt;/h2&gt;
&lt;p&gt;To validate the results, Kubernetes has introduced a new &lt;strong&gt;list&lt;/strong&gt; benchmark which concurrently executes 10 &lt;strong&gt;list&lt;/strong&gt; requests, each returning 1 GB of data.&lt;/p&gt;
&lt;p&gt;The benchmark showed a 20x improvement, reducing memory usage from 70-80 GB to 3 GB.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/results.png&#34;
         alt=&#34;Screenshot of a K8s performance dashboard showing memory usage for benchmark list going down from 60GB to 3GB&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;List benchmark memory usage&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.33: Volume Populators Graduate to GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/</link>
      <pubDate>Thu, 08 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/</guid>
      <description>
        
        
&lt;p&gt;Kubernetes &lt;em&gt;volume populators&lt;/em&gt; are now generally available (GA)! The &lt;code&gt;AnyVolumeDataSource&lt;/code&gt; feature
gate is treated as always enabled for Kubernetes v1.33, which means that users can specify any appropriate
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources&#34;&gt;custom resource&lt;/a&gt;
as the data source of a PersistentVolumeClaim (PVC).&lt;/p&gt;
&lt;p&gt;An example of how to use &lt;code&gt;dataSourceRef&lt;/code&gt; in a PVC:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pvc1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;dataSourceRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;provider.example.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Provider&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;provider1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;what-is-new&#34;&gt;What is new&lt;/h2&gt;
&lt;p&gt;This GA release brings four major enhancements over the beta.&lt;/p&gt;
&lt;h3 id=&#34;populator-pod-is-optional&#34;&gt;Populator Pod is optional&lt;/h3&gt;
&lt;p&gt;During the beta phase, implementing a volume populator required running a dedicated populator Pod to carry out the data transfer.
With this GA release, the populator Pod is optional: providers can instead implement the population logic directly, and the controller handles the surrounding orchestration.&lt;/p&gt;
&lt;p&gt;To accommodate this, we&#39;ve introduced three new plugin-based functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PopulateFn()&lt;/code&gt;: Executes the provider-specific data population logic.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PopulateCompleteFn()&lt;/code&gt;: Checks if the data population operation has finished successfully.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PopulateCleanupFn()&lt;/code&gt;: Cleans up temporary resources created by the provider-specific functions after data population is completed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A provider example is added in &lt;a href=&#34;https://github.com/kubernetes-csi/lib-volume-populator/tree/master/example&#34;&gt;lib-volume-populator/example&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;mutator-functions-to-modify-the-kubernetes-resources&#34;&gt;Mutator functions to modify the Kubernetes resources&lt;/h3&gt;
&lt;p&gt;For GA, the CSI volume populator controller code gained a &lt;code&gt;MutatorConfig&lt;/code&gt;, allowing the specification of mutator functions to modify Kubernetes resources.
For example, if the PVC prime is not an exact copy of the PVC and you need provider-specific information for the driver, you can include this information in the optional &lt;code&gt;MutatorConfig&lt;/code&gt;.
This allows you to customize the Kubernetes objects in the volume populator.&lt;/p&gt;
&lt;h3 id=&#34;flexible-metric-handling-for-providers&#34;&gt;Flexible metric handling for providers&lt;/h3&gt;
&lt;p&gt;Our beta phase highlighted a new requirement: the need to aggregate metrics not just from lib-volume-populator, but also from other components within the provider&#39;s codebase.&lt;/p&gt;
&lt;p&gt;To address this, SIG Storage introduced a &lt;a href=&#34;https://github.com/kubernetes-csi/lib-volume-populator/blob/8a922a5302fdba13a6c27328ee50e5396940214b/populator-machinery/controller.go#L122&#34;&gt;provider metric manager&lt;/a&gt;.
This enhancement delegates the implementation of metrics logic to the provider itself, rather than relying solely on lib-volume-populator.
This shift provides greater flexibility and control over metrics collection and aggregation, enabling a more comprehensive view of provider performance.&lt;/p&gt;
&lt;h3 id=&#34;clean-up-for-temporary-resources&#34;&gt;Clean up for temporary resources&lt;/h3&gt;
&lt;p&gt;During the beta phase, we identified potential resource leaks with PersistentVolumeClaim (PVC) deletion while volume population was in progress, due to limitations in finalizer handling. We have improved the populator to support the deletion of temporary resources (PVC prime, etc.) if the original PVC is deleted in this GA release.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-it&#34;&gt;How to use it&lt;/h2&gt;
&lt;p&gt;To try it out, please follow the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2022/05/16/volume-populators-beta/#trying-it-out&#34;&gt;steps&lt;/a&gt; in the previous beta blog.&lt;/p&gt;
&lt;h2 id=&#34;future-directions-and-potential-feature-requests&#34;&gt;Future directions and potential feature requests&lt;/h2&gt;
&lt;p&gt;Looking ahead, there are several potential feature requests for the volume populator:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi sync: the current implementation is a one-time unidirectional sync from source to destination. This can be extended to support multiple syncs, enabling periodic syncs or allowing users to sync on demand&lt;/li&gt;
&lt;li&gt;Bidirectional sync: an extension of multi sync above, but making it bidirectional between source and destination&lt;/li&gt;
&lt;li&gt;Populate data with priorities: with a list of different dataSourceRef, populate based on priorities&lt;/li&gt;
&lt;li&gt;Populate data from multiple sources of the same provider: populate multiple different sources to one destination&lt;/li&gt;
&lt;li&gt;Populate data from multiple sources of different providers: populate multiple different sources to one destination, pipelining the population of the different resources&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To ensure we&#39;re building something truly valuable, Kubernetes SIG Storage would love to hear about any specific use cases you have in mind for this feature.
For any inquiries or specific questions related to volume populator, please reach out to the &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;SIG Storage community&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: From Secrets to Service Accounts: Kubernetes Image Pulls Evolved</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/07/kubernetes-v1-33-wi-for-image-pulls/</link>
      <pubDate>Wed, 07 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/07/kubernetes-v1-33-wi-for-image-pulls/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes has steadily evolved to reduce reliance on long-lived credentials
stored in the API.
A prime example of this shift is the transition of Kubernetes Service Account (KSA) tokens
from long-lived, static tokens to ephemeral, automatically rotated tokens
with OpenID Connect (OIDC)-compliant semantics.
This advancement enables workloads to securely authenticate with external services
without needing persistent secrets.&lt;/p&gt;
&lt;p&gt;However, one major gap remains: &lt;strong&gt;image pull authentication&lt;/strong&gt;.
Today, Kubernetes clusters rely on image pull secrets stored in the API,
which are long-lived and difficult to rotate,
or on node-level kubelet credential providers,
which allow any pod running on a node to access the same credentials.
This presents security and operational challenges.&lt;/p&gt;
&lt;p&gt;To address this, Kubernetes is introducing &lt;strong&gt;Service Account Token Integration
for Kubelet Credential Providers&lt;/strong&gt;, now available in &lt;strong&gt;alpha&lt;/strong&gt;.
This enhancement allows credential providers to use pod-specific service account tokens
to obtain registry credentials, which kubelet can then use for image pulls —
eliminating the need for long-lived image pull secrets.&lt;/p&gt;
&lt;h2 id=&#34;the-problem-with-image-pull-secrets&#34;&gt;The problem with image pull secrets&lt;/h2&gt;
&lt;p&gt;Currently, Kubernetes administrators have two primary options
for handling private container image pulls:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Image pull secrets stored in the Kubernetes API&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These secrets are often long-lived because they are hard to rotate.&lt;/li&gt;
&lt;li&gt;They must be explicitly attached to a service account or pod.&lt;/li&gt;
&lt;li&gt;Compromise of a pull secret can lead to unauthorized image access.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Kubelet credential providers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These providers fetch credentials dynamically at the node level.&lt;/li&gt;
&lt;li&gt;Any pod running on the node can access the same credentials.&lt;/li&gt;
&lt;li&gt;There’s no per-workload isolation, increasing security risks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
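&lt;p&gt;For reference, the first option looks like the manifest below. The Secret and registry names are placeholders; the referenced Secret is typically of type &lt;code&gt;kubernetes.io/dockerconfigjson&lt;/code&gt; and must be created and rotated manually:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:v1
  imagePullSecrets:
  # Long-lived credential stored in the Kubernetes API
  - name: regcred
&lt;/code&gt;&lt;/pre&gt;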
&lt;p&gt;Neither approach aligns with the principles of &lt;strong&gt;least privilege&lt;/strong&gt;
or &lt;strong&gt;ephemeral authentication&lt;/strong&gt;, leaving Kubernetes with a security gap.&lt;/p&gt;
&lt;h2 id=&#34;the-solution-service-account-token-integration-for-kubelet-credential-providers&#34;&gt;The solution: Service Account token integration for Kubelet credential providers&lt;/h2&gt;
&lt;p&gt;This new enhancement enables kubelet credential providers
to use &lt;strong&gt;workload identity&lt;/strong&gt; when fetching image registry credentials.
Instead of relying on long-lived secrets, credential providers can use
service account tokens to request short-lived credentials
tied to a specific pod’s identity.&lt;/p&gt;
&lt;p&gt;This approach provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Workload-specific authentication&lt;/strong&gt;:
Image pull credentials are scoped to a particular workload.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ephemeral credentials&lt;/strong&gt;:
Tokens are automatically rotated, eliminating the risks of long-lived secrets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless integration&lt;/strong&gt;:
Works with existing Kubernetes authentication mechanisms,
aligning with cloud-native security best practices.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;h3 id=&#34;1-service-account-tokens-for-credential-providers&#34;&gt;1. Service Account tokens for credential providers&lt;/h3&gt;
&lt;p&gt;Kubelet generates &lt;strong&gt;short-lived, automatically rotated&lt;/strong&gt; tokens for service accounts
if the credential provider it communicates with has opted into receiving
a service account token for image pulls.
These tokens conform to OIDC ID token semantics
and are provided to the credential provider
as part of the &lt;code&gt;CredentialProviderRequest&lt;/code&gt;.
The credential provider can then use this token
to authenticate with an external service.&lt;/p&gt;
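&lt;p&gt;For illustration, a &lt;code&gt;CredentialProviderRequest&lt;/code&gt; that carries a service account token looks roughly like the following (the kubelet serializes it as JSON on the provider&#39;s standard input; the image name and token value here are placeholders):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: credentialprovider.kubelet.k8s.io/v1
kind: CredentialProviderRequest
image: registry.example.com/app:v1
# Short-lived, OIDC-compliant token for the pod&#39;s service account;
# only populated when the provider has opted in.
serviceAccountToken: eyJhbGciOiJSUzI1NiIsImtpZCI6...
&lt;/code&gt;&lt;/pre&gt;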
&lt;h3 id=&#34;2-image-registry-authentication-flow&#34;&gt;2. Image registry authentication flow&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;When a pod starts, the kubelet requests credentials from a &lt;strong&gt;credential provider&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If the credential provider has opted in,
the kubelet generates a &lt;strong&gt;service account token&lt;/strong&gt; for the pod.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;service account token is included in the &lt;code&gt;CredentialProviderRequest&lt;/code&gt;&lt;/strong&gt;,
allowing the credential provider to authenticate
and exchange it for &lt;strong&gt;temporary image pull credentials&lt;/strong&gt;
from a registry (e.g. AWS ECR, GCP Artifact Registry, Azure ACR).&lt;/li&gt;
&lt;li&gt;The kubelet then uses these credentials
to pull images on behalf of the pod.&lt;/li&gt;
&lt;/ul&gt;
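&lt;p&gt;On success, the credential provider replies to the kubelet with a &lt;code&gt;CredentialProviderResponse&lt;/code&gt; carrying the temporary registry credentials, along the lines of the sketch below (the registry name and credential values are placeholders):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: credentialprovider.kubelet.k8s.io/v1
kind: CredentialProviderResponse
# How long the kubelet may cache the returned credentials
cacheDuration: &#34;5m&#34;
auth:
  registry.example.com:
    username: _token
    password: short-lived-registry-credential
&lt;/code&gt;&lt;/pre&gt;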
&lt;h2 id=&#34;benefits-of-this-approach&#34;&gt;Benefits of this approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;:
Eliminates long-lived image pull secrets, reducing attack surfaces.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Granular Access Control&lt;/strong&gt;:
Credentials are tied to individual workloads rather than entire nodes or clusters.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Simplicity&lt;/strong&gt;:
No need for administrators to manage and rotate image pull secrets manually.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Compliance&lt;/strong&gt;:
Helps organizations meet security policies
that prohibit persistent credentials in the cluster.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;For Kubernetes &lt;strong&gt;v1.34&lt;/strong&gt;, we expect to ship this feature in &lt;strong&gt;beta&lt;/strong&gt;
while continuing to gather feedback from users.&lt;/p&gt;
&lt;p&gt;In the coming releases, we will focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implementing &lt;strong&gt;caching mechanisms&lt;/strong&gt;
to improve performance for token generation.&lt;/li&gt;
&lt;li&gt;Giving more &lt;strong&gt;flexibility to credential providers&lt;/strong&gt;
to decide how the registry credentials returned to the kubelet are cached.&lt;/li&gt;
&lt;li&gt;Making the feature work with
&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images&#34;&gt;Ensure Secret Pulled Images&lt;/a&gt;
to ensure pods that use an image
are authorized to access that image
when service account tokens are used for authentication.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can learn more about this feature
on the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/kubelet-credential-provider/#service-account-token-for-image-pulls&#34;&gt;service account token for image pulls&lt;/a&gt;
page in the Kubernetes documentation.&lt;/p&gt;
&lt;p&gt;You can also follow along on the
&lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;KEP-4412&lt;/a&gt;
to track progress across the coming Kubernetes releases.&lt;/p&gt;
&lt;h2 id=&#34;try-it-out&#34;&gt;Try it out&lt;/h2&gt;
&lt;p&gt;To try out this feature:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Ensure you are running Kubernetes v1.33 or later&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enable the &lt;code&gt;ServiceAccountTokenForKubeletCredentialProviders&lt;/code&gt; feature gate&lt;/strong&gt;
on the kubelet.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ensure credential provider support&lt;/strong&gt;:
Modify or update your credential provider
to use service account tokens for authentication.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Update the credential provider configuration&lt;/strong&gt;
to opt into receiving service account tokens
by configuring the &lt;code&gt;tokenAttributes&lt;/code&gt; field.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploy a pod&lt;/strong&gt;
that uses the credential provider to pull images from a private registry.&lt;/li&gt;
&lt;/ol&gt;
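&lt;p&gt;For step 4, the opt-in happens in the kubelet&#39;s &lt;code&gt;CredentialProviderConfig&lt;/code&gt;. A minimal sketch is shown below; the provider name, image match pattern, and audience are placeholders for your own values:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: my-credential-provider
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  matchImages:
  - &#34;*.example.com&#34;
  defaultCacheDuration: &#34;10m&#34;
  # Opt in to receiving service account tokens for image pulls
  tokenAttributes:
    serviceAccountTokenAudience: my-registry-audience
    requireServiceAccount: true
&lt;/code&gt;&lt;/pre&gt;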
&lt;p&gt;We would love to hear your feedback on this feature.
Please reach out to us on the
&lt;a href=&#34;https://kubernetes.slack.com/archives/C04UMAUC4UA&#34;&gt;#sig-auth-authenticators-dev&lt;/a&gt;
channel on Kubernetes Slack
(for an invitation, visit &lt;a href=&#34;https://slack.k8s.io/&#34;&gt;https://slack.k8s.io/&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&#34;how-to-get-involved&#34;&gt;How to get involved&lt;/h2&gt;
&lt;p&gt;If you are interested in getting involved
in the development of this feature,
sharing feedback, or participating in any other ongoing &lt;strong&gt;SIG Auth&lt;/strong&gt; projects,
please reach out on the
&lt;a href=&#34;https://kubernetes.slack.com/archives/C0EN96KUY&#34;&gt;#sig-auth&lt;/a&gt;
channel on Kubernetes Slack.&lt;/p&gt;
&lt;p&gt;You are also welcome to join the bi-weekly
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings&#34;&gt;SIG Auth meetings&lt;/a&gt;,
held every other Wednesday.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Fine-grained SupplementalGroups Control Graduates to Beta</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/06/kubernetes-v1-33-fine-grained-supplementalgroups-control-beta/</link>
      <pubDate>Tue, 06 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/06/kubernetes-v1-33-fine-grained-supplementalgroups-control-beta/</guid>
      <description>
        
        
&lt;p&gt;The new field, &lt;code&gt;supplementalGroupsPolicy&lt;/code&gt;, was introduced as an opt-in alpha feature for Kubernetes v1.31 and has graduated to beta in v1.33; the corresponding feature gate (&lt;code&gt;SupplementalGroupsPolicy&lt;/code&gt;) is now enabled by default. This feature enables more precise control over supplemental groups in containers, which can strengthen the security posture, particularly when accessing volumes. Moreover, it also enhances the transparency of UID/GID details in containers, offering improved security oversight.&lt;/p&gt;
&lt;p&gt;Please be aware that this beta release contains a breaking behavioral change. See the &lt;a href=&#34;#the-behavioral-changes-introduced-in-beta&#34;&gt;Behavioral Changes Introduced In Beta&lt;/a&gt; and &lt;a href=&#34;#upgrade-consideration&#34;&gt;Upgrade Considerations&lt;/a&gt; sections for details.&lt;/p&gt;
&lt;h2 id=&#34;motivation-implicit-group-memberships-defined-in-etc-group-in-the-container-image&#34;&gt;Motivation: Implicit group memberships defined in &lt;code&gt;/etc/group&lt;/code&gt; in the container image&lt;/h2&gt;
&lt;p&gt;Although many Kubernetes cluster admins and users may not be aware of it, Kubernetes, by default, &lt;em&gt;merges&lt;/em&gt; group information from the Pod with information defined in &lt;code&gt;/etc/group&lt;/code&gt; in the container image.&lt;/p&gt;
&lt;p&gt;Let&#39;s look at an example. The Pod manifest below specifies &lt;code&gt;runAsUser=1000&lt;/code&gt;, &lt;code&gt;runAsGroup=3000&lt;/code&gt;, and &lt;code&gt;supplementalGroups=4000&lt;/code&gt; in the Pod&#39;s security context:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;implicit-groups&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;runAsUser&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;runAsGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;supplementalGroups&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#666&#34;&gt;4000&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ctr&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry.k8s.io/e2e-test-images/agnhost:2.45&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 1h&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowPrivilegeEscalation&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What is the result of the &lt;code&gt;id&lt;/code&gt; command in the &lt;code&gt;ctr&lt;/code&gt; container? The output should be similar to this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;uid=1000 gid=3000 groups=3000,4000,50000
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Where does group ID &lt;code&gt;50000&lt;/code&gt; in supplementary groups (&lt;code&gt;groups&lt;/code&gt; field) come from, even though &lt;code&gt;50000&lt;/code&gt; is not defined in the Pod&#39;s manifest at all? The answer is &lt;code&gt;/etc/group&lt;/code&gt; file in the container image.&lt;/p&gt;
&lt;p&gt;Checking the contents of &lt;code&gt;/etc/group&lt;/code&gt; in the container image reveals the following:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This shows that the container&#39;s primary user &lt;code&gt;1000&lt;/code&gt; belongs to the group &lt;code&gt;50000&lt;/code&gt; in the last entry.&lt;/p&gt;
&lt;p&gt;Thus, the group membership defined in &lt;code&gt;/etc/group&lt;/code&gt; in the container image for the container&#39;s primary user is &lt;em&gt;implicitly&lt;/em&gt; merged into the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.&lt;/p&gt;
&lt;h3 id=&#34;what-s-wrong-with-it&#34;&gt;What&#39;s wrong with it?&lt;/h3&gt;
&lt;p&gt;The &lt;em&gt;implicitly&lt;/em&gt; merged group information from &lt;code&gt;/etc/group&lt;/code&gt; in the container image poses a security risk. These implicit GIDs can&#39;t be detected or validated by policy engines because there&#39;s no record of them in the Pod manifest. This can lead to unexpected access control issues, particularly when accessing volumes (see &lt;a href=&#34;https://issue.k8s.io/112879&#34;&gt;kubernetes/kubernetes#112879&lt;/a&gt; for details) because file permission is controlled by UID/GIDs in Linux.&lt;/p&gt;
&lt;h2 id=&#34;fine-grained-supplemental-groups-control-in-a-pod-supplementarygroupspolicy&#34;&gt;Fine-grained supplemental groups control in a Pod: &lt;code&gt;supplementalGroupsPolicy&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;To tackle the above problem, a Pod&#39;s &lt;code&gt;.spec.securityContext&lt;/code&gt; now includes the &lt;code&gt;supplementalGroupsPolicy&lt;/code&gt; field.&lt;/p&gt;
&lt;p&gt;This field lets you control how Kubernetes calculates the supplementary groups for container processes within a Pod. The available policies are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Merge&lt;/em&gt;: The group membership defined in &lt;code&gt;/etc/group&lt;/code&gt; for the container&#39;s primary user is merged in. This is the default when the field is not specified, preserving the existing behavior for backward compatibility.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Strict&lt;/em&gt;: Only the group IDs specified in &lt;code&gt;fsGroup&lt;/code&gt;, &lt;code&gt;supplementalGroups&lt;/code&gt;, or &lt;code&gt;runAsGroup&lt;/code&gt; are attached as supplementary groups to the container processes. Group memberships defined in &lt;code&gt;/etc/group&lt;/code&gt; for the container&#39;s primary user are ignored.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&#39;s see how the &lt;code&gt;Strict&lt;/code&gt; policy works. The Pod manifest below specifies &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;strict-supplementalgroups-policy&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;runAsUser&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;runAsGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;supplementalGroups&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#666&#34;&gt;4000&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;supplementalGroupsPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Strict&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ctr&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry.k8s.io/e2e-test-images/agnhost:2.45&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 1h&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowPrivilegeEscalation&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output of the &lt;code&gt;id&lt;/code&gt; command in the &lt;code&gt;ctr&lt;/code&gt; container should be similar to:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;uid=1000 gid=3000 groups=3000,4000
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can see that the &lt;code&gt;Strict&lt;/code&gt; policy excludes group &lt;code&gt;50000&lt;/code&gt; from &lt;code&gt;groups&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;Thus, ensuring &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt; (enforced by some policy mechanism) helps prevent implicit supplementary groups in a Pod.&lt;/p&gt;

&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;Note:&lt;/h4&gt;A container with sufficient privileges can change its process identity. The &lt;code&gt;supplementalGroupsPolicy&lt;/code&gt; only affects the initial process identity. See the following section for details.&lt;/div&gt;

&lt;h2 id=&#34;attached-process-identity-in-pod-status&#34;&gt;Attached process identity in Pod status&lt;/h2&gt;
&lt;p&gt;This feature also exposes the process identity attached to the first container process
via the &lt;code&gt;.status.containerStatuses[].user.linux&lt;/code&gt; field. This is helpful for checking whether implicit group IDs are attached.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerStatuses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ctr&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;user&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;linux&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;gid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;supplementalGroups&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;3000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;4000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;uid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;Note:&lt;/h4&gt;The values in the &lt;code&gt;status.containerStatuses[].user.linux&lt;/code&gt; field show the process identity &lt;em&gt;initially attached&lt;/em&gt;
to the first container process in the container. If the container has sufficient privilege
to call system calls related to process identity (e.g. &lt;a href=&#34;https://man7.org/linux/man-pages/man2/setuid.2.html&#34;&gt;&lt;code&gt;setuid(2)&lt;/code&gt;&lt;/a&gt;, &lt;a href=&#34;https://man7.org/linux/man-pages/man2/setgid.2.html&#34;&gt;&lt;code&gt;setgid(2)&lt;/code&gt;&lt;/a&gt;, or &lt;a href=&#34;https://man7.org/linux/man-pages/man2/setgroups.2.html&#34;&gt;&lt;code&gt;setgroups(2)&lt;/code&gt;&lt;/a&gt;), the container process can change its identity at runtime. Thus, the &lt;em&gt;actual&lt;/em&gt; process identity may differ.&lt;/div&gt;

&lt;h2 id=&#34;strict-policy-requires-newer-cri-versions&#34;&gt;&lt;code&gt;Strict&lt;/code&gt; policy requires newer CRI versions&lt;/h2&gt;
&lt;p&gt;The CRI runtime (e.g. containerd, CRI-O) plays a core role in calculating the supplementary group IDs to attach to containers. Thus, &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt; requires a CRI runtime that supports this feature (&lt;code&gt;supplementalGroupsPolicy: Merge&lt;/code&gt; works even with CRI runtimes that do not support it, because that policy is fully backward compatible).&lt;/p&gt;
&lt;p&gt;Here are some CRI runtimes that support this feature, and the versions you need
to be running:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;containerd: v2.0 or later&lt;/li&gt;
&lt;li&gt;CRI-O: v1.31 or later&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can check whether a node supports the feature by looking at its &lt;code&gt;.status.features.supplementalGroupsPolicy&lt;/code&gt; field.&lt;/p&gt;
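&lt;p&gt;For instance, you could list this field for every node with a &lt;code&gt;kubectl&lt;/code&gt; query like the following (a sketch; the output depends on your cluster and requires cluster access to run):&lt;/p&gt;

```shell
# List each node together with whether its CRI runtime supports the feature
kubectl get nodes -o custom-columns='NAME:.metadata.name,SUPPORTED:.status.features.supplementalGroupsPolicy'
```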
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Node&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;features&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;supplementalGroupsPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;the-behavioral-changes-introduced-in-beta&#34;&gt;The behavioral changes introduced in beta&lt;/h2&gt;
&lt;p&gt;In the alpha release, when a Pod with &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt; was scheduled to a node that did not support the feature (i.e., &lt;code&gt;.status.features.supplementalGroupsPolicy=false&lt;/code&gt;), the Pod&#39;s supplemental groups policy silently fell back to &lt;code&gt;Merge&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In v1.33, this feature entered beta and the policy is enforced more strictly: the kubelet rejects pods whose nodes cannot ensure the specified policy. If your pod is rejected, you will see warning events with &lt;code&gt;reason=SupplementalGroupsPolicyNotSupported&lt;/code&gt; like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Event&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Warning&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;reason&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;SupplementalGroupsPolicyNotSupported&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;message&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;SupplementalGroupsPolicy=Strict is not supported in this node&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;involvedObject&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;upgrade-consideration&#34;&gt;Upgrade consideration&lt;/h2&gt;
&lt;p&gt;If you&#39;re already using this feature, especially the &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt; policy, we assume that your cluster&#39;s CRI runtimes already support this feature. In that case, you don&#39;t need to worry about the pod rejections described above.&lt;/p&gt;
&lt;p&gt;However, if your cluster:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;uses the &lt;code&gt;supplementalGroupsPolicy: Strict&lt;/code&gt; policy, but&lt;/li&gt;
&lt;li&gt;its CRI runtimes do NOT yet support the feature (i.e., &lt;code&gt;.status.features.supplementalGroupsPolicy=false&lt;/code&gt;),&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;you need to prepare for the behavioral change (pod rejection) when upgrading your cluster.&lt;/p&gt;
&lt;p&gt;We recommend several ways to avoid unexpected pod rejections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upgrade your cluster&#39;s CRI runtimes together with Kubernetes, or before upgrading Kubernetes&lt;/li&gt;
&lt;li&gt;Label nodes according to whether their CRI runtime supports this feature, and add a node selector to pods with the &lt;code&gt;Strict&lt;/code&gt; policy so they are scheduled only onto supporting nodes (in this case, you will need to monitor the number of &lt;code&gt;Pending&lt;/code&gt; pods instead of pod rejections)&lt;/li&gt;
&lt;/ul&gt;
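&lt;p&gt;As a sketch of the labeling approach, assume a hypothetical node label &lt;code&gt;example.com/supports-strict-supplemental-groups&lt;/code&gt; that the cluster administrator applies to nodes whose CRI runtime supports the feature:&lt;/p&gt;

```yaml
# Hypothetical label, applied by the administrator beforehand, e.g.:
#   kubectl label node <node-name> example.com/supports-strict-supplemental-groups="true"
apiVersion: v1
kind: Pod
metadata:
  name: strict-policy-pod
spec:
  nodeSelector:
    example.com/supports-strict-supplemental-groups: "true"
  securityContext:
    supplementalGroupsPolicy: Strict
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
```

&lt;p&gt;With this selector, unsupported nodes are never chosen, so the pod stays &lt;code&gt;Pending&lt;/code&gt; rather than being rejected by the kubelet.&lt;/p&gt;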
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;This feature is driven by the &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-node&#34;&gt;SIG Node&lt;/a&gt; community.
Please join us to connect with the community and share your ideas and feedback around the above feature and
beyond. We look forward to hearing from you!&lt;/p&gt;
&lt;h2 id=&#34;how-can-i-learn-more&#34;&gt;How can I learn more?&lt;/h2&gt;
&lt;!-- https://github.com/kubernetes/website/pull/46920 --&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/security-context/&#34;&gt;Configure a Security Context for a Pod or Container&lt;/a&gt;
for the further details of &lt;code&gt;supplementalGroupsPolicy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3619&#34;&gt;KEP-3619: Fine-grained SupplementalGroups control&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/</link>
      <pubDate>Mon, 05 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/</guid>
      <description>
        
        
        &lt;p&gt;I am thrilled to announce that the feature to prevent
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/persistent-volumes/&#34;&gt;PersistentVolume&lt;/a&gt; (or PVs for short)
leaks when deleting out of order has graduated to General Availability (GA) in
Kubernetes v1.33! This improvement, initially introduced as a beta
feature in Kubernetes v1.31, ensures that your storage resources are properly
reclaimed, preventing unwanted leaks.&lt;/p&gt;
&lt;h2 id=&#34;how-did-reclaim-work-in-previous-kubernetes-releases&#34;&gt;How did reclaim work in previous Kubernetes releases?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/persistent-volumes/#Introduction&#34;&gt;PersistentVolumeClaim&lt;/a&gt; (or PVC for short) is
a user&#39;s request for storage. A PV and a PVC are considered &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/persistent-volumes/#Binding&#34;&gt;Bound&lt;/a&gt;
if a newly created PV or an existing matching PV is found for the PVC. The PVs themselves are
backed by volumes allocated by the storage backend.&lt;/p&gt;
&lt;p&gt;Normally, if the volume is to be deleted, then the expectation is to delete the
PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV
before deleting a PVC.&lt;/p&gt;
&lt;p&gt;For a &lt;code&gt;Bound&lt;/code&gt; PV-PVC pair, the ordering of PV-PVC deletion determines whether
the PV reclaim policy is honored. The reclaim policy is honored if the PVC is
deleted first; however, if the PV is deleted prior to deleting the PVC, then the
reclaim policy is not exercised. As a result of this behavior, the associated
storage asset in the external infrastructure is not removed.&lt;/p&gt;
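&lt;p&gt;For illustration, the problematic out-of-order deletion looks like this (the resource names are placeholders):&lt;/p&gt;

```shell
# Deleting the PV first, then the PVC: prior to this fix, the volume's
# Delete reclaim policy was skipped and the backend storage asset leaked.
kubectl delete pv <pv-name>
kubectl delete pvc <pvc-name>
```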
&lt;h2 id=&#34;pv-reclaim-policy-with-kubernetes-v1-33&#34;&gt;PV reclaim policy with Kubernetes v1.33&lt;/h2&gt;
&lt;p&gt;With the graduation to GA in Kubernetes v1.33, this issue is now resolved. Kubernetes
now reliably honors the configured &lt;code&gt;Delete&lt;/code&gt; reclaim policy, even when PVs are deleted
before their bound PVCs. This is achieved through the use of finalizers,
ensuring that the storage backend releases the allocated storage resource as intended.&lt;/p&gt;
&lt;h3 id=&#34;how-does-it-work&#34;&gt;How does it work?&lt;/h3&gt;
&lt;p&gt;For CSI volumes, the new behavior is achieved by adding a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/overview/working-with-objects/finalizers/&#34;&gt;finalizer&lt;/a&gt; &lt;code&gt;external-provisioner.volume.kubernetes.io/finalizer&lt;/code&gt;
on new and existing PVs. The finalizer is only removed after the backend storage is deleted. The addition and removal of the finalizer are handled by the &lt;code&gt;external-provisioner&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here is an example of a PV with the finalizer; notice the new finalizer in the finalizers list:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml
&lt;/code&gt;&lt;/pre&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pv.kubernetes.io/provisioned-by&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;creationTimestamp&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;2021-11-17T19:28:56Z&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;finalizers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- kubernetes.io/pv-protection&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- external-provisioner.volume.kubernetes.io/finalizer&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;194711&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;uid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;087f14f2-4157-4e95-8a70-8294b039d30e&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;accessModes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- ReadWriteOnce&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;capacity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;1Gi&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;claimRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example-vanilla-block-pvc&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;default&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;194677&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;uid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;a7b7e3ba-f837-45ba-b243-dec7d8aaed53&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;csi&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;driver&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;fsType&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ext4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeAttributes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage.kubernetes.io/csiProvisionerIdentity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1637110610497-8081&lt;/span&gt;-csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CNS Block Volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeHandle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;2dacf297-803f-4ccc-afc7-3d3c3f02051e&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;persistentVolumeReclaimPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Delete&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storageClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example-vanilla-block-sc&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Filesystem&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;phase&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Bound&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/overview/working-with-objects/finalizers/&#34;&gt;finalizer&lt;/a&gt; prevents this
PersistentVolume from being removed from the
cluster. As stated previously, the finalizer is only removed from the PV object
after it is successfully deleted from the storage backend. To learn more about
finalizers, please refer to &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2021/05/14/using-finalizers-to-control-deletion/&#34;&gt;Using Finalizers to Control Deletion&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Similarly, the finalizer &lt;code&gt;kubernetes.io/pv-controller&lt;/code&gt; is added to dynamically provisioned in-tree plugin volumes.&lt;/p&gt;
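&lt;p&gt;For illustration, a dynamically provisioned in-tree volume protected by this fix would carry that finalizer in its metadata, along the lines of this sketch (the PV name here is hypothetical):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv  # hypothetical name
  finalizers:
  - kubernetes.io/pv-controller
&lt;/code&gt;&lt;/pre&gt;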
&lt;h3 id=&#34;important-note&#34;&gt;Important note&lt;/h3&gt;
&lt;p&gt;The fix does not apply to statically provisioned in-tree plugin volumes.&lt;/p&gt;
&lt;h2 id=&#34;how-to-enable-new-behavior&#34;&gt;How to enable the new behavior&lt;/h2&gt;
&lt;p&gt;To take advantage of the new behavior, you must have upgraded your cluster to the v1.33 release of Kubernetes
and be running version &lt;code&gt;5.0.1&lt;/code&gt; or later of the CSI &lt;a href=&#34;https://github.com/kubernetes-csi/external-provisioner&#34;&gt;&lt;code&gt;external-provisioner&lt;/code&gt;&lt;/a&gt;.
The feature was released as beta in the v1.31 release of Kubernetes, where it was enabled by default.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2644-honor-pv-reclaim-policy&#34;&gt;KEP-2644&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes-csi/external-provisioner/issues/546&#34;&gt;Volume leak issue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/08/16/kubernetes-1-31-prevent-persistentvolume-leaks-when-deleting-out-of-order/&#34;&gt;Beta Release Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-do-i-get-involved&#34;&gt;How do I get involved?&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-storage/README.md#contact&#34;&gt;SIG Storage communication channels&lt;/a&gt;, including the Kubernetes Slack workspace, are great mediums to reach out to the SIG Storage and migration working group teams.&lt;/p&gt;
&lt;p&gt;Special thanks to the following people for their insightful reviews, thorough consideration, and valuable contributions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fan Baofa (carlory)&lt;/li&gt;
&lt;li&gt;Jan Šafránek (jsafrane)&lt;/li&gt;
&lt;li&gt;Xing Yang (xing-yang)&lt;/li&gt;
&lt;li&gt;Matthew Wong (wongma7)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Join the &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes Storage Special Interest Group (SIG)&lt;/a&gt; if you&#39;re interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Mutable CSI Node Allocatable Count</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/</link>
      <pubDate>Fri, 02 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/</guid>
      <description>
        
        
        &lt;p&gt;Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes.
Kubernetes v1.33 introduces an alpha feature called &lt;em&gt;mutable CSI node allocatable count&lt;/em&gt;, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle.
This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information.&lt;/p&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node&#39;s lifecycle for various reasons, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manual or external operations attaching/detaching volumes outside of Kubernetes control.&lt;/li&gt;
&lt;li&gt;Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.&lt;/li&gt;
&lt;li&gt;Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don&#39;t, leading to pods stuck in a &lt;code&gt;ContainerCreating&lt;/code&gt; state.&lt;/p&gt;
&lt;h2 id=&#34;dynamically-adapting-csi-volume-limits&#34;&gt;Dynamically adapting CSI volume limits&lt;/h2&gt;
&lt;p&gt;With the new feature gate &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt;, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity.&lt;/p&gt;
&lt;h3 id=&#34;how-it-works&#34;&gt;How it works&lt;/h3&gt;
&lt;p&gt;When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Periodic Updates:&lt;/strong&gt; CSI drivers specify an interval to periodically refresh the node&#39;s allocatable capacity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reactive Updates:&lt;/strong&gt; An immediate update triggered when a volume attachment fails due to exhausted resources (&lt;code&gt;ResourceExhausted&lt;/code&gt; error).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;enabling-the-feature&#34;&gt;Enabling the feature&lt;/h3&gt;
&lt;p&gt;To use this alpha feature, you must enable the &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt; feature gate in these components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kube-apiserver&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
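&lt;p&gt;For example, the kubelet side can be configured through a &lt;code&gt;KubeletConfiguration&lt;/code&gt; fragment like the following sketch; the &lt;code&gt;kube-apiserver&lt;/code&gt; takes the equivalent &lt;code&gt;--feature-gates=MutableCSINodeAllocatableCount=true&lt;/code&gt; command line flag:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true
&lt;/code&gt;&lt;/pre&gt;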
&lt;h3 id=&#34;example-csi-driver-configuration&#34;&gt;Example CSI driver configuration&lt;/h3&gt;
&lt;p&gt;Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io
spec:
  nodeAllocatableUpdatePeriodSeconds: 60
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This configuration directs Kubelet to periodically call the CSI driver&#39;s &lt;code&gt;NodeGetInfo&lt;/code&gt; method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.&lt;/p&gt;
&lt;h3 id=&#34;immediate-updates-on-attachment-failures&#34;&gt;Immediate updates on attachment failures&lt;/h3&gt;
&lt;p&gt;In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a &lt;code&gt;ResourceExhausted&lt;/code&gt; error (gRPC code &lt;code&gt;8&lt;/code&gt;), an immediate update is triggered to correct the allocatable count promptly.&lt;/p&gt;
&lt;p&gt;This proactive correction prevents repeated scheduling errors and helps maintain cluster health.&lt;/p&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;Getting started&lt;/h2&gt;
&lt;p&gt;To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Enable the feature gate &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt; on the &lt;code&gt;kube-apiserver&lt;/code&gt; and &lt;code&gt;kubelet&lt;/code&gt; components.&lt;/li&gt;
&lt;li&gt;Update your CSI driver configuration by setting &lt;code&gt;nodeAllocatableUpdatePeriodSeconds&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Monitor and observe improvements in scheduling accuracy and pod placement reliability.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next steps&lt;/h2&gt;
&lt;p&gt;This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability.&lt;/p&gt;
&lt;p&gt;Join discussions in the &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes Storage Special Interest Group (SIG-Storage)&lt;/a&gt; to shape the future of Kubernetes storage capabilities.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: New features in DRA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/01/kubernetes-v1-33-dra-updates/</link>
      <pubDate>Thu, 01 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/05/01/kubernetes-v1-33-dra-updates/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/&#34;&gt;Dynamic Resource Allocation&lt;/a&gt; (DRA) was originally introduced as an alpha feature in the v1.26 release, and then went through a significant redesign for Kubernetes v1.31. The main DRA feature went to beta in v1.32, and the project hopes it will be generally available in Kubernetes v1.34.&lt;/p&gt;
&lt;p&gt;The basic feature set of DRA provides a far more powerful and flexible API for requesting devices than the Device Plugin API. And while DRA remains a beta feature for v1.33, the DRA team has been hard at work implementing a number of new features and UX improvements. One feature has been promoted to beta, while a number of new features have been added in alpha. The team has also made progress towards getting DRA ready for GA.&lt;/p&gt;
&lt;h3 id=&#34;features-promoted-to-beta&#34;&gt;Features promoted to beta&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaim-device-status&#34;&gt;Driver-owned Resource Claim Status&lt;/a&gt; was promoted to beta. This allows the driver to report driver-specific device status data for each allocated device in a resource claim, which is particularly useful for supporting network devices.&lt;/p&gt;
&lt;h3 id=&#34;new-alpha-features&#34;&gt;New alpha features&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices&#34;&gt;Partitionable Devices&lt;/a&gt; lets a driver advertise several overlapping logical devices (“partitions”), and the driver can reconfigure the physical device dynamically based on the actual devices allocated. This makes it possible to partition devices on-demand to meet the needs of the workloads and therefore increase the utilization.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-taints-and-tolerations&#34;&gt;Device Taints and Tolerations&lt;/a&gt; allow devices to be tainted and for workloads to tolerate those taints. This makes it possible for drivers or cluster administrators to mark devices as unavailable. Depending on the effect of the taint, this can prevent devices from being allocated or cause eviction of pods that are using the device.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#prioritized-list&#34;&gt;Prioritized List&lt;/a&gt; lets users specify a list of acceptable devices for their workloads, rather than just a single type of device. So while the workload might run best on a single high-performance GPU, it might also be able to run on 2 mid-level GPUs. The scheduler will attempt to satisfy the alternatives in the list in order, so the workload will be allocated the best set of devices available in the cluster.&lt;/p&gt;
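&lt;p&gt;As an illustrative sketch (the claim name and device classes below are hypothetical, and alpha API fields may still change), a ResourceClaim using a prioritized list could look like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-claim  # hypothetical
spec:
  devices:
    requests:
    - name: gpu
      firstAvailable:  # alternatives, tried in order
      - name: one-large-gpu
        deviceClassName: large-gpu.example.com
        count: 1
      - name: two-medium-gpus
        deviceClassName: medium-gpu.example.com
        count: 2
&lt;/code&gt;&lt;/pre&gt;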
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#admin-access&#34;&gt;Admin Access&lt;/a&gt; has been updated so that only users with access to a namespace with the &lt;code&gt;resource.k8s.io/admin-access: &amp;quot;true&amp;quot;&lt;/code&gt; label are authorized to create ResourceClaim or ResourceClaimTemplate objects with the &lt;code&gt;adminAccess&lt;/code&gt; field within the namespace. This grants administrators access to in-use devices and may enable additional permissions when making the device available in a container. This ensures that non-admin users cannot misuse the feature.&lt;/p&gt;
&lt;h3 id=&#34;preparing-for-general-availability&#34;&gt;Preparing for general availability&lt;/h3&gt;
&lt;p&gt;A new v1beta2 API has been added to simplify the user experience and to prepare for additional features being added in the future. The RBAC rules for DRA have been improved and support has been added for seamless upgrades of DRA drivers.&lt;/p&gt;
&lt;h3 id=&#34;what-s-next&#34;&gt;What’s next?&lt;/h3&gt;
&lt;p&gt;The plan for v1.34 is even more ambitious than for v1.33. Most importantly, we (the Kubernetes device management working group) hope to bring DRA to general availability, which will make it available by default on all v1.34 Kubernetes clusters. This also means that many, perhaps all, of the DRA features that are still beta in v1.34 will become enabled by default, making it much easier to use them.&lt;/p&gt;
&lt;p&gt;The alpha features that were added in v1.33 will be brought to beta in v1.34.&lt;/p&gt;
&lt;h3 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h3&gt;
&lt;p&gt;A good starting point is joining the WG Device Management &lt;a href=&#34;https://kubernetes.slack.com/archives/C0409NGC1TK&#34;&gt;Slack channel&lt;/a&gt; and &lt;a href=&#34;https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?tab=t.0#heading=h.tgg8gganowxq&#34;&gt;meetings&lt;/a&gt;, which happen at US/EU and EU/APAC friendly time slots.&lt;/p&gt;
&lt;p&gt;Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself! We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.&lt;/p&gt;
&lt;h3 id=&#34;acknowledgments&#34;&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;A huge thanks to everyone who has contributed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cici Huang (&lt;a href=&#34;https://github.com/cici37&#34;&gt;cici37&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Ed Bartosh (&lt;a href=&#34;https://github.com/bart0sh&#34;&gt;bart0sh&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;John Belamaric (&lt;a href=&#34;https://github.com/johnbelamaric&#34;&gt;johnbelamaric&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Jon Huhn (&lt;a href=&#34;https://github.com/nojnhuh&#34;&gt;nojnhuh&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Kevin Klues (&lt;a href=&#34;https://github.com/klueska&#34;&gt;klueska&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Morten Torkildsen (&lt;a href=&#34;https://github.com/mortent&#34;&gt;mortent&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Patrick Ohly (&lt;a href=&#34;https://github.com/pohly&#34;&gt;pohly&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Rita Zhang (&lt;a href=&#34;https://github.com/ritazh&#34;&gt;ritazh&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Shingo Omura (&lt;a href=&#34;https://github.com/everpeace&#34;&gt;everpeace&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/30/kubernetes-v1-33-storage-capacity-scoring-feature/</link>
      <pubDate>Wed, 30 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/30/kubernetes-v1-33-storage-capacity-scoring-feature/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes v1.33 introduces a new alpha feature called &lt;code&gt;StorageCapacityScoring&lt;/code&gt;. This feature adds a scoring method for pod scheduling
with &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/&#34;&gt;the topology-aware volume provisioning&lt;/a&gt;.
This feature makes it easier to schedule pods onto nodes with either the most or the least available storage capacity.&lt;/p&gt;
&lt;h2 id=&#34;about-this-feature&#34;&gt;About this feature&lt;/h2&gt;
&lt;p&gt;This feature extends the kube-scheduler&#39;s VolumeBinding plugin to perform scoring using node storage capacity information
obtained from &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/storage-capacity/&#34;&gt;Storage Capacity&lt;/a&gt;. Before this change, the scheduler could only filter out nodes with insufficient storage capacity,
so you had to use a scheduler extender to achieve storage-capacity-based pod scheduling.&lt;/p&gt;
&lt;p&gt;This feature is useful for provisioning node-local PVs, which have size limits based on the node&#39;s storage capacity. By using this feature,
you can assign the PVs to the nodes with the most available storage space so that you can expand the PVs later as much as possible.&lt;/p&gt;
&lt;p&gt;In another use case, you might want to reduce the number of nodes as much as possible to lower operating costs in cloud environments by choosing
the node with the least available storage capacity. This feature helps maximize resource utilization by filling up nodes more sequentially, starting with the most
utilized nodes that still have enough storage capacity for the requested volume size.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use&#34;&gt;How to use&lt;/h2&gt;
&lt;h3 id=&#34;enabling-the-feature&#34;&gt;Enabling the feature&lt;/h3&gt;
&lt;p&gt;In the alpha phase, &lt;code&gt;StorageCapacityScoring&lt;/code&gt; is disabled by default. To use this feature, add &lt;code&gt;StorageCapacityScoring=true&lt;/code&gt;
to the kube-scheduler command line option &lt;code&gt;--feature-gates&lt;/code&gt;.&lt;/p&gt;
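&lt;p&gt;For example (illustrative flag usage):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kube-scheduler --feature-gates=StorageCapacityScoring=true
&lt;/code&gt;&lt;/pre&gt;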
&lt;h3 id=&#34;configuration-changes&#34;&gt;Configuration changes&lt;/h3&gt;
&lt;p&gt;You can configure node priorities based on storage utilization using the &lt;code&gt;shape&lt;/code&gt; parameter in the VolumeBinding plugin configuration.
This allows you to prioritize nodes with higher available storage capacity (default) or, conversely, nodes with lower available storage capacity.
For example, to prioritize lower available storage capacity, configure &lt;code&gt;KubeSchedulerConfiguration&lt;/code&gt; as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubescheduler.config.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;KubeSchedulerConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;profiles&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pluginConfig&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeBinding&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;args&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;shape&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;utilization&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;score&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;utilization&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;100&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;score&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For more details, please refer to the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/config-api/kube-scheduler-config.v1/#kubescheduler-config-k8s-io-v1-VolumeBindingArgs&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/4049-storage-capacity-scoring-of-nodes-for-dynamic-provisioning/README.md&#34;&gt;KEP-4049: Storage Capacity Scoring of Nodes for Dynamic Provisioning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;additional-note-relationship-with-volumecapacitypriority&#34;&gt;Additional note: Relationship with VolumeCapacityPriority&lt;/h2&gt;
&lt;p&gt;The alpha feature gate &lt;code&gt;VolumeCapacityPriority&lt;/code&gt;, which performs node scoring based on available storage capacity during static provisioning,
will be deprecated and replaced by &lt;code&gt;StorageCapacityScoring&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Please note that while &lt;code&gt;VolumeCapacityPriority&lt;/code&gt; prioritizes nodes with lower available storage capacity by default,
&lt;code&gt;StorageCapacityScoring&lt;/code&gt; prioritizes nodes with higher available storage capacity by default.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Image Volumes graduate to beta!</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/29/kubernetes-v1-33-image-volume-beta/</link>
      <pubDate>Tue, 29 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/29/kubernetes-v1-33-image-volume-beta/</guid>
      <description>
        
        
        &lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/08/16/kubernetes-1-31-image-volume-source&#34;&gt;Image Volumes&lt;/a&gt; were
introduced as an Alpha feature with the Kubernetes v1.31 release as part of
&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4639&#34;&gt;KEP-4639&lt;/a&gt;. In Kubernetes v1.33, this feature graduates to &lt;strong&gt;beta&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Please note that the feature is still &lt;em&gt;disabled&lt;/em&gt; by default, because not all
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/setup/production-environment/container-runtimes/&#34;&gt;container runtimes&lt;/a&gt; have
full support for it. &lt;a href=&#34;https://cri-o.io&#34;&gt;CRI-O&lt;/a&gt; has supported the initial feature since version v1.31 and
will add support for Image Volumes as beta in v1.33.
&lt;a href=&#34;https://github.com/containerd/containerd/pull/10579&#34;&gt;containerd merged&lt;/a&gt; support
for the alpha feature which will be part of the v2.1.0 release and is working on
beta support as part of &lt;a href=&#34;https://github.com/containerd/containerd/pull/11578&#34;&gt;PR #11578&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;what-s-new&#34;&gt;What&#39;s new&lt;/h3&gt;
&lt;p&gt;The major change for the beta graduation of Image Volumes is the support for
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#using-subpath&#34;&gt;&lt;code&gt;subPath&lt;/code&gt;&lt;/a&gt; and
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#using-subpath-expanded-environment&#34;&gt;&lt;code&gt;subPathExpr&lt;/code&gt;&lt;/a&gt; mounts
for containers via &lt;code&gt;spec.containers[*].volumeMounts.[subPath,subPathExpr]&lt;/code&gt;. This
allows end-users to mount a specific subdirectory of an image volume, which is
still mounted read-only (and &lt;code&gt;noexec&lt;/code&gt;). This means that non-existing
subdirectories cannot be mounted by default. As with other &lt;code&gt;subPath&lt;/code&gt; and
&lt;code&gt;subPathExpr&lt;/code&gt; values, Kubernetes ensures that the specified sub path contains no
absolute paths or relative path components. Container runtimes are
also required to double-check those requirements for safety reasons. If a
specified subdirectory does not exist within a volume, then runtimes should fail
on container creation and provide user feedback through existing kubelet
events.&lt;/p&gt;
&lt;p&gt;Besides that, there are also three new kubelet metrics available for image volumes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_requested_total&lt;/code&gt;: Tracks the number of requested image volumes.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_mounted_succeed_total&lt;/code&gt;: Counts the number of successful image volume mounts.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_mounted_errors_total&lt;/code&gt;: Counts the number of failed image volume mounts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To use an existing subdirectory for a specific image volume, just use it as
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#using-subpath&#34;&gt;&lt;code&gt;subPath&lt;/code&gt;&lt;/a&gt; (or
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#using-subpath-expanded-environment&#34;&gt;&lt;code&gt;subPathExpr&lt;/code&gt;&lt;/a&gt;)
value of the container&#39;s &lt;code&gt;volumeMounts&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;image-volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;shell&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;infinity&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;debian&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;subPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;dir&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;reference&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;quay.io/crio/artifact:v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pullPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;IfNotPresent&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then, create the pod on your cluster:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f image-volumes-subpath.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now you can attach to the container:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl attach -it image-volume bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And check the content of the file from the &lt;code&gt;dir&lt;/code&gt; sub path in the volume:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cat /volume/file
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output will be similar to:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;1
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Thank you for reading through to the end of this blog post! SIG Node is proud and
happy to deliver this feature graduation as part of Kubernetes v1.33.&lt;/p&gt;
&lt;p&gt;As the author of this blog post, I would like to extend my special thanks to
&lt;strong&gt;all&lt;/strong&gt; the individuals involved!&lt;/p&gt;
&lt;p&gt;If you would like to provide feedback or suggestions, feel free to reach out
to SIG Node using the &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;Kubernetes Slack (#sig-node)&lt;/a&gt;
channel or the &lt;a href=&#34;https://groups.google.com/g/kubernetes-sig-node&#34;&gt;SIG Node mailing list&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/image-volumes/&#34;&gt;Use an Image Volume With a Pod&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#image&#34;&gt;&lt;code&gt;image&lt;/code&gt; volume overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/28/kubernetes-v1-33-hpa-configurable-tolerance/</link>
      <pubDate>Mon, 28 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/28/kubernetes-v1-33-hpa-configurable-tolerance/</guid>
      <description>
        
        
        &lt;p&gt;This post describes &lt;em&gt;configurable tolerance for horizontal Pod autoscaling&lt;/em&gt;,
a new alpha feature first available in Kubernetes 1.33.&lt;/p&gt;
&lt;h2 id=&#34;what-is-it&#34;&gt;What is it?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/run-application/horizontal-pod-autoscale/&#34;&gt;Horizontal Pod Autoscaling&lt;/a&gt;
is a well-known Kubernetes feature that allows your workload to
automatically resize by adding or removing replicas based on resource
utilization.&lt;/p&gt;
&lt;p&gt;Let&#39;s say you have a web application running in a Kubernetes cluster with 50
replicas. You configure the HorizontalPodAutoscaler (HPA) to scale based on
CPU utilization, with a target of 75% utilization. Now, imagine that the current
CPU utilization across all replicas is 90%, which is higher than the desired
75%. The HPA will calculate the required number of replicas using the formula:&lt;/p&gt;

&lt;div class=&#34;math&#34;&gt;$$desiredReplicas = \left\lceil currentReplicas \times \frac{currentMetricValue}{desiredMetricValue} \right\rceil$$&lt;/div&gt;&lt;p&gt;In this example:&lt;/p&gt;

&lt;div class=&#34;math&#34;&gt;$$50 \times (90/75) = 60$$&lt;/div&gt;&lt;p&gt;So, the HPA will increase the number of replicas from 50 to 60 to reduce the
load on each pod. Similarly, if the CPU utilization were to drop below 75%, the
HPA would scale down the number of replicas accordingly. The Kubernetes
documentation provides a
&lt;a href=&#34;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details&#34;&gt;detailed description of the scaling algorithm&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In order to avoid replicas being created or deleted whenever a small metric
fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the
number of replicas when the current and desired metric values differ by more
than 10%. In the example above, the ratio between the current and desired
metric values is \(90/75\), or 20% above target. Since that exceeds the 10%
tolerance, the scale-up action proceeds.&lt;/p&gt;
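&lt;p&gt;To illustrate with a hypothetical variation of the same example: if the current
CPU utilization were 80% instead of 90%, the raw formula would give&lt;/p&gt;
&lt;div class=&#34;math&#34;&gt;$$50 \times (80/75) \approx 53.3$$&lt;/div&gt;&lt;p&gt;but since the ratio \(80/75\) is only about 7% above target, well within the 10%
tolerance, the HPA would leave the replica count at 50.&lt;/p&gt;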
&lt;p&gt;This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it
could not be fine-tuned. It&#39;s a suitable value for most usage, but too coarse
for large deployments, where a 10% tolerance represents tens of pods. As a
result, the community has long
&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/116984&#34;&gt;asked&lt;/a&gt; to be able to
tune this value.&lt;/p&gt;
&lt;p&gt;In Kubernetes v1.33, this is now possible.&lt;/p&gt;
&lt;h2 id=&#34;how-do-i-use-it&#34;&gt;How do I use it?&lt;/h2&gt;
&lt;p&gt;After enabling the &lt;code&gt;HPAConfigurableTolerance&lt;/code&gt;
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/command-line-tools-reference/feature-gates/&#34;&gt;feature gate&lt;/a&gt; in
your Kubernetes v1.33 cluster, you can add your desired tolerance for your
HorizontalPodAutoscaler object.&lt;/p&gt;
&lt;p&gt;Tolerances appear under the &lt;code&gt;spec.behavior.scaleDown&lt;/code&gt; and
&lt;code&gt;spec.behavior.scaleUp&lt;/code&gt; fields and can thus be different for scale up and scale
down. A typical usage would be to specify a small tolerance on scale up (to
react quickly to spikes), but a higher one on scale down (to avoid removing
replicas too quickly in response to small metric fluctuations).&lt;/p&gt;
&lt;p&gt;For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on
scale-up, would look like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;autoscaling/v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HorizontalPodAutoscaler&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;behavior&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;scaleDown&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerance&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0.05&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;scaleUp&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerance&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;i-want-all-the-details&#34;&gt;I want all the details!&lt;/h2&gt;
&lt;p&gt;Get all the technical details by reading
&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-autoscaling/4951-configurable-hpa-tolerance&#34;&gt;KEP-4951&lt;/a&gt;
and follow &lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4951&#34;&gt;issue 4951&lt;/a&gt;
to be notified of the feature graduation.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: User Namespaces enabled by default!</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/25/userns-enabled-by-default/</link>
      <pubDate>Fri, 25 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/25/userns-enabled-by-default/</guid>
      <description>
        
        
&lt;p&gt;In Kubernetes v1.33, support for user namespaces is enabled by default. This means
that, when the stack requirements are met, pods can opt in to using user
namespaces. There is no longer any Kubernetes feature gate to enable!&lt;/p&gt;
&lt;p&gt;In this blog post we answer some common questions about user namespaces. But,
before we dive into that, let&#39;s recap what user namespaces are and why they are
important.&lt;/p&gt;
&lt;h2 id=&#34;what-is-a-user-namespace&#34;&gt;What is a user namespace?&lt;/h2&gt;
&lt;p&gt;Note: Linux user namespaces are a different concept from &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/overview/working-with-objects/namespaces/&#34;&gt;Kubernetes
namespaces&lt;/a&gt;.
The former is a Linux kernel feature; the latter is a Kubernetes feature.&lt;/p&gt;
&lt;p&gt;Linux provides different namespaces to isolate processes from each other. For
example, a typical Kubernetes pod runs within a network namespace to isolate the
network identity and a PID namespace to isolate the processes.&lt;/p&gt;
&lt;p&gt;One Linux namespace that was left behind is the &lt;a href=&#34;https://man7.org/linux/man-pages/man7/user_namespaces.7.html&#34;&gt;user
namespace&lt;/a&gt;. It
isolates the UIDs and GIDs of the containers from the ones on the host. The
identifiers in a container can be mapped to identifiers on the host in a way
where the host and container(s) never end up with overlapping UIDs/GIDs. Furthermore,
the identifiers can be mapped to unprivileged, non-overlapping UIDs and GIDs on
the host. This brings three key benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Prevention of lateral movement&lt;/em&gt;: As the UIDs and GIDs for different
containers are mapped to different UIDs and GIDs on the host, containers have a
harder time attacking each other, even if they escape the container boundaries.
For example, suppose container A runs with different UIDs and GIDs on the host
than container B. In that case, the operations it can do on container B&#39;s files
and processes are limited: it can only read/write what a file allows to others, as
it will never have owner or group permissions (the UIDs/GIDs on the host are
guaranteed to be different for different containers).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Increased host isolation&lt;/em&gt;: As the UIDs and GIDs are mapped to unprivileged
users on the host, if a container escapes the container boundaries, even if it
runs as root inside the container, it has no privileges on the host. This
greatly protects what host files it can read/write, which process it can send
signals to, etc. Furthermore, capabilities granted are only valid inside the
user namespace and not on the host, limiting the impact a container
escape can have.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Enablement of new use cases&lt;/em&gt;: User namespaces allow containers to gain
certain capabilities inside their own user namespace without affecting the host.
This unlocks new possibilities, such as running applications that require
privileged operations without granting full root access on the host. This is
particularly useful for running nested containers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;figure class=&#34;diagram-medium &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/images/blog/2024-04-22-userns-beta/image.svg&#34;
         alt=&#34;Image showing IDs 0-65535 are reserved to the host, pods use higher IDs&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;User namespace IDs allocation&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If a pod running as the root user without a user namespace manages to break out,
it has root privileges on the node.  If some capabilities were granted to the
container, the capabilities are valid on the host too. None of this is true when
using user namespaces (modulo bugs, of course 🙂).&lt;/p&gt;
&lt;h2 id=&#34;demos&#34;&gt;Demos&lt;/h2&gt;
&lt;p&gt;Rodrigo created demos to understand how some CVEs are mitigated when user
namespaces are used. We showed them here before (see &lt;a href=&#34;https://kubernetes.io/blog/2023/09/13/userns-alpha/&#34;&gt;here&lt;/a&gt; and
&lt;a href=&#34;https://kubernetes.io/blog/2024/04/22/userns-beta/&#34;&gt;here&lt;/a&gt;), but take a look if you haven&#39;t:&lt;/p&gt;
&lt;p&gt;Mitigation of CVE 2024-21626 with user namespaces:&lt;/p&gt;


    
    &lt;div class=&#34;youtube-quote-sm&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/07y5bl5UDdA?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; title=&#34;Mitigation of CVE-2024-21626 on Kubernetes by enabling User Namespace support&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;p&gt;Mitigation of CVE 2022-0492 with user namespaces:&lt;/p&gt;


    
    &lt;div class=&#34;youtube-quote-sm&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/M4a2b4KkXN8?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; title=&#34;Mitigation of CVE-2022-0492 on Kubernetes by enabling User Namespace support&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;h2 id=&#34;everything-you-wanted-to-know-about-user-namespaces-in-kubernetes&#34;&gt;Everything you wanted to know about user namespaces in Kubernetes&lt;/h2&gt;
&lt;p&gt;Here we try to answer some of the questions we have been asked about user
namespaces support in Kubernetes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. What are the requirements to use it?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The requirements are documented &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/user-namespaces/#before-you-begin&#34;&gt;here&lt;/a&gt;. But we will elaborate a bit
more in the following questions.&lt;/p&gt;
&lt;p&gt;Note this is a Linux-only feature.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. How do I configure a pod to opt-in?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A complete step-by-step guide is available &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/user-namespaces/&#34;&gt;here&lt;/a&gt;. But the short
version is you need to set the &lt;code&gt;hostUsers: false&lt;/code&gt; field in the pod spec. For
example like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;userns&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostUsers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;shell&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;infinity&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;debian&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Yes, it is that simple. Applications will run just fine, without any other
changes needed (unless your application needs host privileges).&lt;/p&gt;
&lt;p&gt;User namespaces allow you to run as root inside the container without having
privileges on the host. However, if your application needs privileges on the
host, for example an app that needs to load a kernel module, then you can&#39;t use
user namespaces.&lt;/p&gt;
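&lt;p&gt;To verify (an illustrative check, not a required step) that a pod with
&lt;code&gt;hostUsers: false&lt;/code&gt; really runs inside a user namespace, you can inspect the
UID map from within the container, assuming the pod above is named &lt;code&gt;userns&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl exec userns -- cat /proc/self/uid_map&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output will be similar to the following, showing container UID 0 mapped to an
unprivileged high UID on the host, rather than the identity mapping
&lt;code&gt;0 0 4294967295&lt;/code&gt; you would see without a user namespace:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;0 2952792064 65536
&lt;/code&gt;&lt;/pre&gt;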
&lt;p&gt;&lt;strong&gt;3. What are idmap mounts, and why do the file-systems used need to support them?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Idmap mounts are a Linux kernel feature that uses a mapping of UIDs/GIDs when
accessing a mount. When combined with user namespaces, it greatly simplifies the
support for volumes, as you can forget about the host UIDs/GIDs the user
namespace is using.&lt;/p&gt;
&lt;p&gt;In particular, thanks to idmap mounts we can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run each pod with different UIDs/GIDs on the host. This is key for the
lateral movement prevention we mentioned earlier.&lt;/li&gt;
&lt;li&gt;Share volumes with pods that don&#39;t use user namespaces.&lt;/li&gt;
&lt;li&gt;Enable/disable user namespaces without needing to chown the pod&#39;s volumes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Support for idmap mounts in the kernel is per file-system, and different kernel
releases added support for different file-systems.&lt;/p&gt;
&lt;p&gt;To find which kernel version added support for each file-system, you can check
out the &lt;code&gt;mount_setattr&lt;/code&gt; man page, or the online version of it
&lt;a href=&#34;https://man7.org/linux/man-pages/man2/mount_setattr.2.html#NOTES&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most popular file-systems are supported; the notable absence is NFS, which
isn&#39;t supported yet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Can you clarify exactly which file-systems need to support idmap mounts?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The file-systems that need to support idmap mounts are all the file-systems used
by a pod in the &lt;code&gt;pod.spec.volumes&lt;/code&gt; field.&lt;/p&gt;
&lt;p&gt;This means: for PV/PVC volumes, the file-system used in the PV needs to support
idmap mounts; for hostPath volumes, the file-system used in the hostPath
needs to support idmap mounts.&lt;/p&gt;
&lt;p&gt;What does this mean for secrets/configmaps/projected/downwardAPI volumes? For
these volumes, the kubelet creates a &lt;code&gt;tmpfs&lt;/code&gt; file-system. So, you will need a
6.3 kernel to use these volumes (note that consuming them as environment
variables instead is fine).&lt;/p&gt;
&lt;p&gt;And what about emptyDir volumes? Those volumes are created by the kubelet by
default in &lt;code&gt;/var/lib/kubelet/pods/&lt;/code&gt;. You can also use a custom directory for
this. But what needs to support idmap mounts is the file-system used in that
directory.&lt;/p&gt;
&lt;p&gt;The kubelet creates some more files for the container, like &lt;code&gt;/etc/hostname&lt;/code&gt;,
&lt;code&gt;/etc/resolv.conf&lt;/code&gt;, &lt;code&gt;/dev/termination-log&lt;/code&gt;, &lt;code&gt;/etc/hosts&lt;/code&gt;, etc. These files are
also created in &lt;code&gt;/var/lib/kubelet/pods/&lt;/code&gt; by default, so it&#39;s important for the
file-system used in that directory to support idmap mounts.&lt;/p&gt;
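&lt;p&gt;To see which file-system backs that directory on a node (an illustrative check,
assuming the default &lt;code&gt;/var/lib/kubelet&lt;/code&gt; path), you can run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;findmnt -n -o FSTYPE --target /var/lib/kubelet&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output will be something like &lt;code&gt;ext4&lt;/code&gt;, which you can then look up in the
&lt;code&gt;mount_setattr&lt;/code&gt; man page mentioned above.&lt;/p&gt;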
&lt;p&gt;Also, some container runtimes may put some of these ephemeral volumes inside a
&lt;code&gt;tmpfs&lt;/code&gt; file-system, in which case you will need support for idmap mounts in
&lt;code&gt;tmpfs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Can I use a kernel older than 6.3?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes, but you will need to make sure you are not using a &lt;code&gt;tmpfs&lt;/code&gt; file-system. If
you avoid that, you can easily use 5.19 (if all the other file-systems you use
support idmap mounts in that kernel).&lt;/p&gt;
&lt;p&gt;It can be tricky to avoid using &lt;code&gt;tmpfs&lt;/code&gt;, though, as we just described above.
Besides having to avoid those volume types, you will also have to avoid mounting the
service account token. Every pod has it mounted by default, and it uses a
projected volume that, as we mentioned, uses a &lt;code&gt;tmpfs&lt;/code&gt; file-system.&lt;/p&gt;
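&lt;p&gt;If your workload does not need to talk to the API server, one way to avoid that
particular &lt;code&gt;tmpfs&lt;/code&gt; mount is to disable the automatic token mount in the pod
spec (a minimal sketch; the pod name is illustrative):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;apiVersion: v1&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kind: Pod&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;metadata:&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  name: userns-no-token&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;spec:&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  hostUsers: false&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  # skip the tmpfs-backed projected service account token volume&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  automountServiceAccountToken: false&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  containers:&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - name: shell&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    command: [&amp;#34;sleep&amp;#34;, &amp;#34;infinity&amp;#34;]&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    image: debian&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;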
&lt;p&gt;You could even go lower than 5.19, all the way to 5.12. However, your container
rootfs probably uses an overlayfs file-system, and support for overlayfs was
added in 5.19. We wouldn&#39;t recommend using a kernel older than 5.19, as not
being able to use idmap mounts for the rootfs is a big limitation. If you
absolutely need to, you can check &lt;a href=&#34;https://kinvolk.io/blog/2023/11/tips-and-tricks-for-user-namespaces-with-kubernetes-and-containerd&#34;&gt;this blog post&lt;/a&gt; Rodrigo wrote
some years ago, about tricks to use user namespaces when you can&#39;t support
idmap mounts on the rootfs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. If my stack supports user namespaces, do I need to configure anything else?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No, if your stack supports it and you are using Kubernetes v1.33, there is
nothing you &lt;em&gt;need&lt;/em&gt; to configure. You should be able to follow the task: &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/user-namespaces/&#34;&gt;Use a
user namespace with a pod&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, in case you have specific requirements, you may configure various
options. You can find more information &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/user-namespaces/#set-up-a-node-to-support-user-namespaces&#34;&gt;here&lt;/a&gt;. You can also
enable a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/user-namespaces/#integration-with-pod-security-admission-checks&#34;&gt;feature gate to relax the PSS rules&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7. The demos are nice, but are there more CVEs that this mitigates?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes, quite a lot, actually! Besides the ones in the demos, the KEP has &lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/b8013bfbceb16843686aebbb2ccffce81a6e772d/keps/sig-node/127-user-namespaces/README.md#motivation&#34;&gt;more CVEs
you can check&lt;/a&gt;. That list is not exhaustive; there are many more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8. Can you sum up why user namespaces is important?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Think about running a process as root, maybe even an untrusted process. Do you
think that is secure? What if we limit it by adding seccomp and apparmor, mask
some files in /proc (so it can&#39;t crash the node, etc.), and apply some more tweaks?&lt;/p&gt;
&lt;p&gt;Wouldn&#39;t it be better if we don&#39;t give it privileges in the first place, instead
of trying to play whack-a-mole with all the possible ways root can escape?&lt;/p&gt;
&lt;p&gt;This is what user namespaces do, plus some other goodies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run as an unprivileged user on the host without making changes to your application&lt;/strong&gt;.
Greg and Vinayak gave a great talk on the pains you can face when trying to run
unprivileged without user namespaces. The section on those pain points &lt;a href=&#34;https://youtu.be/uouH9fsWVIE?feature=shared&amp;t=351&#34;&gt;starts here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;All pods run with different UIDs/GIDs, which significantly limits lateral
movement&lt;/strong&gt;. This is guaranteed with user namespaces (the kubelet chooses the
mappings for you). In the same talk, Greg and Vinayak show that to achieve the same without
user namespaces, they had to build a quite complex custom solution. This part
&lt;a href=&#34;https://youtu.be/uouH9fsWVIE?feature=shared&amp;t=793&#34;&gt;starts here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The capabilities granted are only valid inside the user namespace&lt;/strong&gt;. That
means that if a pod breaks out of the container, those capabilities are not valid on the
host. We can&#39;t provide that guarantee without user namespaces.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;It enables new use cases in a &lt;em&gt;secure&lt;/em&gt; way&lt;/strong&gt;. You can run Docker in Docker,
unprivileged container builds, Kubernetes inside Kubernetes, and more, all &lt;strong&gt;in a secure
way&lt;/strong&gt;. Most of the previous solutions for these required privileged containers or
put the node at high risk of compromise.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;9. Is there container runtime documentation for user namespaces?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes, we have &lt;a href=&#34;https://github.com/containerd/containerd/tree/b22a302a75d9a7d7955780e54cc5b32de6c8525d/docs/user-namespaces&#34;&gt;containerd
documentation&lt;/a&gt;.
It explains the limitations of containerd 1.7 and how to use
user namespaces in containerd without Kubernetes pods (using &lt;code&gt;ctr&lt;/code&gt;). Note that
if you use containerd, you need containerd 2.0 or later to use user namespaces
with Kubernetes.&lt;/p&gt;
&lt;p&gt;CRI-O doesn&#39;t have special documentation for user namespaces; it works out of
the box.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;10. What about the other container runtimes?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No other container runtime that we are aware of supports user namespaces with
Kubernetes. That sadly includes &lt;a href=&#34;https://github.com/Mirantis/cri-dockerd/issues/74&#34;&gt;cri-dockerd&lt;/a&gt; too.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;11. I&#39;d like to learn more about it, what would you recommend?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Rodrigo did an introduction to user namespaces at KubeCon 2022:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://sched.co/182K0&#34;&gt;Run As “Root”, Not Root: User Namespaces In K8s - Marga Manterola, Isovalent &amp;amp; Rodrigo Campos Catelin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The aforementioned presentation at KubeCon 2023 is also
useful as motivation for user namespaces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://sched.co/1HyX4&#34;&gt;Least Privilege Containers: Keeping a Bad Day from Getting Worse - Greg Castle &amp;amp; Vinayak Goyal&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bear in mind that these presentations are a few years old; some things have changed since
then. Use the Kubernetes documentation as the source of truth.&lt;/p&gt;
&lt;p&gt;If you would like to learn more about the low-level details of user namespaces,
you can check &lt;code&gt;man 7 user_namespaces&lt;/code&gt; and &lt;code&gt;man 1 unshare&lt;/code&gt;. You can easily create
namespaces and experiment with how they behave. Be aware that the &lt;code&gt;unshare&lt;/code&gt; tool
is very flexible, and with that flexibility comes the ability to create incomplete setups.&lt;/p&gt;
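&lt;p&gt;As a quick hands-on sketch (assuming a Linux system where unprivileged user namespace creation is enabled), you can use &lt;code&gt;unshare&lt;/code&gt; to see how your UID is mapped inside a new user namespace:&lt;/p&gt;

```shell
# Create a new user namespace, mapping the current (unprivileged) user
# to root inside it, then show the UID as seen from inside the
# namespace along with the namespace's UID mapping.
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'
```

&lt;p&gt;Inside the namespace the process reports UID 0, while &lt;code&gt;/proc/self/uid_map&lt;/code&gt; shows that UID 0 inside is mapped to your real, unprivileged UID on the host. If the command fails, unprivileged user namespace creation may be disabled on your distribution or blocked by a seccomp profile.&lt;/p&gt;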
&lt;p&gt;If you would like to know more about idmap mounts, you can check &lt;a href=&#34;https://docs.kernel.org/filesystems/idmappings.html&#34;&gt;its Linux
kernel documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;conclusions&#34;&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Running pods as root is not ideal, and running them as non-root is also hard
with containers, as it can require a lot of changes to your applications.
User namespaces are a unique feature that gives you the best of both worlds: running
as non-root without any changes to your application.&lt;/p&gt;
&lt;p&gt;This post covered what user namespaces are, why they are important, some real-world
examples of CVEs mitigated by user namespaces, and some common questions.
Hopefully, this post helped you eliminate any remaining doubts, and you
will now try user namespaces (if you didn&#39;t already!).&lt;/p&gt;
&lt;h2 id=&#34;how-do-i-get-involved&#34;&gt;How do I get involved?&lt;/h2&gt;
&lt;p&gt;You can reach SIG Node by several means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slack: &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/forum/#!forum/kubernetes-sig-node&#34;&gt;Mailing list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/labels/sig%2Fnode&#34;&gt;Open Community Issues/PRs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can also contact us directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: @rata @giuseppe @saschagrunert&lt;/li&gt;
&lt;li&gt;Slack: @rata @giuseppe @sascha&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Continuing the transition from Endpoints to EndpointSlices</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/24/endpoints-deprecation/</link>
      <pubDate>Thu, 24 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/24/endpoints-deprecation/</guid>
      <description>
        
        
        &lt;p&gt;Since the addition of &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2020/09/02/scaling-kubernetes-networking-with-endpointslices/&#34;&gt;EndpointSlices&lt;/a&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0752-endpointslices/README.md&#34;&gt;KEP-752&lt;/a&gt;) as alpha in v1.15
and later GA in v1.21, the
Endpoints API in Kubernetes has been gathering dust. New Service
features like &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/dual-stack/&#34;&gt;dual-stack networking&lt;/a&gt; and &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/networking/virtual-ips/#traffic-distribution&#34;&gt;traffic distribution&lt;/a&gt; are
only supported via the EndpointSlice API, so all service proxies,
Gateway API implementations, and similar controllers have had to be
ported from using Endpoints to using EndpointSlices. At this point,
the Endpoints API is really only there to avoid breaking end user
workloads and scripts that still make use of it.&lt;/p&gt;
&lt;p&gt;As of Kubernetes 1.33, the Endpoints API is now officially deprecated,
and the API server will return warnings to users who read or write
Endpoints resources rather than using EndpointSlices.&lt;/p&gt;
&lt;p&gt;Eventually, the plan (as documented in &lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/4974-deprecate-endpoints/README.md&#34;&gt;KEP-4974&lt;/a&gt;) is to change the
&lt;a href=&#34;https://www.cncf.io/training/certification/software-conformance/&#34;&gt;Kubernetes Conformance&lt;/a&gt; criteria to no longer require that clusters
run the &lt;em&gt;Endpoints controller&lt;/em&gt; (which generates Endpoints objects
based on Services and Pods), to avoid doing work that is unneeded in
most modern-day clusters.&lt;/p&gt;
&lt;p&gt;Thus, while the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/using-api/deprecation-policy/&#34;&gt;Kubernetes deprecation policy&lt;/a&gt; means that the
Endpoints type itself will probably never completely go away, users
who still have workloads or scripts that use the Endpoints API should
start migrating them to EndpointSlices.&lt;/p&gt;
&lt;h2 id=&#34;notes-on-migrating-from-endpoints-to-endpointslices&#34;&gt;Notes on migrating from Endpoints to EndpointSlices&lt;/h2&gt;
&lt;h3 id=&#34;consuming-endpointslices-rather-than-endpoints&#34;&gt;Consuming EndpointSlices rather than Endpoints&lt;/h3&gt;
&lt;p&gt;For end users, the biggest change between the Endpoints API and the
EndpointSlice API is that while every Service with a &lt;code&gt;selector&lt;/code&gt; has
exactly 1 Endpoints object (with the same name as the Service), a
Service may have any number of EndpointSlices associated with it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; kubectl get endpoints myservice
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME        ENDPOINTS          AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;myservice   10.180.3.17:443    1h
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; kubectl get endpointslice -l kubernetes.io/service-name&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;myservice
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME              ADDRESSTYPE   PORTS   ENDPOINTS          AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;myservice-7vzhx   IPv4          443     10.180.3.17        21s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;myservice-jcv8s   IPv6          443     2001:db8:0123::5   21s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this case, because the service is dual stack, it has 2
EndpointSlices: 1 for IPv4 addresses and 1 for IPv6 addresses. (The
Endpoints API does not support dual stack, so the Endpoints object
shows only the addresses in the cluster&#39;s primary address family.)
Although any Service with multiple endpoints &lt;em&gt;can&lt;/em&gt; have multiple
EndpointSlices, there are three main cases where you will see this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An EndpointSlice can only represent endpoints of a single IP
family, so dual-stack Services will have separate EndpointSlices
for IPv4 and IPv6.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All of the endpoints in an EndpointSlice must target the same
ports. So, for example, if you have a set of endpoint Pods
listening on port 80, and roll out an update to make them listen
on port 8080 instead, then while the rollout is in progress, the
Service will need 2 EndpointSlices: 1 for the endpoints listening
on port 80, and 1 for the endpoints listening on port 8080.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When a Service has more than 100 endpoints, the EndpointSlice
controller will split the endpoints into multiple EndpointSlices
rather than aggregating them into a single excessively-large
object like the Endpoints controller does.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
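&lt;p&gt;Because of this, code that consumes EndpointSlices needs to merge (and deduplicate) endpoints across all of a Service&#39;s slices. A minimal sketch of that pattern, using simplified stand-in structs rather than the real &lt;code&gt;discovery.k8s.io/v1&lt;/code&gt; Go types:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified stand-ins for the discovery.k8s.io/v1 types; the real
// EndpointSlice carries more fields (ports, conditions, topology, etc.).
type Endpoint struct {
	Addresses []string
	Ready     bool
}

type EndpointSlice struct {
	AddressType string
	Endpoints   []Endpoint
}

// readyAddresses merges the ready addresses from every slice belonging
// to one Service, deduplicating in case an address appears in two
// slices (for example, briefly during a rollout).
func readyAddresses(slices []EndpointSlice) []string {
	seen := map[string]bool{}
	var out []string
	for _, s := range slices {
		for _, ep := range s.Endpoints {
			if !ep.Ready {
				continue // skip endpoints that are not serving
			}
			for _, addr := range ep.Addresses {
				if !seen[addr] {
					seen[addr] = true
					out = append(out, addr)
				}
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	slices := []EndpointSlice{
		{AddressType: "IPv4", Endpoints: []Endpoint{
			{Addresses: []string{"10.180.3.17"}, Ready: true},
			{Addresses: []string{"10.180.6.6"}, Ready: false},
		}},
		{AddressType: "IPv6", Endpoints: []Endpoint{
			{Addresses: []string{"2001:db8:0123::5"}, Ready: true},
		}},
	}
	fmt.Println(readyAddresses(slices)) // prints: [10.180.3.17 2001:db8:0123::5]
}
```

&lt;p&gt;With the real API types you would additionally check &lt;code&gt;endpoint.Conditions.Ready&lt;/code&gt; (a &lt;code&gt;*bool&lt;/code&gt;, where consumers are advised to treat &lt;code&gt;nil&lt;/code&gt; as ready) and filter by &lt;code&gt;addressType&lt;/code&gt;, but the overall merge-and-deduplicate shape is the same.&lt;/p&gt;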
&lt;p&gt;Because there is not a predictable 1-to-1 mapping between Services and
EndpointSlices, there is no way to know what the actual name of the
EndpointSlice resource(s) for a Service will be ahead of time; thus,
instead of fetching the EndpointSlice(s) by name, you instead ask for
all EndpointSlices with a &amp;quot;&lt;code&gt;kubernetes.io/service-name&lt;/code&gt;&amp;quot;
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/overview/working-with-objects/labels/&#34;&gt;label&lt;/a&gt; pointing
to the Service:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;$&lt;/span&gt; kubectl get endpointslice -l kubernetes.io/service-name&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;myservice
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A similar change is needed in Go code. With Endpoints, you would do
something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Get the Endpoints named `name` in `namespace`.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;endpoint, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; client.&lt;span style=&#34;color:#00a000&#34;&gt;CoreV1&lt;/span&gt;().&lt;span style=&#34;color:#00a000&#34;&gt;Endpoints&lt;/span&gt;(namespace).&lt;span style=&#34;color:#00a000&#34;&gt;Get&lt;/span&gt;(ctx, name, metav1.GetOptions{})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; apierrors.&lt;span style=&#34;color:#00a000&#34;&gt;IsNotFound&lt;/span&gt;(err) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// No Endpoints exists for the Service (yet?)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;		&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// handle other errors
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;	&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// process `endpoint`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With EndpointSlices, this becomes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// Get all EndpointSlices for Service `name` in `namespace`.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;slices, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; client.&lt;span style=&#34;color:#00a000&#34;&gt;DiscoveryV1&lt;/span&gt;().&lt;span style=&#34;color:#00a000&#34;&gt;EndpointSlices&lt;/span&gt;(namespace).&lt;span style=&#34;color:#00a000&#34;&gt;List&lt;/span&gt;(ctx,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	metav1.ListOptions{LabelSelector: discoveryv1.LabelServiceName &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;=&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;+&lt;/span&gt; name})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// handle errors
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;	&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;} &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(slices.Items) &lt;span style=&#34;color:#666&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// No EndpointSlices exist for the Service (yet?)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;	&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// process `slices.Items`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;generating-endpointslices-rather-than-endpoints&#34;&gt;Generating EndpointSlices rather than Endpoints&lt;/h3&gt;
&lt;p&gt;For people (or controllers) generating Endpoints, migrating to
EndpointSlices is slightly easier, because in most cases you won&#39;t
have to worry about multiple slices. You just need to update your YAML
or Go code to use the new type (which organizes the information in a
slightly different way than Endpoints did).&lt;/p&gt;
&lt;p&gt;For example, this Endpoints object:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Endpoints&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myservice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;subsets&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ip&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10.180.3.17&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ip&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10.180.5.22&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-9&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ip&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10.180.18.2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-7&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;notReadyAddresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ip&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10.180.6.6&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-8&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;https&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;would become something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;discovery.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;EndpointSlice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myservice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kubernetes.io/service-name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myservice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addressType&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;IPv4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;endpoints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;10.180.3.17&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;10.180.5.22&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-9&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;10.180.18.2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-7&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;addresses&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#666&#34;&gt;10.180.6.6&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-8&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;conditions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ready&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;https&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Some points to note:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;This example uses an explicit &lt;code&gt;name&lt;/code&gt;, but you could also use
&lt;code&gt;generateName&lt;/code&gt; and let the API server append a unique suffix. The name
itself does not matter: what matters is the
&lt;code&gt;&amp;quot;kubernetes.io/service-name&amp;quot;&lt;/code&gt; label pointing back to the Service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You have to explicitly indicate &lt;code&gt;addressType: IPv4&lt;/code&gt; (or &lt;code&gt;IPv6&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An EndpointSlice is similar to a single element of the &lt;code&gt;&amp;quot;subsets&amp;quot;&lt;/code&gt;
array in Endpoints. An Endpoints object with multiple subsets will
normally need to be expressed as multiple EndpointSlices, each with
different &lt;code&gt;&amp;quot;ports&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;endpoints&lt;/code&gt; and &lt;code&gt;addresses&lt;/code&gt; fields are both arrays, but by
convention, each &lt;code&gt;addresses&lt;/code&gt; array only contains a single element. If
your Service has multiple endpoints, then you need to have multiple
elements in the &lt;code&gt;endpoints&lt;/code&gt; array, each with a single element in its
&lt;code&gt;addresses&lt;/code&gt; array.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Endpoints API lists &amp;quot;ready&amp;quot; and &amp;quot;not-ready&amp;quot; endpoints
separately, while the EndpointSlice API allows each endpoint to have
conditions (such as &amp;quot;&lt;code&gt;ready: false&lt;/code&gt;&amp;quot;) associated with it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
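&lt;p&gt;As a sketch of point 3, an Endpoints object whose two subsets exposed different ports could be expressed as two EndpointSlices along these lines (the Service name and addresses are only illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  generateName: my-service-
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
endpoints:
- addresses:
  - 10.180.3.17
ports:
- name: http
  protocol: TCP
  port: 80
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  generateName: my-service-
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
endpoints:
- addresses:
  - 10.180.9.41
ports:
- name: https
  protocol: TCP
  port: 443
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both slices carry the same &lt;code&gt;kubernetes.io/service-name&lt;/code&gt; label, so they are treated as parts of the same Service.&lt;/p&gt;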
&lt;p&gt;And of course, once you have ported to EndpointSlice, you can make use
of EndpointSlice-specific features, such as topology hints and
terminating endpoints. Consult the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/kubernetes-api/service-resources/endpoint-slice-v1/&#34;&gt;EndpointSlice API documentation&lt;/a&gt;
for more information.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33: Octarine</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/23/kubernetes-v1-33-release/</link>
      <pubDate>Wed, 23 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/23/kubernetes-v1-33-release/</guid>
      <description>
        
        
        &lt;p&gt;&lt;strong&gt;Editors:&lt;/strong&gt; Agustina Barbetta, Aakanksha Bhende, Udi Hofesh, Ryota Sawada, Sneha Yadav&lt;/p&gt;
&lt;p&gt;Similar to previous releases, the release of Kubernetes v1.33 introduces new stable, beta, and alpha
features. The consistent delivery of high-quality releases underscores the strength of our
development cycle and the vibrant support from our community.&lt;/p&gt;
&lt;p&gt;This release consists of 64 enhancements. Of those enhancements, 18 have graduated to Stable, 20 are
entering Beta, 24 have entered Alpha, and 2 are deprecated or withdrawn.&lt;/p&gt;
&lt;p&gt;There are also several notable &lt;a href=&#34;#deprecations-and-removals&#34;&gt;deprecations and removals&lt;/a&gt; in this
release; make sure to read about those if you already run an older version of Kubernetes.&lt;/p&gt;
&lt;h2 id=&#34;release-theme-and-logo&#34;&gt;Release theme and logo&lt;/h2&gt;


&lt;figure class=&#34;release-logo &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/23/kubernetes-v1-33-release/k8s-1.33.svg&#34;
         alt=&#34;Kubernetes v1.33 logo: Octarine&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;The theme for Kubernetes v1.33 is &lt;strong&gt;Octarine: The Color of Magic&lt;/strong&gt;&lt;sup&gt;1&lt;/sup&gt;, inspired by Terry
Pratchett’s &lt;em&gt;Discworld&lt;/em&gt; series. This release highlights the open source magic&lt;sup&gt;2&lt;/sup&gt; that
Kubernetes enables across the ecosystem.&lt;/p&gt;
&lt;p&gt;If you’re familiar with the world of Discworld, you might recognize a small swamp dragon perched
atop the tower of the Unseen University, gazing up at the Kubernetes moon above the city of
Ankh-Morpork with 64 stars&lt;sup&gt;3&lt;/sup&gt; in the background.&lt;/p&gt;
&lt;p&gt;As Kubernetes moves into its second decade, we celebrate the wizardry of its maintainers, the
curiosity of new contributors, and the collaborative spirit that fuels the project. The v1.33
release is a reminder that, as Pratchett wrote, &lt;em&gt;“It’s still magic even if you know how it’s done.”&lt;/em&gt;
Even if you know the ins and outs of the Kubernetes code base, stepping back at the end of the
release cycle, you’ll realize that Kubernetes remains magical.&lt;/p&gt;
&lt;p&gt;Kubernetes v1.33 is a testament to the enduring power of open source innovation, where hundreds of
contributors&lt;sup&gt;4&lt;/sup&gt; from around the world work together to create something truly
extraordinary. Behind every new feature, the Kubernetes community works to maintain and improve the
project, ensuring it remains secure, reliable, and released on time. Each release builds upon the
other, creating something greater than we could achieve alone.&lt;/p&gt;
&lt;p&gt;&lt;sub&gt;1. Octarine is the mythical eighth color, visible only to those attuned to the arcane—wizards,
witches, and, of course, cats. And occasionally, someone who’s stared at iptables rules for too
long.&lt;/sub&gt;&lt;br&gt;
&lt;sub&gt;2. Any sufficiently advanced technology is indistinguishable from magic…?&lt;/sub&gt;&lt;br&gt;
&lt;sub&gt;3. It’s not a coincidence 64 KEPs (Kubernetes Enhancement Proposals) are also included in
v1.33.&lt;/sub&gt;&lt;br&gt;
&lt;sub&gt;4. See the Project Velocity section for v1.33 🚀&lt;/sub&gt;&lt;/p&gt;
&lt;h2 id=&#34;spotlight-on-key-updates&#34;&gt;Spotlight on key updates&lt;/h2&gt;
&lt;p&gt;Kubernetes v1.33 is packed with new features and improvements. Here are a few select updates the
Release Team would like to highlight!&lt;/p&gt;
&lt;h3 id=&#34;stable-sidecar-containers&#34;&gt;Stable: Sidecar containers&lt;/h3&gt;
&lt;p&gt;The sidecar pattern involves deploying separate auxiliary container(s) to handle extra capabilities
in areas such as networking, logging, and metrics gathering. Sidecar containers graduate to stable
in v1.33.&lt;/p&gt;
&lt;p&gt;Kubernetes implements sidecars as a special class of init containers with &lt;code&gt;restartPolicy: Always&lt;/code&gt;,
ensuring that sidecars start before application containers, remain running throughout the pod&#39;s
lifecycle, and terminate automatically after the main containers exit.&lt;/p&gt;
&lt;p&gt;Additionally, sidecars can utilize probes (startup, readiness, liveness) to signal their operational
state, and their Out-Of-Memory (OOM) score adjustments are aligned with primary containers to
prevent premature termination under memory pressure.&lt;/p&gt;
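&lt;p&gt;As a minimal sketch (the container names and images here are placeholders), a sidecar is declared as an init container with &lt;code&gt;restartPolicy: Always&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
  - name: log-shipper               # runs for the whole Pod lifecycle
    image: example.com/log-shipper:1.0
    restartPolicy: Always           # this marks the init container as a sidecar
  containers:
  - name: app
    image: example.com/app:1.0
&lt;/code&gt;&lt;/pre&gt;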
&lt;p&gt;To learn more, read &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/sidecar-containers/&#34;&gt;Sidecar Containers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/753&#34;&gt;KEP-753: Sidecar Containers&lt;/a&gt; led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;beta-in-place-resource-resize-for-vertical-scaling-of-pods&#34;&gt;Beta: In-place resource resize for vertical scaling of Pods&lt;/h3&gt;
&lt;p&gt;Workloads can be defined using APIs like Deployment, StatefulSet, etc. These describe the template
for the Pods that should run, including memory and CPU resources, as well as the desired number of
replica Pods. Workloads can be scaled horizontally by updating the Pod replica
count, or vertically by updating the resources required in the Pods container(s). Before this
enhancement, container resources defined in a Pod&#39;s &lt;code&gt;spec&lt;/code&gt; were immutable, and updating any of these
details within a Pod template would trigger Pod replacement.&lt;/p&gt;
&lt;p&gt;But what if you could dynamically update the resource configuration for your existing Pods without
restarting them?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287&lt;/a&gt; is precisely about allowing such in-place Pod updates. It was
released as alpha in v1.27, and has graduated to beta in v1.33. This opens up various possibilities
for vertical scale-up of stateful processes without any downtime, seamless scale-down when the
traffic is low, and even allocating larger resources during startup, which can then be reduced once
the initial setup is complete.&lt;/p&gt;
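&lt;p&gt;As a sketch (assuming a Pod named &lt;code&gt;my-app&lt;/code&gt; with a container named &lt;code&gt;app&lt;/code&gt;, on a cluster with this beta feature enabled), a resize request targets the Pod&#39;s &lt;code&gt;resize&lt;/code&gt; subresource:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl patch pod my-app --subresource resize --patch \
  &#39;{&#34;spec&#34;:{&#34;containers&#34;:[{&#34;name&#34;:&#34;app&#34;,&#34;resources&#34;:{&#34;requests&#34;:{&#34;cpu&#34;:&#34;800m&#34;}}}]}}&#39;
&lt;/code&gt;&lt;/pre&gt;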
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287: In-Place Update of Pod Resources&lt;/a&gt;
led by SIG Node and SIG Autoscaling.&lt;/p&gt;
&lt;h3 id=&#34;alpha-new-configuration-option-for-kubectl-with-kuberc-for-user-preferences&#34;&gt;Alpha: New configuration option for kubectl with &lt;code&gt;.kuberc&lt;/code&gt; for user preferences&lt;/h3&gt;
&lt;p&gt;In v1.33, &lt;code&gt;kubectl&lt;/code&gt; introduces a new alpha feature with opt-in configuration file &lt;code&gt;.kuberc&lt;/code&gt; for user
preferences. This file can contain &lt;code&gt;kubectl&lt;/code&gt; aliases and overrides (e.g. defaulting to use
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/using-api/server-side-apply/&#34;&gt;server-side apply&lt;/a&gt;), while leaving cluster
credentials and host information in kubeconfig. This separation allows sharing the same user
preferences for &lt;code&gt;kubectl&lt;/code&gt; interaction, regardless of target cluster and kubeconfig used.&lt;/p&gt;
&lt;p&gt;To enable this alpha feature, users can set the environment variable of &lt;code&gt;KUBECTL_KUBERC=true&lt;/code&gt; and
create a &lt;code&gt;.kuberc&lt;/code&gt; configuration file. By default, &lt;code&gt;kubectl&lt;/code&gt; looks for this file in
&lt;code&gt;~/.kube/kuberc&lt;/code&gt;. You can also specify an alternative location using the &lt;code&gt;--kuberc&lt;/code&gt; flag, for
example: &lt;code&gt;kubectl --kuberc /var/kube/rc&lt;/code&gt;.&lt;/p&gt;
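&lt;p&gt;Since this is an alpha feature, the exact schema may still change; a &lt;code&gt;.kuberc&lt;/code&gt; file defining one alias and one flag override might look like this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubectl.config.k8s.io/v1alpha1
kind: Preference
aliases:
- name: getn                 # &#34;kubectl getn&#34; expands to &#34;kubectl get namespaces&#34;
  command: get
  appendArgs:
  - namespaces
overrides:
- command: apply
  flags:
  - name: server-side       # default &#34;kubectl apply&#34; to server-side apply
    default: &#34;true&#34;
&lt;/code&gt;&lt;/pre&gt;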
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3104&#34;&gt;KEP-3104: Separate kubectl user preferences from cluster configs&lt;/a&gt; led by
SIG CLI.&lt;/p&gt;
&lt;h2 id=&#34;features-graduating-to-stable&#34;&gt;Features graduating to Stable&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This is a selection of some of the improvements that are now stable following the v1.33 release.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;backoff-limits-per-index-for-indexed-jobs&#34;&gt;Backoff limits per index for indexed Jobs&lt;/h3&gt;
&lt;p&gt;​This release graduates a feature that allows setting backoff limits on a per-index basis for Indexed
Jobs. Traditionally, the &lt;code&gt;backoffLimit&lt;/code&gt; parameter in Kubernetes Jobs specifies the number of retries
before considering the entire Job as failed. This enhancement allows each index within an Indexed
Job to have its own backoff limit, providing more granular control over retry behavior for
individual tasks. This ensures that the failure of specific indices does not prematurely terminate
the entire Job, allowing the other indices to continue processing independently.&lt;/p&gt;
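&lt;p&gt;A sketch of an Indexed Job using the newly stable fields (the image and command are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: per-index-retries
spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed
  backoffLimitPerIndex: 1   # each index may fail at most once
  maxFailedIndexes: 5       # fail the whole Job once more than 5 indexes fail
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: [&#34;sh&#34;, &#34;-c&#34;, &#34;echo processing index $JOB_COMPLETION_INDEX&#34;]
&lt;/code&gt;&lt;/pre&gt;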
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3850&#34;&gt;KEP-3850: Backoff Limit Per Index For Indexed Jobs&lt;/a&gt; led by SIG Apps.&lt;/p&gt;
&lt;h3 id=&#34;job-success-policy&#34;&gt;Job success policy&lt;/h3&gt;
&lt;p&gt;Using &lt;code&gt;.spec.successPolicy&lt;/code&gt;, users can specify which pod indexes must succeed (&lt;code&gt;succeededIndexes&lt;/code&gt;),
how many pods must succeed (&lt;code&gt;succeededCount&lt;/code&gt;), or a combination of both. This feature benefits
various workloads, including simulations where partial completion is sufficient, and leader-worker
patterns where only the leader&#39;s success determines the Job&#39;s overall outcome.&lt;/p&gt;
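&lt;p&gt;For the leader-worker pattern, a success policy might look like this sketch (image and command are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: leader-worker
spec:
  completions: 8
  parallelism: 8
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: &#34;0&#34;   # the Job succeeds once the leader (index 0) succeeds
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: [&#34;sh&#34;, &#34;-c&#34;, &#34;echo index $JOB_COMPLETION_INDEX&#34;]
&lt;/code&gt;&lt;/pre&gt;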
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/3998&#34;&gt;KEP-3998: Job success/completion policy&lt;/a&gt; led
by SIG Apps.&lt;/p&gt;
&lt;h3 id=&#34;bound-serviceaccount-token-security-improvements&#34;&gt;Bound ServiceAccount token security improvements&lt;/h3&gt;
&lt;p&gt;This enhancement introduced features such as including a unique token identifier (i.e.
&lt;a href=&#34;https://datatracker.ietf.org/doc/html/rfc7519#section-4.1.7&#34;&gt;JWT ID Claim, also known as JTI&lt;/a&gt;) and
node information within the tokens, enabling more precise validation and auditing. Additionally, it
supports node-specific restrictions, ensuring that tokens are only usable on designated nodes,
thereby reducing the risk of token misuse and potential security breaches. These improvements, now
generally available, aim to enhance the overall security posture of service account tokens within
Kubernetes clusters.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4193&#34;&gt;KEP-4193: Bound service account token improvements&lt;/a&gt; led by SIG Auth.&lt;/p&gt;
&lt;h3 id=&#34;subresource-support-in-kubectl&#34;&gt;Subresource support in kubectl&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;--subresource&lt;/code&gt; argument is now generally available for kubectl subcommands such as &lt;code&gt;get&lt;/code&gt;,
&lt;code&gt;patch&lt;/code&gt;, &lt;code&gt;edit&lt;/code&gt;, &lt;code&gt;apply&lt;/code&gt; and &lt;code&gt;replace&lt;/code&gt;, allowing users to fetch and update subresources for all
resources that support them. To learn more about the subresources supported, visit the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/kubectl/conventions/#subresources&#34;&gt;kubectl reference&lt;/a&gt;.&lt;/p&gt;
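&lt;p&gt;For example (assuming a Deployment named &lt;code&gt;nginx&lt;/code&gt; exists in the current namespace):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Read the status subresource instead of the main resource
kubectl get deployment nginx --subresource=status

# Update only the scale subresource
kubectl patch deployment nginx --subresource=scale \
  --patch &#39;{&#34;spec&#34;:{&#34;replicas&#34;:3}}&#39;
&lt;/code&gt;&lt;/pre&gt;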
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/2590&#34;&gt;KEP-2590: Add subresource support to kubectl&lt;/a&gt; led by SIG CLI.&lt;/p&gt;
&lt;h3 id=&#34;multiple-service-cidrs&#34;&gt;Multiple Service CIDRs&lt;/h3&gt;
&lt;p&gt;This enhancement introduced a new implementation of allocation logic for Service IPs. Across the
whole cluster, every Service of &lt;code&gt;type: ClusterIP&lt;/code&gt; must have a unique IP address assigned to it.
Trying to create a Service with a specific cluster IP that has already been allocated will return an
error. The updated IP address allocator logic uses two newly stable API objects: &lt;code&gt;ServiceCIDR&lt;/code&gt; and
&lt;code&gt;IPAddress&lt;/code&gt;. Now generally available, these APIs allow cluster administrators to dynamically
increase the number of IP addresses available for &lt;code&gt;type: ClusterIP&lt;/code&gt; Services (by creating new
ServiceCIDR objects).&lt;/p&gt;
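&lt;p&gt;A sketch of adding an extra range (the object name and CIDR here are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: ServiceCIDR
metadata:
  name: extra-service-cidr
spec:
  cidrs:
  - 10.100.0.0/16   # additional range for ClusterIP allocation
&lt;/code&gt;&lt;/pre&gt;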
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/1880&#34;&gt;KEP-1880: Multiple Service CIDRs&lt;/a&gt; led by SIG
Network.&lt;/p&gt;
&lt;h3 id=&#34;nftables-backend-for-kube-proxy&#34;&gt;&lt;code&gt;nftables&lt;/code&gt; backend for kube-proxy&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;nftables&lt;/code&gt; backend for kube-proxy is now stable, adding a new implementation that significantly
improves performance and scalability for Services implementation within Kubernetes clusters. For
compatibility reasons, &lt;code&gt;iptables&lt;/code&gt; remains the default on Linux nodes. Check the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/networking/virtual-ips/#migrating-from-iptables-mode-to-nftables&#34;&gt;migration guide&lt;/a&gt;
if you want to try it out.&lt;/p&gt;
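&lt;p&gt;Opting in is a matter of setting the proxy mode in the kube-proxy configuration, as in this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: &#34;nftables&#34;
&lt;/code&gt;&lt;/pre&gt;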
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/3866&#34;&gt;KEP-3866: nftables kube-proxy backend&lt;/a&gt; led
by SIG Network.&lt;/p&gt;
&lt;h3 id=&#34;topology-aware-routing-with-trafficdistribution-preferclose&#34;&gt;Topology aware routing with &lt;code&gt;trafficDistribution: PreferClose&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This release graduates topology-aware routing and traffic distribution to GA, allowing you to
optimize service traffic in multi-zone clusters. Topology-aware hints in EndpointSlices
enable components like kube-proxy to prioritize routing traffic to endpoints within the same zone,
thereby reducing latency and cross-zone data transfer costs. Building upon this, the
&lt;code&gt;trafficDistribution&lt;/code&gt; field has been added to the Service specification, with the &lt;code&gt;PreferClose&lt;/code&gt; option
directing traffic to the nearest available endpoints based on network topology. This configuration
enhances performance and cost-efficiency by minimizing inter-zone communication.&lt;/p&gt;
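&lt;p&gt;A minimal sketch of a Service opting in (the name, selector, and port are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    protocol: TCP
  trafficDistribution: PreferClose   # prefer endpoints topologically close to the client
&lt;/code&gt;&lt;/pre&gt;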
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4444&#34;&gt;KEP-4444: Traffic Distribution for Services&lt;/a&gt;
and &lt;a href=&#34;https://kep.k8s.io/2433&#34;&gt;KEP-2433: Topology Aware Routing&lt;/a&gt; led by SIG Network.&lt;/p&gt;
&lt;h3 id=&#34;options-to-reject-non-smt-aligned-workload&#34;&gt;Options to reject non SMT-aligned workload&lt;/h3&gt;
&lt;p&gt;This feature added policy options to the CPU Manager, enabling it to reject workloads that do not
align with Simultaneous Multithreading (SMT) configurations. This enhancement, now generally
available, ensures that when a pod requests exclusive use of CPU cores, the CPU Manager can enforce
allocation of entire core pairs (comprising primary and sibling threads) on SMT-enabled systems,
thereby preventing scenarios where workloads share CPU resources in unintended ways.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/2625&#34;&gt;KEP-2625: node: cpumanager: add options to reject non SMT-aligned workload&lt;/a&gt;
led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;defining-pod-affinity-or-anti-affinity-using-matchlabelkeys-and-mismatchlabelkeys&#34;&gt;Defining Pod affinity or anti-affinity using &lt;code&gt;matchLabelKeys&lt;/code&gt; and &lt;code&gt;mismatchLabelKeys&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;matchLabelKeys&lt;/code&gt; and &lt;code&gt;mismatchLabelKeys&lt;/code&gt; fields are available in Pod affinity terms, enabling
users to finely control the scope where Pods are expected to co-exist (Affinity) or not
(AntiAffinity). These newly stable options complement the existing &lt;code&gt;labelSelector&lt;/code&gt; mechanism. The
affinity fields facilitate enhanced scheduling for versatile rolling updates, as well as isolation
of services managed by tools or controllers based on global configurations.&lt;/p&gt;
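&lt;p&gt;For instance, to co-locate a Pod only with replicas from the same rollout (the labels here are illustrative), part of a Pod spec could look like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-app
        matchLabelKeys:
        - pod-template-hash   # restrict co-location to Pods from the same ReplicaSet
        topologyKey: topology.kubernetes.io/zone
&lt;/code&gt;&lt;/pre&gt;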
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3633&#34;&gt;KEP-3633: Introduce MatchLabelKeys to Pod Affinity and Pod Anti Affinity&lt;/a&gt;
led by SIG Scheduling.&lt;/p&gt;
&lt;h3 id=&#34;considering-taints-and-tolerations-when-calculating-pod-topology-spread-skew&#34;&gt;Considering taints and tolerations when calculating Pod topology spread skew&lt;/h3&gt;
&lt;p&gt;This enhanced &lt;code&gt;PodTopologySpread&lt;/code&gt; by introducing two fields: &lt;code&gt;nodeAffinityPolicy&lt;/code&gt; and
&lt;code&gt;nodeTaintsPolicy&lt;/code&gt;. These fields allow users to specify whether node affinity rules and node taints
should be considered when calculating pod distribution across nodes. By default,
&lt;code&gt;nodeAffinityPolicy&lt;/code&gt; is set to &lt;code&gt;Honor&lt;/code&gt;, meaning only nodes matching the pod&#39;s node affinity or
selector are included in the distribution calculation. The &lt;code&gt;nodeTaintsPolicy&lt;/code&gt; defaults to &lt;code&gt;Ignore&lt;/code&gt;,
indicating that node taints are not considered unless specified. This enhancement provides finer
control over pod placement, ensuring that pods are scheduled on nodes that meet both affinity and
taint toleration requirements, thereby preventing scenarios where pods remain pending due to
unsatisfied constraints.&lt;/p&gt;
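&lt;p&gt;A sketch of a spread constraint using both fields (the &lt;code&gt;app: web&lt;/code&gt; label is a placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
    nodeAffinityPolicy: Honor   # the default: only nodes matching the Pod&#39;s affinity count
    nodeTaintsPolicy: Honor     # non-default: skip nodes whose taints the Pod does not tolerate
&lt;/code&gt;&lt;/pre&gt;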
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3094&#34;&gt;KEP-3094: Take taints/tolerations into consideration when calculating PodTopologySpread skew&lt;/a&gt;
led by SIG Scheduling.&lt;/p&gt;
&lt;h3 id=&#34;volume-populators&#34;&gt;Volume populators&lt;/h3&gt;
&lt;p&gt;After being released as beta in v1.24, &lt;em&gt;volume populators&lt;/em&gt; have graduated to GA in v1.33. This newly
stable feature provides a way to allow users to pre-populate volumes with data from various sources,
and not just from PersistentVolumeClaim (PVC) clones or volume snapshots. The mechanism relies on
the &lt;code&gt;dataSourceRef&lt;/code&gt; field within a PersistentVolumeClaim. This field offers more flexibility than
the existing &lt;code&gt;dataSource&lt;/code&gt; field, and allows for custom resources to be used as data sources.&lt;/p&gt;
&lt;p&gt;A special controller, &lt;code&gt;volume-data-source-validator&lt;/code&gt;, validates these data source references,
alongside a newly stable CustomResourceDefinition (CRD) for an API kind named VolumePopulator. The
VolumePopulator API allows volume populator controllers to register the types of data sources they
support. You need to set up your cluster with the appropriate CRD in order to use volume populators.&lt;/p&gt;
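&lt;p&gt;A sketch of a PVC referencing a custom data source (the &lt;code&gt;apiGroup&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, and names are hypothetical; a matching populator controller and CRD must be installed):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: populated-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:
    apiGroup: example.com   # hypothetical populator API group
    kind: DataImport        # hypothetical custom resource served by a populator
    name: my-import
&lt;/code&gt;&lt;/pre&gt;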
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/1495&#34;&gt;KEP-1495: Generic data populators&lt;/a&gt; led by
SIG Storage.&lt;/p&gt;
&lt;h3 id=&#34;always-honor-persistentvolume-reclaim-policy&#34;&gt;Always honor PersistentVolume reclaim policy&lt;/h3&gt;
&lt;p&gt;This enhancement addressed an issue where the Persistent Volume (PV) reclaim policy is not
consistently honored, leading to potential storage resource leaks. Specifically, if a PV is deleted
before its associated Persistent Volume Claim (PVC), the &amp;quot;Delete&amp;quot; reclaim policy may not be
executed, leaving the underlying storage assets intact. To mitigate this, Kubernetes now sets
finalizers on relevant PVs, ensuring that the reclaim policy is enforced regardless of the deletion
sequence. This enhancement prevents unintended retention of storage resources and maintains
consistency in PV lifecycle management.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/2644&#34;&gt;KEP-2644: Always Honor PersistentVolume Reclaim Policy&lt;/a&gt; led by SIG
Storage.&lt;/p&gt;
&lt;h2 id=&#34;new-features-in-beta&#34;&gt;New features in Beta&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This is a selection of some of the improvements that are now beta following the v1.33 release.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;support-for-direct-service-return-dsr-in-windows-kube-proxy&#34;&gt;Support for Direct Service Return (DSR) in Windows kube-proxy&lt;/h3&gt;
&lt;p&gt;DSR provides performance optimizations by allowing the return traffic routed through load balancers
to bypass the load balancer and respond directly to the client, reducing load on the load balancer
and also reducing overall latency. For information on DSR on Windows, read
&lt;a href=&#34;https://techcommunity.microsoft.com/blog/networkingblog/direct-server-return-dsr-in-a-nutshell/693710&#34;&gt;Direct Server Return (DSR) in a nutshell&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Initially introduced in v1.14, support for DSR has been promoted to beta by SIG Windows as part of
&lt;a href=&#34;https://kep.k8s.io/5100&#34;&gt;KEP-5100: Support for Direct Service Return (DSR) and overlay networking in Windows kube-proxy&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;structured-parameter-support&#34;&gt;Structured parameter support&lt;/h3&gt;
&lt;p&gt;While structured parameter support continues as a beta feature in Kubernetes v1.33, this core part
of Dynamic Resource Allocation (DRA) has seen significant improvements. A new v1beta2 version
simplifies the &lt;code&gt;resource.k8s.io&lt;/code&gt; API, and regular users with the namespaced cluster &lt;code&gt;edit&lt;/code&gt; role can
now use DRA.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;kubelet&lt;/code&gt; now includes seamless upgrade support, enabling drivers deployed as DaemonSets to use
a rolling update mechanism. For DRA implementations, this prevents the deletion and re-creation of
ResourceSlices, allowing them to remain unchanged during upgrades. Additionally, a 30-second grace
period has been introduced before the &lt;code&gt;kubelet&lt;/code&gt; cleans up after unregistering a driver, providing
better support for drivers that do not use rolling updates.&lt;/p&gt;
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4381&#34;&gt;KEP-4381: DRA: structured parameters&lt;/a&gt; by WG
Device Management, a cross-functional team including SIG Node, SIG Scheduling, and SIG Autoscaling.&lt;/p&gt;
&lt;h3 id=&#34;dynamic-resource-allocation-dra-for-network-interfaces&#34;&gt;Dynamic Resource Allocation (DRA) for network interfaces&lt;/h3&gt;
&lt;p&gt;The standardized reporting of network interface data via DRA, introduced in v1.32, has graduated to
beta in v1.33. This enables more native Kubernetes network integrations, simplifying the development
and management of networking devices. This was covered previously in the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/11/kubernetes-v1-32-release/#dra-standardized-network-interface-data-for-resource-claim-status&#34;&gt;v1.32 release announcement blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4817&#34;&gt;KEP-4817: DRA: Resource Claim Status with possible standardized network interface data&lt;/a&gt;
led by SIG Network, SIG Node, and WG Device Management.&lt;/p&gt;
&lt;h3 id=&#34;handle-unscheduled-pods-early-when-scheduler-does-not-have-any-pod-on-activeq&#34;&gt;Handle unscheduled pods early when scheduler does not have any pod on activeQ&lt;/h3&gt;
&lt;p&gt;This feature improves queue scheduling behavior. Behind the scenes, the scheduler achieves this by
popping pods that are not backed off due to errors from the &lt;em&gt;backoffQ&lt;/em&gt; when the &lt;em&gt;activeQ&lt;/em&gt; is
empty. Previously, the scheduler would become idle when the &lt;em&gt;activeQ&lt;/em&gt; was empty, even if the
&lt;em&gt;backoffQ&lt;/em&gt; held pods that were ready to run; this enhancement improves scheduling efficiency by
preventing that idle time.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/5142&#34;&gt;KEP-5142: Pop pod from backoffQ when activeQ is empty&lt;/a&gt; led by SIG
Scheduling.&lt;/p&gt;
&lt;h3 id=&#34;asynchronous-preemption-in-the-kubernetes-scheduler&#34;&gt;Asynchronous preemption in the Kubernetes Scheduler&lt;/h3&gt;
&lt;p&gt;Preemption ensures higher-priority pods get the resources they need by evicting lower-priority ones.
Asynchronous Preemption, introduced in v1.32 as alpha, has graduated to beta in v1.33. With this
enhancement, heavy operations such as API calls to delete pods are processed in parallel, allowing
the scheduler to continue scheduling other pods without delays. This improvement is particularly
beneficial in clusters with high Pod churn or frequent scheduling failures, ensuring a more
efficient and resilient scheduling process.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4832&#34;&gt;KEP-4832: Asynchronous preemption in the scheduler&lt;/a&gt; led by SIG Scheduling.&lt;/p&gt;
&lt;h3 id=&#34;clustertrustbundles&#34;&gt;ClusterTrustBundles&lt;/h3&gt;
&lt;p&gt;ClusterTrustBundle, a cluster-scoped resource designed for holding X.509 trust anchors (root
certificates), has graduated to beta in v1.33. This API makes it easier for in-cluster certificate
signers to publish and communicate X.509 trust anchors to cluster workloads.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3257&#34;&gt;KEP-3257: ClusterTrustBundles (previously Trust Anchor Sets)&lt;/a&gt; led by SIG
Auth.&lt;/p&gt;
&lt;h3 id=&#34;fine-grained-supplementalgroups-control&#34;&gt;Fine-grained SupplementalGroups control&lt;/h3&gt;
&lt;p&gt;Introduced in v1.31, this feature graduates to beta in v1.33 and is now enabled by default. Provided
that your cluster has the &lt;code&gt;SupplementalGroupsPolicy&lt;/code&gt; feature gate enabled, the
&lt;code&gt;supplementalGroupsPolicy&lt;/code&gt; field within a Pod&#39;s &lt;code&gt;securityContext&lt;/code&gt; supports two policies: the default
&lt;code&gt;Merge&lt;/code&gt; policy maintains backward compatibility by combining specified groups with those from the
container image&#39;s &lt;code&gt;/etc/group&lt;/code&gt; file, while the new &lt;code&gt;Strict&lt;/code&gt; policy applies only to explicitly
defined groups.&lt;/p&gt;
&lt;p&gt;This enhancement helps to address security concerns where implicit group memberships from container
images could lead to unintended file access permissions and bypass policy controls.&lt;/p&gt;
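&lt;p&gt;A sketch of a Pod using the strict policy (the UIDs, GIDs, and image are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: strict-groups
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups:
    - 4000
    supplementalGroupsPolicy: Strict   # ignore groups from the image&#39;s /etc/group
  containers:
  - name: app
    image: example.com/app:1.0
&lt;/code&gt;&lt;/pre&gt;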
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3619&#34;&gt;KEP-3619: Fine-grained SupplementalGroups control&lt;/a&gt; led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;support-for-mounting-images-as-volumes&#34;&gt;Support for mounting images as volumes&lt;/h3&gt;
&lt;p&gt;Support for using Open Container Initiative (OCI) images as volumes in Pods, introduced in v1.31,
has graduated to beta. This feature allows users to specify an image reference as a volume in a Pod
while reusing it as a volume mount within containers. It opens up the possibility of packaging
volume data separately and sharing it among containers in a Pod without including it in the main
image, thereby reducing vulnerabilities and simplifying image creation.&lt;/p&gt;
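&lt;p&gt;For illustration (the artifact reference below is a placeholder), mounting an OCI image as a read-only volume might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    volumeMounts:
    - name: shared-data
      mountPath: /data
      readOnly: true
  volumes:
  - name: shared-data
    image:
      # hypothetical OCI artifact holding the volume data
      reference: example.com/my-org/my-data:v1
      pullPolicy: IfNotPresent
&lt;/code&gt;&lt;/pre&gt;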
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4639&#34;&gt;KEP-4639: VolumeSource: OCI Artifact and/or Image&lt;/a&gt; led by SIG Node and SIG
Storage.&lt;/p&gt;
&lt;h3 id=&#34;support-for-user-namespaces-within-linux-pods&#34;&gt;Support for user namespaces within Linux Pods&lt;/h3&gt;
&lt;p&gt;One of the oldest open KEPs as of writing is &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127&lt;/a&gt;, Pod security
improvement by using Linux &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/user-namespaces/&#34;&gt;User namespaces&lt;/a&gt; for
Pods. This KEP was first opened in late 2016, and after multiple iterations, had its alpha release
in v1.25, initial beta in v1.30 (where it was disabled by default), and has moved to on-by-default
beta as part of v1.33.&lt;/p&gt;
&lt;p&gt;This support will not impact existing Pods unless you manually specify &lt;code&gt;pod.spec.hostUsers&lt;/code&gt; to opt
in. As highlighted in the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/03/12/kubernetes-1-30-upcoming-changes/&#34;&gt;v1.30 sneak peek blog&lt;/a&gt;, this is an important
milestone for mitigating vulnerabilities.&lt;/p&gt;
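&lt;p&gt;Opting a Pod in to user namespaces takes a single field (the Pod name and image here are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  # false = run this Pod in its own user namespace
  hostUsers: false
  containers:
  - name: app
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
&lt;/code&gt;&lt;/pre&gt;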
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127: Support User Namespaces in pods&lt;/a&gt; led
by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;pod-procmount-option&#34;&gt;Pod &lt;code&gt;procMount&lt;/code&gt; option&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;procMount&lt;/code&gt; option, introduced as alpha in v1.12, and off-by-default beta in v1.31, has moved to
an on-by-default beta in v1.33. This enhancement improves Pod isolation by allowing users to
fine-tune access to the &lt;code&gt;/proc&lt;/code&gt; filesystem. Specifically, it adds a &lt;code&gt;procMount&lt;/code&gt; field to the container&#39;s
&lt;code&gt;securityContext&lt;/code&gt; that lets you override the default behavior of masking and marking certain &lt;code&gt;/proc&lt;/code&gt;
paths as read-only. This is particularly useful for scenarios where users want to run unprivileged
containers inside the Kubernetes Pod using user namespaces. Normally, the container runtime (via the
CRI implementation) starts the outer container with strict &lt;code&gt;/proc&lt;/code&gt; mount settings. However, to
successfully run nested containers with an unprivileged Pod, users need a mechanism to relax those
defaults, and this feature provides exactly that.&lt;/p&gt;
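&lt;p&gt;A sketch of the nested-containers scenario (the image reference is a placeholder; the Unmasked setting is intended to be paired with user namespaces):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: unmasked-proc-demo
spec:
  hostUsers: false   # pair Unmasked with a user namespace
  containers:
  - name: nested-runtime
    image: example.com/nested-container-runtime:latest
    securityContext:
      procMount: Unmasked   # do not mask /proc paths for this container
&lt;/code&gt;&lt;/pre&gt;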
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4265&#34;&gt;KEP-4265: add ProcMount option&lt;/a&gt; led by SIG
Node.&lt;/p&gt;
&lt;h3 id=&#34;cpumanager-policy-to-distribute-cpus-across-numa-nodes&#34;&gt;CPUManager policy to distribute CPUs across NUMA nodes&lt;/h3&gt;
&lt;p&gt;This feature adds a new policy option for the CPU Manager to distribute CPUs across Non-Uniform
Memory Access (NUMA) nodes, rather than concentrating them on a single node. It optimizes CPU
resource allocation by balancing workloads across multiple NUMA nodes, thereby improving performance
and resource utilization in multi-NUMA systems.&lt;/p&gt;
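&lt;p&gt;The option is enabled through the kubelet configuration, roughly like this (it applies on top of the static CPU Manager policy):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  distribute-cpus-across-numa: &#34;true&#34;
&lt;/code&gt;&lt;/pre&gt;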
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/2902&#34;&gt;KEP-2902: Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them&lt;/a&gt;
led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;zero-second-sleeps-for-container-prestop-hooks&#34;&gt;Zero-second sleeps for container PreStop hooks&lt;/h3&gt;
&lt;p&gt;Kubernetes 1.29 introduced a Sleep action for the &lt;code&gt;preStop&lt;/code&gt; lifecycle hook in Pods, allowing
containers to pause for a specified duration before termination. This provides a straightforward
method to delay container shutdown, facilitating tasks such as connection draining or cleanup
operations.&lt;/p&gt;
&lt;p&gt;The Sleep action in a &lt;code&gt;preStop&lt;/code&gt; hook can now accept a zero-second duration as a beta feature. This
allows defining a no-op &lt;code&gt;preStop&lt;/code&gt; hook, which is useful when a &lt;code&gt;preStop&lt;/code&gt; hook is required but no
delay is desired.&lt;/p&gt;
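&lt;p&gt;A no-op &lt;code&gt;preStop&lt;/code&gt; hook is as simple as the following sketch (the zero value requires the PodLifecycleSleepActionAllowZero feature gate; the Pod name and image are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: noop-prestop
spec:
  containers:
  - name: app
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    lifecycle:
      preStop:
        sleep:
          seconds: 0   # explicit no-op preStop hook
&lt;/code&gt;&lt;/pre&gt;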
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/3960&#34;&gt;KEP-3960: Introducing Sleep Action for PreStop Hook&lt;/a&gt; and
&lt;a href=&#34;https://kep.k8s.io/4818&#34;&gt;KEP-4818: Allow zero value for Sleep Action of PreStop Hook&lt;/a&gt; led by SIG
Node.&lt;/p&gt;
&lt;h3 id=&#34;internal-tooling-for-declarative-validation-of-kubernetes-native-types&#34;&gt;Internal tooling for declarative validation of Kubernetes-native types&lt;/h3&gt;
&lt;p&gt;Behind the scenes, the internals of Kubernetes are starting to use a new mechanism for validating
objects and changes to objects. Kubernetes v1.33 introduces &lt;code&gt;validation-gen&lt;/code&gt;, an internal tool that
Kubernetes contributors use to generate declarative validation rules. The overall goal is to improve
the robustness and maintainability of API validations by enabling developers to specify validation
constraints declaratively, reducing manual coding errors and ensuring consistency across the
codebase.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/5073&#34;&gt;KEP-5073: Declarative Validation Of Kubernetes Native Types With validation-gen&lt;/a&gt;
led by SIG API Machinery.&lt;/p&gt;
&lt;h2 id=&#34;new-features-in-alpha&#34;&gt;New features in Alpha&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This is a selection of some of the improvements that are now alpha following the v1.33 release.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;configurable-tolerance-for-horizontalpodautoscalers&#34;&gt;Configurable tolerance for HorizontalPodAutoscalers&lt;/h3&gt;
&lt;p&gt;This feature introduces configurable tolerance for HorizontalPodAutoscalers, which dampens scaling
reactions to small metric variations.&lt;/p&gt;
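&lt;p&gt;As a sketch of the shape described in KEP-4951 (the workload names are placeholders, and the field requires the alpha HPAConfigurableTolerance feature gate), a per-direction tolerance is set under &lt;code&gt;spec.behavior&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleDown:
      tolerance: 0.05   # ignore metric deviations below 5% when scaling down
&lt;/code&gt;&lt;/pre&gt;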
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4951&#34;&gt;KEP-4951: Configurable tolerance for Horizontal Pod Autoscalers&lt;/a&gt; led by
SIG Autoscaling.&lt;/p&gt;
&lt;h3 id=&#34;configurable-container-restart-delay&#34;&gt;Configurable container restart delay&lt;/h3&gt;
&lt;p&gt;Introduced as alpha in v1.32, this feature provides a set of kubelet-level configuration options to
fine-tune how CrashLoopBackOff is handled.&lt;/p&gt;
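&lt;p&gt;As a sketch (the field name follows KEP-4603, requires the corresponding alpha feature gate, and the duration value is illustrative), capping the maximum restart backoff might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
crashLoopBackOff:
  maxContainerRestartPeriod: &#34;30s&#34;
&lt;/code&gt;&lt;/pre&gt;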
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4603&#34;&gt;KEP-4603: Tune CrashLoopBackOff&lt;/a&gt; led by SIG
Node.&lt;/p&gt;
&lt;h3 id=&#34;custom-container-stop-signals&#34;&gt;Custom container stop signals&lt;/h3&gt;
&lt;p&gt;Before Kubernetes v1.33, stop signals could only be set in container image definitions (for example,
via the &lt;code&gt;StopSignal&lt;/code&gt; configuration field in the image metadata). If you wanted to modify termination
behavior, you needed to build a custom container image. By enabling the (alpha)
&lt;code&gt;ContainerStopSignals&lt;/code&gt; feature gate in Kubernetes v1.33, you can now define custom stop signals
directly within Pod specifications. This is defined in the container&#39;s &lt;code&gt;lifecycle.stopSignal&lt;/code&gt; field
and requires the Pod&#39;s &lt;code&gt;spec.os.name&lt;/code&gt; field to be present. If unspecified, containers fall back to
the image-defined stop signal (if present), or the container runtime default (typically SIGTERM for
Linux).&lt;/p&gt;
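&lt;p&gt;For example (the signal choice, Pod name, and image are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: custom-stop-signal
spec:
  os:
    name: linux   # required when setting a custom stop signal
  containers:
  - name: app
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    lifecycle:
      stopSignal: SIGUSR1
&lt;/code&gt;&lt;/pre&gt;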
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4960&#34;&gt;KEP-4960: Container Stop Signals&lt;/a&gt; led by SIG
Node.&lt;/p&gt;
&lt;h3 id=&#34;dra-enhancements-galore&#34;&gt;DRA enhancements galore!&lt;/h3&gt;
&lt;p&gt;Kubernetes v1.33 continues to develop Dynamic Resource Allocation (DRA) with features designed for
today’s complex infrastructures. DRA is an API for requesting and sharing resources between pods and
containers inside a pod. Typically those resources are devices such as GPUs, FPGAs, and network
adapters.&lt;/p&gt;
&lt;p&gt;The following are all the alpha DRA feature gates introduced in v1.33:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Similar to Node taints, by enabling the &lt;code&gt;DRADeviceTaints&lt;/code&gt; feature gate, devices support taints and
tolerations. An admin or a control plane component can taint devices to limit their usage.
Scheduling of pods which depend on those devices can be paused while a taint exists and/or pods
using a tainted device can be evicted.&lt;/li&gt;
&lt;li&gt;By enabling the feature gate &lt;code&gt;DRAPrioritizedList&lt;/code&gt;, DeviceRequests get a new field named
&lt;code&gt;firstAvailable&lt;/code&gt;. This field is an ordered list that allows the user to specify that a request may
be satisfied in different ways, including allocating nothing at all if some specific hardware is
not available.&lt;/li&gt;
&lt;li&gt;With feature gate &lt;code&gt;DRAAdminAccess&lt;/code&gt; enabled, only users authorized to create ResourceClaim or
ResourceClaimTemplate objects in namespaces labeled with &lt;code&gt;resource.k8s.io/admin-access: &amp;quot;true&amp;quot;&lt;/code&gt;
can use the &lt;code&gt;adminAccess&lt;/code&gt; field. This ensures that non-admin users cannot misuse the &lt;code&gt;adminAccess&lt;/code&gt;
feature.&lt;/li&gt;
&lt;li&gt;While it has been possible to consume device partitions since v1.31, vendors had to pre-partition
devices and advertise them accordingly. By enabling the &lt;code&gt;DRAPartitionableDevices&lt;/code&gt; feature gate in
v1.33, device vendors can advertise multiple partitions, including overlapping ones. The
Kubernetes scheduler will choose the partition based on workload requests, and prevent the
allocation of conflicting partitions simultaneously. This feature gives vendors the ability to
dynamically create partitions at allocation time. The allocation and dynamic partitioning are
automatic and transparent to users, enabling improved resource utilization.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These feature gates have no effect unless you also enable the &lt;code&gt;DynamicResourceAllocation&lt;/code&gt; feature
gate.&lt;/p&gt;
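&lt;p&gt;As a sketch of the prioritized-list idea (the device class names are placeholders, and exact field shapes may change while these features are alpha), a ResourceClaim could prefer a large device but accept a smaller one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - name: gpu
      firstAvailable:   # subrequests are tried in order
      - name: large
        deviceClassName: large-gpu.example.com
      - name: small
        deviceClassName: small-gpu.example.com
&lt;/code&gt;&lt;/pre&gt;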
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/5055&#34;&gt;KEP-5055: DRA: device taints and tolerations&lt;/a&gt;,
&lt;a href=&#34;https://kep.k8s.io/4816&#34;&gt;KEP-4816: DRA: Prioritized Alternatives in Device Requests&lt;/a&gt;,
&lt;a href=&#34;https://kep.k8s.io/5018&#34;&gt;KEP-5018: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates&lt;/a&gt;,
and &lt;a href=&#34;https://kep.k8s.io/4815&#34;&gt;KEP-4815: DRA: Add support for partitionable devices&lt;/a&gt;, led by SIG
Node, SIG Scheduling and SIG Auth.&lt;/p&gt;
&lt;h3 id=&#34;robust-image-pull-policy-to-authenticate-images-for-ifnotpresent-and-never&#34;&gt;Robust image pull policy to authenticate images for &lt;code&gt;IfNotPresent&lt;/code&gt; and &lt;code&gt;Never&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This feature allows users to ensure that kubelet requires an image pull authentication check for
each new set of credentials, regardless of whether the image is already present on the node.&lt;/p&gt;
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;KEP-2535: Ensure secret pulled images&lt;/a&gt; led
by SIG Auth.&lt;/p&gt;
&lt;h3 id=&#34;node-topology-labels-are-available-via-downward-api&#34;&gt;Node topology labels are available via downward API&lt;/h3&gt;
&lt;p&gt;This feature enables Node topology labels to be exposed via the downward API. Prior to Kubernetes
v1.33, a workaround involved using an init container to query the Kubernetes API for the underlying
node; this alpha feature simplifies how workloads can access Node topology information.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4742&#34;&gt;KEP-4742: Expose Node labels via downward API&lt;/a&gt; led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;better-pod-status-with-generation-and-observed-generation&#34;&gt;Better pod status with generation and observed generation&lt;/h3&gt;
&lt;p&gt;Prior to this change, the &lt;code&gt;metadata.generation&lt;/code&gt; field was unused in Pods. This feature makes
&lt;code&gt;metadata.generation&lt;/code&gt; meaningful for Pods and introduces &lt;code&gt;status.observedGeneration&lt;/code&gt; to provide
clearer pod status.&lt;/p&gt;
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/5067&#34;&gt;KEP-5067: Pod Generation&lt;/a&gt; led by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;support-for-split-level-3-cache-architecture-with-kubelet-s-cpu-manager&#34;&gt;Support for split level 3 cache architecture with kubelet’s CPU Manager&lt;/h3&gt;
&lt;p&gt;Previously, the kubelet&#39;s CPU Manager was unaware of split L3 cache architecture (also known as Last
Level Cache, or LLC) and could distribute CPU assignments without considering it, causing a noisy
neighbor problem. This alpha feature makes the CPU Manager aware of split L3 cache topology so that
it can assign CPU cores with performance in mind.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/5109&#34;&gt;KEP-5109: Split L3 Cache Topology Awareness in CPU Manager&lt;/a&gt; led by SIG
Node.&lt;/p&gt;
&lt;h3 id=&#34;psi-pressure-stall-information-metrics-for-scheduling-improvements&#34;&gt;PSI (Pressure Stall Information) metrics for scheduling improvements&lt;/h3&gt;
&lt;p&gt;This feature adds support on Linux nodes for providing PSI stats and metrics using cgroupv2. PSI
data can help detect resource shortages and enable more granular control over pod scheduling.&lt;/p&gt;
&lt;p&gt;This work was done as part of &lt;a href=&#34;https://kep.k8s.io/4205&#34;&gt;KEP-4205: Support PSI based on cgroupv2&lt;/a&gt; led
by SIG Node.&lt;/p&gt;
&lt;h3 id=&#34;secret-less-image-pulls-with-kubelet&#34;&gt;Secret-less image pulls with kubelet&lt;/h3&gt;
&lt;p&gt;The kubelet&#39;s on-disk credential provider now supports optional Kubernetes ServiceAccount (SA) token
fetching. This simplifies authentication with image registries by allowing cloud providers to better
integrate with OIDC-compatible identity solutions.&lt;/p&gt;
&lt;p&gt;This work was done as part of
&lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;KEP-4412: Projected service account tokens for Kubelet image credential providers&lt;/a&gt;
led by SIG Auth.&lt;/p&gt;
&lt;h2 id=&#34;graduations-deprecations-and-removals-in-v1-33&#34;&gt;Graduations, deprecations, and removals in v1.33&lt;/h2&gt;
&lt;h3 id=&#34;graduations-to-stable&#34;&gt;Graduations to stable&lt;/h3&gt;
&lt;p&gt;This lists all the features that have graduated to stable (also known as &lt;em&gt;general availability&lt;/em&gt;).
For a full list of updates including new features and graduations from alpha to beta, see the
release notes.&lt;/p&gt;
&lt;p&gt;This release includes a total of 18 enhancements promoted to stable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3094&#34;&gt;Take taints/tolerations into consideration when calculating PodTopologySpread skew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3633&#34;&gt;Introduce &lt;code&gt;MatchLabelKeys&lt;/code&gt; to Pod Affinity and Pod Anti Affinity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4193&#34;&gt;Bound service account token improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/1495&#34;&gt;Generic data populators&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/1880&#34;&gt;Multiple Service CIDRs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/2433&#34;&gt;Topology Aware Routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/2589&#34;&gt;Portworx file in-tree to CSI driver migration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/2644&#34;&gt;Always Honor PersistentVolume Reclaim Policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3866&#34;&gt;nftables kube-proxy backend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4004&#34;&gt;Deprecate status.nodeInfo.kubeProxyVersion field&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/2590&#34;&gt;Add subresource support to kubectl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3850&#34;&gt;Backoff Limit Per Index For Indexed Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3998&#34;&gt;Job success/completion policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/753&#34;&gt;Sidecar Containers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4008&#34;&gt;CRD Validation Ratcheting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/2625&#34;&gt;node: cpumanager: add options to reject non SMT-aligned workload&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4444&#34;&gt;Traffic Distribution for Services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3857&#34;&gt;Recursive Read-only (RRO) mounts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;deprecations-and-removals&#34;&gt;Deprecations and removals&lt;/h3&gt;
&lt;p&gt;As Kubernetes develops and matures, features may be deprecated, removed, or replaced with better
ones to improve the project&#39;s overall health. See the Kubernetes
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/using-api/deprecation-policy/&#34;&gt;deprecation and removal policy&lt;/a&gt; for more details on
this process. Many of these deprecations and removals were announced in the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/26/kubernetes-v1-33-upcoming-changes/&#34;&gt;Deprecations and Removals blog post&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;deprecation-of-the-stable-endpoints-api&#34;&gt;Deprecation of the stable Endpoints API&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/endpoint-slices/&#34;&gt;EndpointSlices&lt;/a&gt; API has been stable since
v1.21, which effectively replaced the original Endpoints API. While the original Endpoints API was
simple and straightforward, it also posed some challenges when scaling to large numbers of network
endpoints. The EndpointSlices API has introduced new features such as dual-stack networking, making
the original Endpoints API ready for deprecation.&lt;/p&gt;
&lt;p&gt;This deprecation affects only those who use the Endpoints API directly from workloads or scripts;
these users should migrate to use EndpointSlices instead. There will be a dedicated blog post with
more details on the deprecation implications and migration plans.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/4974&#34;&gt;KEP-4974: Deprecate v1.Endpoints&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;removal-of-kube-proxy-version-information-in-node-status&#34;&gt;Removal of kube-proxy version information in node status&lt;/h4&gt;
&lt;p&gt;Following its deprecation in v1.31, as highlighted in the v1.31
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/07/19/kubernetes-1-31-upcoming-changes/#deprecation-of-status-nodeinfo-kubeproxyversion-field-for-nodes-kep-4004-https-github-com-kubernetes-enhancements-issues-4004&#34;&gt;release announcement&lt;/a&gt;,
the &lt;code&gt;.status.nodeInfo.kubeProxyVersion&lt;/code&gt; field for Nodes was removed in v1.33.&lt;/p&gt;
&lt;p&gt;This field was set by the kubelet, but its value was not consistently accurate. Having been
disabled by default since v1.31, the field has now been removed entirely.&lt;/p&gt;
&lt;p&gt;You can find more in
&lt;a href=&#34;https://kep.k8s.io/4004&#34;&gt;KEP-4004: Deprecate status.nodeInfo.kubeProxyVersion field&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;removal-of-in-tree-gitrepo-volume-driver&#34;&gt;Removal of in-tree gitRepo volume driver&lt;/h4&gt;
&lt;p&gt;The gitRepo volume type has been deprecated since v1.11, nearly 7 years ago. Since its deprecation,
there have been security concerns, including how gitRepo volume types can be exploited to gain
remote code execution as root on the nodes. In v1.33, the in-tree driver code is removed.&lt;/p&gt;
&lt;p&gt;There are alternatives such as git-sync and init containers. The &lt;code&gt;gitRepo&lt;/code&gt; volume type is not
removed from the Kubernetes API, so pods with &lt;code&gt;gitRepo&lt;/code&gt; volumes will still be admitted by
kube-apiserver, but kubelets with the &lt;code&gt;GitRepoVolumeDriver&lt;/code&gt; feature gate set to false will not run
them and will return an appropriate error to the user. This allows users to opt in to re-enabling
the driver for 3 versions, giving them enough time to fix workloads.&lt;/p&gt;
&lt;p&gt;The feature gate in kubelet and in-tree plugin code is planned to be removed in the v1.39 release.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/5040&#34;&gt;KEP-5040: Remove gitRepo volume driver&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;removal-of-host-network-support-for-windows-pods&#34;&gt;Removal of host network support for Windows pods&lt;/h4&gt;
&lt;p&gt;Windows Pod networking aimed to achieve feature parity with Linux and provide better cluster density
by allowing containers to use the Node’s networking namespace. The original implementation landed as
alpha with v1.26, but because it faced unexpected containerd behaviours and alternative solutions
were available, the Kubernetes project has decided to withdraw the associated KEP. Support was fully
removed in v1.33.&lt;/p&gt;
&lt;p&gt;Please note that this does not affect
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/create-hostprocess-pod/&#34;&gt;HostProcess containers&lt;/a&gt;, which
provide host network as well as host-level access. The KEP withdrawn in v1.33 was about providing
host network only, which was never stable due to technical limitations with Windows networking
logic.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/3503&#34;&gt;KEP-3503: Host network support for Windows pods&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;release-notes&#34;&gt;Release notes&lt;/h2&gt;
&lt;p&gt;Check out the full details of the Kubernetes v1.33 release in our
&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md&#34;&gt;release notes&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;availability&#34;&gt;Availability&lt;/h2&gt;
&lt;p&gt;Kubernetes v1.33 is available for download on
&lt;a href=&#34;https://github.com/kubernetes/kubernetes/releases/tag/v1.33.0&#34;&gt;GitHub&lt;/a&gt; or on the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/releases/download/&#34;&gt;Kubernetes download page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To get started with Kubernetes, check out these &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tutorials/&#34;&gt;interactive tutorials&lt;/a&gt; or run
local Kubernetes clusters using &lt;a href=&#34;https://minikube.sigs.k8s.io/&#34;&gt;minikube&lt;/a&gt;. You can also easily
install v1.33 using
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/&#34;&gt;kubeadm&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;release-team&#34;&gt;Release Team&lt;/h2&gt;
&lt;p&gt;Kubernetes is only possible with the support, commitment, and hard work of its community. The
Release Team is made up of dedicated community volunteers who work together to build the many pieces
that make up the Kubernetes releases you rely on. This requires the specialized skills of people
from all corners of our community, from the code itself to its documentation and project
management.&lt;/p&gt;
&lt;p&gt;We would like to thank the entire
&lt;a href=&#34;https://github.com/kubernetes/sig-release/blob/master/releases/release-1.33/release-team.md&#34;&gt;Release Team&lt;/a&gt;
for the hours spent hard at work to deliver the Kubernetes v1.33 release to our community. The
Release Team&#39;s membership ranges from first-time shadows to returning team leads with experience
forged over several release cycles. A new team structure was adopted in this release cycle: the
Release Notes and Docs subteams were combined into a unified Docs subteam. Thanks to the new Docs
team&#39;s meticulous effort in organizing the relevant information and resources, both Release Notes
and Docs tracking saw a smooth and successful transition. Finally, a very
special thanks goes out to our release lead, Nina Polshakova, for her support throughout a
successful release cycle, her advocacy, her efforts to ensure that everyone could contribute
effectively, and her challenges to improve the release process.&lt;/p&gt;
&lt;h2 id=&#34;project-velocity&#34;&gt;Project velocity&lt;/h2&gt;
&lt;p&gt;The CNCF K8s
&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All&#34;&gt;DevStats&lt;/a&gt;
project aggregates several interesting data points related to the velocity of Kubernetes and various
subprojects. This includes everything from individual contributions, to the number of companies
contributing, and illustrates the depth and breadth of effort that goes into evolving this
ecosystem.&lt;/p&gt;
&lt;p&gt;During the v1.33 release cycle, which spanned 15 weeks from January 13 to April 23, 2025, Kubernetes
received contributions from as many as 121 different companies and 570 individuals (as of writing, a
few weeks before the release date). In the wider cloud native ecosystem, the figure goes up to 435
companies counting 2400 total contributors. You can find the data source in
&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=d28&amp;var-repogroup_name=All&amp;var-repo_name=kubernetes%2Fkubernetes&amp;from=1736755200000&amp;to=1745477999000&#34;&gt;this dashboard&lt;/a&gt;.
Compared to the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/11/kubernetes-v1-32-release/#project-velocity&#34;&gt;velocity data from previous release, v1.32&lt;/a&gt;,
we see a similar level of contribution from companies and individuals, indicating strong community
interest and engagement.&lt;/p&gt;
&lt;p&gt;Note that a “contribution” is counted when someone makes a commit, performs a code review, creates
an issue or PR, reviews a PR (including blogs and documentation), or comments on issues and PRs. If
you are
interested in contributing, visit
&lt;a href=&#34;https://www.kubernetes.dev/docs/guide/#getting-started&#34;&gt;Getting Started&lt;/a&gt; on our contributor
website.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All&#34;&gt;Check out DevStats&lt;/a&gt;
to learn more about the overall velocity of the Kubernetes project and community.&lt;/p&gt;
&lt;h2 id=&#34;event-update&#34;&gt;Event update&lt;/h2&gt;
&lt;p&gt;Explore upcoming Kubernetes and cloud native events, including KubeCon + CloudNativeCon, KCD, and
other notable conferences worldwide. Stay informed and get involved with the Kubernetes community!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;May 2025&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-costa-rica-presents-kcd-costa-rica-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Costa Rica&lt;/strong&gt;&lt;/a&gt;:
May 3, 2025 | Heredia, Costa Rica&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-helsinki-presents-kcd-helsinki-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Helsinki&lt;/strong&gt;&lt;/a&gt;:
May 6, 2025 | Helsinki, Finland&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-texas-presents-kcd-texas-austin-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Texas Austin&lt;/strong&gt;&lt;/a&gt;:
May 15, 2025 | Austin, USA&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-south-korea-presents-kcd-seoul-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Seoul&lt;/strong&gt;&lt;/a&gt;:
May 22, 2025 | Seoul, South Korea&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-istanbul-presents-kcd-istanbul-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Istanbul, Turkey&lt;/strong&gt;&lt;/a&gt;:
May 23, 2025 | Istanbul, Turkey&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-sf-bay-area-presents-kcd-san-francisco-bay-area/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: San Francisco Bay Area&lt;/strong&gt;&lt;/a&gt;:
May 28, 2025 | San Francisco, USA&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;June 2025&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-new-york-presents-kcd-new-york-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: New York&lt;/strong&gt;&lt;/a&gt;:
June 4, 2025 | New York, USA&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-czech-slovak-presents-kcd-czech-amp-slovak-bratislava-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Czech &amp;amp; Slovak&lt;/strong&gt;&lt;/a&gt;:
June 5, 2025 | Bratislava, Slovakia&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-bengaluru-presents-kubernetes-community-days-bengaluru-2025-in-person/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Bengaluru&lt;/strong&gt;&lt;/a&gt;:
June 6, 2025 | Bangalore, India&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-china/&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon China 2025&lt;/strong&gt;&lt;/a&gt;:
June 10-11, 2025 | Hong Kong&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-guatemala-presents-kcd-antigua-guatemala-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Antigua Guatemala&lt;/strong&gt;&lt;/a&gt;:
June 14, 2025 | Antigua Guatemala, Guatemala&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-japan&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon Japan 2025&lt;/strong&gt;&lt;/a&gt;:
June 16-17, 2025 | Tokyo, Japan&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.cncf.io/kcds/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Nigeria, Africa&lt;/strong&gt;&lt;/a&gt;: June 19, 2025 |
Nigeria, Africa&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;July 2025&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-netherlands-presents-kcd-utrecht-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Utrecht&lt;/strong&gt;&lt;/a&gt;:
July 4, 2025 | Utrecht, Netherlands&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-taiwan-presents-kcd-taipei-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Taipei&lt;/strong&gt;&lt;/a&gt;:
July 5, 2025 | Taipei, Taiwan&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-lima-peru-presents-kcd-lima-peru-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Lima, Peru&lt;/strong&gt;&lt;/a&gt;:
July 19, 2025 | Lima, Peru&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;August 2025&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-india-2025/&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon India 2025&lt;/strong&gt;&lt;/a&gt;:
August 6-7, 2025 | Hyderabad, India&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-colombia-presents-kcd-colombia-2025/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Colombia&lt;/strong&gt;&lt;/a&gt;:
August 29, 2025 | Bogotá, Colombia&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can find the latest KCD details &lt;a href=&#34;https://www.cncf.io/kcds/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;upcoming-release-webinar&#34;&gt;Upcoming release webinar&lt;/h2&gt;
&lt;p&gt;Join members of the Kubernetes v1.33 Release Team on &lt;strong&gt;Friday, May 16th 2025 at 4:00 PM (UTC)&lt;/strong&gt;, to
learn about the highlights of this release, as well as deprecations and removals to help
plan for upgrades. For more information and registration, visit the
&lt;a href=&#34;https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cncf-live-webinar-kubernetes-133-release/&#34;&gt;event page&lt;/a&gt;
on the CNCF Online Programs site.&lt;/p&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;The simplest way to get involved with Kubernetes is by joining one of the many
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;Special Interest Groups&lt;/a&gt; (SIGs)
that align with your interests. Have something you’d like to broadcast to the Kubernetes community?
Share your voice at our weekly
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;community meeting&lt;/a&gt;, and through
the channels below. Thank you for your continued feedback and support.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Follow us on Bluesky &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt; for the latest
updates&lt;/li&gt;
&lt;li&gt;Join the community discussion on &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Join the community on &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post questions (or answer questions) on
&lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; or
&lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Share your Kubernetes
&lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;story&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read more about what’s happening with Kubernetes on the &lt;a href=&#34;https://kubernetes.io/blog/&#34;&gt;blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Learn more about the
&lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes Release Team&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes Multicontainer Pods: An Overview</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/22/multi-container-pods-overview/</link>
      <pubDate>Tue, 22 Apr 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/22/multi-container-pods-overview/</guid>
      <description>
        
        
        &lt;p&gt;As cloud-native architectures continue to evolve, Kubernetes has become the go-to platform for deploying complex, distributed systems. One of the most powerful yet nuanced design patterns in this ecosystem is the sidecar pattern—a technique that allows developers to extend application functionality without diving deep into source code.&lt;/p&gt;
&lt;h2 id=&#34;the-origins-of-the-sidecar-pattern&#34;&gt;The origins of the sidecar pattern&lt;/h2&gt;
&lt;p&gt;Think of a sidecar like a trusty companion motorcycle attachment. Historically, IT infrastructures have always used auxiliary services to handle critical tasks. Before containers, we relied on background processes and helper daemons to manage logging, monitoring, and networking. The microservices revolution transformed this approach, making sidecars a structured and intentional architectural choice.
With the rise of microservices, the sidecar pattern became more clearly defined, allowing developers to offload specific responsibilities from the main service without altering its code. Service meshes like Istio and Linkerd have popularized sidecar proxies, demonstrating how these companion containers can elegantly handle observability, security, and traffic management in distributed systems.&lt;/p&gt;
&lt;h2 id=&#34;kubernetes-implementation&#34;&gt;Kubernetes implementation&lt;/h2&gt;
&lt;p&gt;In Kubernetes, &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/sidecar-containers/&#34;&gt;sidecar containers&lt;/a&gt; operate within
the same Pod as the main application, enabling communication and resource sharing.
Does this sound just like defining multiple containers alongside each other inside the Pod? It does, and
that is how sidecar containers had to be implemented before Kubernetes v1.29.0, which introduced
native support for sidecars.
Sidecar containers can now be defined within a Pod manifest using the &lt;code&gt;spec.initContainers&lt;/code&gt; field. What makes
such a container a sidecar is that you specify it with &lt;code&gt;restartPolicy: Always&lt;/code&gt;. You can see an example of this below, which is a partial snippet of a full Kubernetes manifest:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;logshipper&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;tail -F /opt/logs.txt&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/opt&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That field name, &lt;code&gt;spec.initContainers&lt;/code&gt;, may sound confusing. Why, when you want to define a sidecar container, do you put an entry in the &lt;code&gt;spec.initContainers&lt;/code&gt; array? Entries in &lt;code&gt;spec.initContainers&lt;/code&gt; normally run to completion just before the main application starts, so they are one-off, whereas sidecars often run in parallel with the main app container. It is the &lt;code&gt;restartPolicy: Always&lt;/code&gt; setting that distinguishes Kubernetes-native sidecar containers from classic &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/init-containers/&#34;&gt;init containers&lt;/a&gt; and ensures they are kept running.&lt;/p&gt;
&lt;h2 id=&#34;when-to-embrace-or-avoid-sidecars&#34;&gt;When to embrace (or avoid) sidecars&lt;/h2&gt;
&lt;p&gt;While the sidecar pattern can be useful in many cases, it is generally not the preferred approach unless the use case justifies it. Adding a sidecar increases complexity, resource consumption, and potential network latency. Instead, simpler alternatives such as built-in libraries or shared infrastructure should be considered first.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deploy a sidecar when:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You need to extend application functionality without touching the original code&lt;/li&gt;
&lt;li&gt;Implementing cross-cutting concerns like logging, monitoring or security&lt;/li&gt;
&lt;li&gt;Working with legacy applications requiring modern networking capabilities&lt;/li&gt;
&lt;li&gt;Designing microservices that demand independent scaling and updates&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Proceed with caution if:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Resource efficiency is your primary concern&lt;/li&gt;
&lt;li&gt;Minimal network latency is critical&lt;/li&gt;
&lt;li&gt;Simpler alternatives exist&lt;/li&gt;
&lt;li&gt;You want to minimize troubleshooting complexity&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;four-essential-multi-container-patterns&#34;&gt;Four essential multi-container patterns&lt;/h2&gt;
&lt;h3 id=&#34;init-container-pattern&#34;&gt;Init container pattern&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Init container&lt;/strong&gt; pattern is used to execute (often critical) setup tasks before the main application container starts. Unlike regular containers, init containers run to completion and then terminate, ensuring that preconditions for the main application are met.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ideal for:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Preparing configurations&lt;/li&gt;
&lt;li&gt;Loading secrets&lt;/li&gt;
&lt;li&gt;Verifying dependency availability&lt;/li&gt;
&lt;li&gt;Running database migrations&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The init container ensures your application starts in a predictable, controlled environment without code modifications.&lt;/p&gt;
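&lt;p&gt;As a minimal sketch (the container names, images, and command here are illustrative assumptions, not taken from a specific deployment), an init container that blocks the main application until a database service is reachable could look like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  initContainers:
    - name: wait-for-db            # runs to completion before the app starts
      image: busybox:1.36
      command: [&#39;sh&#39;, &#39;-c&#39;, &#39;until nc -z db-service 5432; do sleep 2; done&#39;]
  containers:
    - name: app
      image: my-app:1.0
&lt;/code&gt;&lt;/pre&gt;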
&lt;h3 id=&#34;ambassador-pattern&#34;&gt;Ambassador pattern&lt;/h3&gt;
&lt;p&gt;An ambassador container provides Pod-local helper services that expose a simple way to access a network service. Commonly, ambassador containers send network requests on behalf of an application container and
take care of challenges such as service discovery, peer identity verification, or encryption in transit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Perfect when you need to:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Offload client connectivity concerns&lt;/li&gt;
&lt;li&gt;Implement language-agnostic networking features&lt;/li&gt;
&lt;li&gt;Add security layers like TLS&lt;/li&gt;
&lt;li&gt;Create robust circuit breakers and retry mechanisms&lt;/li&gt;
&lt;/ol&gt;
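&lt;p&gt;As a sketch (the ambassador image and port below are hypothetical placeholders, not a published proxy), the pattern lets the application talk only to &lt;code&gt;localhost&lt;/code&gt; while the companion container handles the remote connection:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  containers:
    - name: app
      image: my-app:1.0            # connects to localhost:6379 only
    - name: redis-ambassador       # hypothetical proxy handling discovery and TLS
      image: example.com/redis-ambassador:latest
      ports:
        - containerPort: 6379
&lt;/code&gt;&lt;/pre&gt;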
&lt;h3 id=&#34;configuration-helper&#34;&gt;Configuration helper&lt;/h3&gt;
&lt;p&gt;A &lt;em&gt;configuration helper&lt;/em&gt; sidecar provides configuration updates to an application dynamically, ensuring it always has access to the latest settings without disrupting the service. Often the helper needs to provide an initial
configuration before the application can start successfully.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fetching environment variables and secrets&lt;/li&gt;
&lt;li&gt;Polling configuration changes&lt;/li&gt;
&lt;li&gt;Decoupling configuration management from application logic&lt;/li&gt;
&lt;/ol&gt;
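&lt;p&gt;With the native sidecar support described earlier, a configuration helper can be sketched as an init container with &lt;code&gt;restartPolicy: Always&lt;/code&gt; that keeps a shared volume up to date (the helper image name below is a hypothetical placeholder):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  initContainers:
    - name: config-helper          # keeps polling for configuration changes
      image: example.com/config-fetcher:latest
      restartPolicy: Always        # makes this a sidecar, not a one-off init container
      volumeMounts:
        - name: config
          mountPath: /etc/app-config
  containers:
    - name: app
      image: my-app:1.0
      volumeMounts:
        - name: config
          mountPath: /etc/app-config
          readOnly: true
  volumes:
    - name: config
      emptyDir: {}
&lt;/code&gt;&lt;/pre&gt;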
&lt;h3 id=&#34;adapter-pattern&#34;&gt;Adapter pattern&lt;/h3&gt;
&lt;p&gt;An &lt;em&gt;adapter&lt;/em&gt; (or sometimes &lt;em&gt;façade&lt;/em&gt;) container enables interoperability between the main application container and external services. It does this by translating data formats, protocols, or APIs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Transforming legacy data formats&lt;/li&gt;
&lt;li&gt;Bridging communication protocols&lt;/li&gt;
&lt;li&gt;Facilitating integration between mismatched services&lt;/li&gt;
&lt;/ol&gt;
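&lt;p&gt;For instance (the image names here are hypothetical), an adapter can re-expose a legacy application&#39;s metrics in the format a modern monitoring system expects:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  containers:
    - name: legacy-app
      image: example.com/legacy-app:2.3     # emits metrics in its own plain-text format
    - name: metrics-adapter                 # translates them into Prometheus exposition format
      image: example.com/metrics-adapter:latest
      ports:
        - containerPort: 9090
&lt;/code&gt;&lt;/pre&gt;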
&lt;h2 id=&#34;wrap-up&#34;&gt;Wrap-up&lt;/h2&gt;
&lt;p&gt;While sidecar patterns offer tremendous flexibility, they&#39;re not a silver bullet. Each added sidecar introduces complexity, consumes resources, and potentially increases operational overhead. Always evaluate simpler alternatives first.
The key is strategic implementation: use sidecars as precision tools to solve specific architectural challenges, not as a default approach. When used correctly, they can improve security, networking, and configuration management in containerized environments.
Choose wisely, implement carefully, and let your sidecars elevate your container ecosystem.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Introducing kube-scheduler-simulator</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/07/introducing-kube-scheduler-simulator/</link>
      <pubDate>Mon, 07 Apr 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/04/07/introducing-kube-scheduler-simulator/</guid>
      <description>
        
        
        &lt;p&gt;The Kubernetes Scheduler is a crucial control plane component that determines which node a Pod will run on.
Thus, anyone utilizing Kubernetes relies on a scheduler.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator&lt;/a&gt; is a &lt;em&gt;simulator&lt;/em&gt; for the Kubernetes scheduler that started as a &lt;a href=&#34;https://summerofcode.withgoogle.com/&#34;&gt;Google Summer of Code 2021&lt;/a&gt; project developed by me (Kensei Nakada) and has since received many contributions.
This tool allows users to closely examine the scheduler’s behavior and decisions.&lt;/p&gt;
&lt;p&gt;It is useful for casual users who employ scheduling constraints (for example, &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity&#34;&gt;inter-Pod affinity&lt;/a&gt;)
and experts who extend the scheduler with custom plugins.&lt;/p&gt;
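&lt;p&gt;To make that concrete, a constraint such as the following inter-Pod affinity rule (the label values are illustrative) is exactly the kind of input whose effect on placement the simulator helps you inspect:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache             # co-locate with Pods labeled app=cache
        topologyKey: kubernetes.io/hostname
&lt;/code&gt;&lt;/pre&gt;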
&lt;h2 id=&#34;motivation&#34;&gt;Motivation&lt;/h2&gt;
&lt;p&gt;The scheduler often appears as a black box,
composed of many plugins that each contribute to the scheduling decision-making process from their unique perspectives.
Understanding its behavior can be challenging due to the multitude of factors it considers.&lt;/p&gt;
&lt;p&gt;Even if a Pod appears to be scheduled correctly in a simple test cluster, it might have been scheduled based on different calculations than expected. This discrepancy could lead to unexpected scheduling outcomes when deployed in a large production environment.&lt;/p&gt;
&lt;p&gt;Also, testing a scheduler is a complex challenge.
There are countless patterns of operations executed within a real cluster, making it unfeasible to anticipate every scenario with a finite number of tests.
More often than not, bugs are discovered only when the scheduler is deployed in an actual cluster.
Indeed, many bugs are found by users only after a release ships,
even in the upstream kube-scheduler.&lt;/p&gt;
&lt;p&gt;Having a development or sandbox environment for testing the scheduler — or, indeed, any Kubernetes controllers — is a common practice.
However, this approach falls short of capturing all the potential scenarios that might arise in a production cluster
because a development cluster is often much smaller with notable differences in workload sizes and scaling dynamics.
It never sees the exact same use or exhibits the same behavior as its production counterpart.&lt;/p&gt;
&lt;p&gt;The kube-scheduler-simulator aims to solve those problems.
It enables users to test their scheduling constraints, scheduler configurations,
and custom plugins while checking every detailed part of scheduling decisions.
It also allows users to create a simulated cluster environment, where they can test their scheduler
with the same resources as their production cluster without affecting actual workloads.&lt;/p&gt;
&lt;h2 id=&#34;features-of-the-kube-scheduler-simulator&#34;&gt;Features of the kube-scheduler-simulator&lt;/h2&gt;
&lt;p&gt;The kube-scheduler-simulator’s core feature is its ability to expose the scheduler&#39;s internal decisions.
The scheduler operates based on the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/scheduling-framework/&#34;&gt;scheduling framework&lt;/a&gt;,
using various plugins at different extension points to
filter nodes (the Filter phase), score nodes (the Score phase), and ultimately determine the best node for the Pod.&lt;/p&gt;
&lt;p&gt;The simulator allows users to create Kubernetes resources and observe how each plugin influences the scheduling decisions for Pods.
This visibility helps users understand the scheduler’s workings and define appropriate scheduling constraints.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/images/blog/2025-04-07-kube-scheduler-simulator/simulator.png&#34;
         alt=&#34;Screenshot of the simulator web frontend that shows the detailed scheduling results per node and per extension point&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;The simulator web frontend&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Inside the simulator, a debuggable scheduler runs instead of the vanilla scheduler.
This debuggable scheduler records the result of each scheduler plugin at every extension point in the Pod’s annotations, as the following manifest shows,
and the web frontend formats and visualizes the scheduling results based on these annotations.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# The JSONs within these annotations are manually formatted for clarity in the blog post. &lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/bind-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;DefaultBinder&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/filter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeName&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeUnschedulable&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;passed&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeName&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeUnschedulable&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;passed&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/finalscore-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;52&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;47&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;300&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;76&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;73&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;300&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      } &lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/permit-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/postfilter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prebind-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;VolumeBinding&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prefilter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;AzureDiskLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;EBSLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;GCEPDLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;InterPodAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodePorts&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeVolumeLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;PodTopologySpread&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeBinding&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeRestrictions&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeZone&amp;#34;:&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prescore-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;InterPodAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeAffinity&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;PodTopologySpread&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;TaintToleration&amp;#34;:&amp;#34;success&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/reserve-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;VolumeBinding&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/result-history&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      [
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/bind-result&amp;#34;:&amp;#34;{\&amp;#34;DefaultBinder\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/filter-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;NodeName\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeUnschedulable\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;passed\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;NodeName\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeUnschedulable\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;passed\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/finalscore-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;52\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;47\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;300\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;76\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;73\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;300\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/permit-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/postfilter-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prebind-result&amp;#34;:&amp;#34;{\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prefilter-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status&amp;#34;:&amp;#34;{\&amp;#34;AzureDiskLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;EBSLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;GCEPDLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;InterPodAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodePorts\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeVolumeLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;PodTopologySpread\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeRestrictions\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeZone\&amp;#34;:\&amp;#34;\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prescore-result&amp;#34;:&amp;#34;{\&amp;#34;InterPodAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;PodTopologySpread\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/reserve-result&amp;#34;:&amp;#34;{\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/score-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;52\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;47\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;76\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;73\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/selected-node&amp;#34;:&amp;#34;node-mtb5x&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      ]&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/score-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;52&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;47&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;76&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;73&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/selected-node&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-mtb5x&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Users can also integrate &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/scheduling-framework/&#34;&gt;their custom plugins&lt;/a&gt; or &lt;a href=&#34;https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md&#34;&gt;extenders&lt;/a&gt; into the debuggable scheduler and visualize their results.&lt;/p&gt;
&lt;p&gt;This debuggable scheduler can also run standalone, for example, on any Kubernetes cluster or in integration tests.
This would be useful to custom plugin developers who want to test their plugins or examine their custom scheduler in a real cluster with better debuggability.&lt;/p&gt;
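&lt;p&gt;Each annotation value shown above is itself a JSON string (and &lt;code&gt;result-history&lt;/code&gt; nests JSON inside JSON), so reading the results programmatically takes one extra decoding step. A minimal sketch in Python; the helper and the abbreviated sample values are our own, not part of the simulator:&lt;/p&gt;

```python
import json

# Abbreviated annotation values, as they would appear on a scheduled Pod;
# in a real cluster you would read pod.metadata.annotations instead.
annotations = {
    "kube-scheduler-simulator.sigs.k8s.io/score-result":
        '{"node-jjfg5":{"NodeResourcesFit":"47"},"node-mtb5x":{"NodeResourcesFit":"73"}}',
    "kube-scheduler-simulator.sigs.k8s.io/selected-node": "node-mtb5x",
}

PREFIX = "kube-scheduler-simulator.sigs.k8s.io/"

def decode(annotations):
    """Parse each simulator annotation; most values are JSON strings."""
    results = {}
    for key, value in annotations.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        try:
            results[name] = json.loads(value)
        except json.JSONDecodeError:
            # Plain string values such as selected-node are kept as-is.
            results[name] = value
    return results

decoded = decode(annotations)
print(decoded["selected-node"])                                   # node-mtb5x
print(decoded["score-result"]["node-mtb5x"]["NodeResourcesFit"])  # 73
```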
&lt;h2 id=&#34;the-simulator-as-a-better-dev-cluster&#34;&gt;The simulator as a better dev cluster&lt;/h2&gt;
&lt;p&gt;As mentioned earlier, with a limited set of tests, it is impossible to predict every possible scenario in a real-world cluster.
Typically, users will test the scheduler in a small, development cluster before deploying it to production, hoping that no issues arise.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/master/simulator/docs/import-cluster-resources.md&#34;&gt;The simulator’s importing feature&lt;/a&gt;
provides a solution by allowing users to simulate deploying a new scheduler version in a production-like environment without impacting their live workloads.&lt;/p&gt;
&lt;p&gt;By continuously syncing between a production cluster and the simulator, users can safely test a new scheduler version with the same resources their production cluster handles.
Once confident in its performance, they can proceed with the production deployment, reducing the risk of unexpected issues.&lt;/p&gt;
&lt;h2 id=&#34;what-are-the-use-cases&#34;&gt;What are the use cases?&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Cluster users&lt;/strong&gt;: Examine if scheduling constraints (for example, PodAffinity, PodTopologySpread) work as intended.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cluster admins&lt;/strong&gt;: Assess how a cluster would behave with changes to the scheduler configuration.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduler plugin developers&lt;/strong&gt;: Test custom scheduler plugins or extenders, use the debuggable scheduler in integration tests or development clusters, or use the &lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/simulator/v0.3.0/simulator/docs/import-cluster-resources.md&#34;&gt;syncing&lt;/a&gt; feature for testing within a production-like environment.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;Getting started&lt;/h2&gt;
&lt;p&gt;The simulator only requires Docker to be installed on a machine; a Kubernetes cluster is not necessary.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;git clone git@github.com:kubernetes-sigs/kube-scheduler-simulator.git
cd kube-scheduler-simulator
make docker_up
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can then access the simulator&#39;s web UI at &lt;code&gt;http://localhost:3000&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Visit the &lt;a href=&#34;https://sigs.k8s.io/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator repository&lt;/a&gt; for more details!&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;The scheduler simulator is developed by &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#kube-scheduler-simulator&#34;&gt;Kubernetes SIG Scheduling&lt;/a&gt;. Your feedback and contributions are welcome!&lt;/p&gt;
&lt;p&gt;Open issues or PRs at the &lt;a href=&#34;https://sigs.k8s.io/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator repository&lt;/a&gt;.
Join the conversation on the &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-scheduling&#34;&gt;#sig-scheduling&lt;/a&gt; Slack channel.&lt;/p&gt;
&lt;h2 id=&#34;acknowledgments&#34;&gt;Acknowledgments&lt;/h2&gt;
&lt;p&gt;The simulator has been maintained by dedicated volunteer engineers, overcoming many challenges to reach its current form.&lt;/p&gt;
&lt;p&gt;A big shout out to all &lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/graphs/contributors&#34;&gt;the awesome contributors&lt;/a&gt;!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33 sneak peek</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/26/kubernetes-v1-33-upcoming-changes/</link>
      <pubDate>Wed, 26 Mar 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/26/kubernetes-v1-33-upcoming-changes/</guid>
      <description>
        
        
        &lt;p&gt;As the release of Kubernetes v1.33 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the overall health of the project. This blog post outlines some planned changes for the v1.33 release, which the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes environment and to keep you up-to-date with the latest developments.  The information below is based on the current status of the v1.33 release and is subject to change before the final release date.&lt;/p&gt;
&lt;h2 id=&#34;the-kubernetes-api-removal-and-deprecation-process&#34;&gt;The Kubernetes API removal and deprecation process&lt;/h2&gt;
&lt;p&gt;The Kubernetes project has a well-documented &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/using-api/deprecation-policy/&#34;&gt;deprecation policy&lt;/a&gt; for features. This policy states that stable APIs may only be deprecated when a newer, stable version of that same API is available and that APIs have a minimum lifetime for each stability level. A deprecated API has been marked for removal in a future Kubernetes release. It will continue to function until removal (at least one year from the deprecation), but usage will result in a warning being displayed. Removed APIs are no longer available in the current version, at which point you must migrate to using the replacement.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Generally available (GA) or stable API versions may be marked as deprecated but must not be removed within a major version of Kubernetes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Beta or pre-release API versions must be supported for 3 releases after the deprecation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alpha or experimental API versions may be removed in any release without prior deprecation notice; this process can become a withdrawal in cases where a different implementation for the same feature is already in place.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether an API is removed as a result of a feature graduating from beta to stable, or because that API simply did not succeed, all removals comply with this deprecation policy. Whenever an API is removed, migration options are communicated in the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/using-api/deprecation-guide/&#34;&gt;deprecation guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;deprecations-and-removals-for-kubernetes-v1-33&#34;&gt;Deprecations and removals for Kubernetes v1.33&lt;/h2&gt;
&lt;h3 id=&#34;deprecation-of-the-stable-endpoints-api&#34;&gt;Deprecation of the stable Endpoints API&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/endpoint-slices/&#34;&gt;EndpointSlices&lt;/a&gt; API has been stable since v1.21, and it effectively replaces the original Endpoints API. While the original Endpoints API was simple and straightforward, it posed some challenges when scaling to large numbers of network endpoints. The EndpointSlices API also introduced new features such as dual-stack networking, making the original Endpoints API ready for deprecation.&lt;/p&gt;
&lt;p&gt;This deprecation only impacts those who use the Endpoints API directly from workloads or scripts; these users should migrate to use EndpointSlices instead. There will be a dedicated blog post with more details on the deprecation implications and migration plans in the coming weeks.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/4974&#34;&gt;KEP-4974: Deprecate v1.Endpoints&lt;/a&gt;.&lt;/p&gt;
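&lt;p&gt;To see what the migration target looks like, you can list the EndpointSlices that back a Service today; the EndpointSlice controller labels them with the owning Service&#39;s name (&lt;code&gt;my-service&lt;/code&gt; below is a placeholder):&lt;/p&gt;

```shell
# EndpointSlices for a Service are selected by a well-known label:
kubectl get endpointslices -l kubernetes.io/service-name=my-service

# Compare with the legacy Endpoints object your scripts may read today:
kubectl get endpoints my-service -o yaml
```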
&lt;h3 id=&#34;removal-of-kube-proxy-version-information-in-node-status&#34;&gt;Removal of kube-proxy version information in node status&lt;/h3&gt;
&lt;p&gt;Following its deprecation in v1.31, as highlighted in the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/07/19/kubernetes-1-31-upcoming-changes/#deprecation-of-status-nodeinfo-kubeproxyversion-field-for-nodes-kep-4004-https-github-com-kubernetes-enhancements-issues-4004&#34;&gt;release announcement&lt;/a&gt;, the &lt;code&gt;status.nodeInfo.kubeProxyVersion&lt;/code&gt; field will be removed in v1.33. This field was set by kubelet, but its value was not consistently accurate. As it has been disabled by default since v1.31, the v1.33 release will remove this field entirely.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/4004&#34;&gt;KEP-4004: Deprecate status.nodeInfo.kubeProxyVersion field&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;removal-of-host-network-support-for-windows-pods&#34;&gt;Removal of host network support for Windows pods&lt;/h3&gt;
&lt;p&gt;Windows Pod networking aimed to achieve feature parity with Linux and provide better cluster density by allowing containers to use the Node’s networking namespace.
The original implementation landed as alpha with v1.26, but because it faced unexpected containerd behaviours
and alternative solutions were available, the Kubernetes project has decided to withdraw the associated
KEP. We expect support to be fully removed in v1.33.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/3503&#34;&gt;KEP-3503: Host network support for Windows pods&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;featured-improvement-of-kubernetes-v1-33&#34;&gt;Featured improvement of Kubernetes v1.33&lt;/h2&gt;
&lt;p&gt;As authors of this article, we picked one improvement as the most significant change to call out!&lt;/p&gt;
&lt;h3 id=&#34;support-for-user-namespaces-within-linux-pods&#34;&gt;Support for user namespaces within Linux Pods&lt;/h3&gt;
&lt;p&gt;One of the oldest open KEPs today is &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127&lt;/a&gt;, Pod security improvement by using Linux &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/user-namespaces/&#34;&gt;User namespaces&lt;/a&gt; for Pods. This KEP was first opened in late 2016, and after multiple iterations, had its alpha release in v1.25, initial beta in v1.30 (where it was disabled by default), and now is set to be a part of v1.33, where the feature is available by default.&lt;/p&gt;
&lt;p&gt;This support will not impact existing Pods unless you manually specify &lt;code&gt;pod.spec.hostUsers&lt;/code&gt; to opt in. As highlighted in the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/03/12/kubernetes-1-30-upcoming-changes/&#34;&gt;v1.30 sneak peek blog&lt;/a&gt;, this is an important milestone for mitigating vulnerabilities.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127: Support User Namespaces in pods&lt;/a&gt;.&lt;/p&gt;
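&lt;p&gt;Opting in is a one-field change on the Pod spec. A minimal sketch (the Pod name and image are placeholders):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo          # placeholder name
spec:
  hostUsers: false           # run this Pod in its own user namespace
  containers:
  - name: app
    image: busybox:1.36      # placeholder image
    command: ["sleep", "infinity"]
```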
&lt;h2 id=&#34;selected-other-kubernetes-v1-33-improvements&#34;&gt;Selected other Kubernetes v1.33 improvements&lt;/h2&gt;
&lt;p&gt;The following list of enhancements is likely to be included in the upcoming v1.33 release. This is not a commitment and the release content is subject to change.&lt;/p&gt;
&lt;h3 id=&#34;in-place-resource-resize-for-vertical-scaling-of-pods&#34;&gt;In-place resource resize for vertical scaling of Pods&lt;/h3&gt;
&lt;p&gt;When provisioning a Pod, you can use various resources such as Deployment, StatefulSet, etc. Scalability requirements may need horizontal scaling by updating the Pod replica count, or vertical scaling by updating resources allocated to Pod’s container(s). Before this enhancement, container resources defined in a Pod&#39;s &lt;code&gt;spec&lt;/code&gt; were immutable, and updating any of these details within a Pod template would trigger Pod replacement.&lt;/p&gt;
&lt;p&gt;But what if you could dynamically update the resource configuration for your existing Pods without restarting them?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287&lt;/a&gt; is precisely about allowing such in-place Pod updates. It opens up various possibilities: vertical scale-up of stateful processes without any downtime, seamless scale-down when traffic is low, and even allocating larger resources during startup that are eventually reduced once the initial setup is complete. This was released as alpha in v1.27, and is expected to land as beta in v1.33.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287: In-Place Update of Pod Resources&lt;/a&gt;.&lt;/p&gt;
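&lt;p&gt;The enhancement adds a per-container &lt;code&gt;resizePolicy&lt;/code&gt; declaring how each resource may be resized in place. A minimal sketch (the Pod name, image, and values are illustrative):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo           # placeholder name
spec:
  containers:
  - name: app
    image: nginx              # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # CPU can change without a restart
    - resourceName: memory
      restartPolicy: RestartContainer # memory changes restart the container
    resources:
      requests:
        cpu: "500m"
        memory: 128Mi
      limits:
        cpu: "1"
        memory: 256Mi
```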
&lt;h3 id=&#34;dra-s-resourceclaim-device-status-graduates-to-beta&#34;&gt;DRA’s ResourceClaim Device Status graduates to beta&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;devices&lt;/code&gt; field in ResourceClaim &lt;code&gt;status&lt;/code&gt;, originally introduced in the v1.32 release, is likely to graduate to beta in v1.33. This field allows drivers to report device status data, improving both observability and troubleshooting capabilities.&lt;/p&gt;
&lt;p&gt;For example, reporting the interface name, MAC address, and IP addresses of network interfaces in the status of a ResourceClaim can significantly help in configuring and managing network services, as well as in debugging network related issues. You can read more about ResourceClaim Device Status in &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaim-device-status&#34;&gt;Dynamic Resource Allocation: ResourceClaim Device Status&lt;/a&gt; document.&lt;/p&gt;
&lt;p&gt;Also, you can find more about the planned enhancement in &lt;a href=&#34;https://kep.k8s.io/4817&#34;&gt;KEP-4817: DRA: Resource Claim Status with possible standardized network interface data&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;ordered-namespace-deletion&#34;&gt;Ordered namespace deletion&lt;/h3&gt;
&lt;p&gt;This KEP introduces a more structured deletion process for Kubernetes namespaces to ensure secure and deterministic resource removal. The current semi-random deletion order can create security gaps or unintended behaviour, such as Pods persisting after their associated NetworkPolicies are deleted. By enforcing a structured deletion sequence that respects logical and security dependencies, this approach ensures Pods are removed before other resources. The design improves Kubernetes’s security and reliability by mitigating risks associated with non-deterministic deletions.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/5080&#34;&gt;KEP-5080: Ordered namespace deletion&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;enhancements-for-indexed-job-management&#34;&gt;Enhancements for indexed job management&lt;/h3&gt;
&lt;p&gt;These two KEPs are both set to graduate to GA to provide better reliability for job handling, specifically for indexed jobs. &lt;a href=&#34;https://kep.k8s.io/3850&#34;&gt;KEP-3850&lt;/a&gt; provides per-index backoff limits for indexed jobs, which allows each index to be fully independent of the other indexes. Also, &lt;a href=&#34;https://kep.k8s.io/3998&#34;&gt;KEP-3998&lt;/a&gt; extends the Job API with a policy for marking an indexed job as successfully completed even when not all indexes have succeeded.&lt;/p&gt;
&lt;p&gt;You can find more in &lt;a href=&#34;https://kep.k8s.io/3850&#34;&gt;KEP-3850: Backoff Limit Per Index For Indexed Jobs&lt;/a&gt; and &lt;a href=&#34;https://kep.k8s.io/3998&#34;&gt;KEP-3998: Job success/completion policy&lt;/a&gt;.&lt;/p&gt;
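&lt;p&gt;To illustrate how these two features combine, here is a sketch of an indexed Job using both fields; the values, names, and image are illustrative only:&lt;/p&gt;

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-example      # illustrative name
spec:
  completions: 10
  parallelism: 10
  completionMode: Indexed
  backoffLimitPerIndex: 2    # KEP-3850: retries are tracked per index
  maxFailedIndexes: 3        # the Job fails once more than 3 indexes fail
  successPolicy:             # KEP-3998: Job succeeds once indexes 0-2 succeed
    rules:
    - succeededIndexes: "0-2"
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.k8s.io/e2e-test-images/busybox:1.36.1-1
        command: ["sh", "-c", "echo index $JOB_COMPLETION_INDEX"]
```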
&lt;h2 id=&#34;want-to-know-more&#34;&gt;Want to know more?&lt;/h2&gt;
&lt;p&gt;New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what&#39;s new in &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md&#34;&gt;Kubernetes v1.33&lt;/a&gt; as part of the CHANGELOG for that release.&lt;/p&gt;
&lt;p&gt;Kubernetes v1.33 release is planned for &lt;strong&gt;Wednesday, 23rd April, 2025&lt;/strong&gt;. Stay tuned for updates!&lt;/p&gt;
&lt;p&gt;You can also see the announcements of changes in the release notes for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md&#34;&gt;Kubernetes v1.32&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md&#34;&gt;Kubernetes v1.31&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md&#34;&gt;Kubernetes v1.30&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;Get involved&lt;/h2&gt;
&lt;p&gt;The simplest way to get involved with Kubernetes is by joining one of the many &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;Special Interest Groups&lt;/a&gt; (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;community meeting&lt;/a&gt;, and through the channels below. Thank you for your continued feedback and support.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Follow us on Bluesky &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt; for the latest updates&lt;/li&gt;
&lt;li&gt;Join the community discussion on &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Join the community on &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post questions (or answer questions) on &lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; or &lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Share your Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;story&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Read more about what’s happening with Kubernetes on the &lt;a href=&#34;https://kubernetes.io/blog/&#34;&gt;blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Learn more about the &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes Release Team&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Fresh Swap Features for Linux Users in Kubernetes 1.32</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/25/swap-linux-improvements/</link>
      <pubDate>Tue, 25 Mar 2025 10:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/25/swap-linux-improvements/</guid>
      <description>
        
        
&lt;p&gt;Swap is a fundamental and invaluable Linux feature.
It offers numerous benefits, such as effectively increasing a node’s memory by
swapping out unused data,
shielding nodes from system-level memory spikes,
preventing Pods from crashing when they hit their memory limits,
and &lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories&#34;&gt;much more&lt;/a&gt;.
As a result, the node special interest group within the Kubernetes project
has invested significant effort into supporting swap on Linux nodes.&lt;/p&gt;
&lt;p&gt;The 1.22 release &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2021/08/09/run-nodes-with-swap-alpha/&#34;&gt;introduced&lt;/a&gt; Alpha support
for configuring swap memory usage for Kubernetes workloads running on Linux on a per-node basis.
Later, in release 1.28, support for swap on Linux nodes graduated to Beta, along with many
new improvements.
In the following Kubernetes releases more improvements were made, paving the way
to GA in the near future.&lt;/p&gt;
&lt;p&gt;Prior to version 1.22, Kubernetes did not provide support for swap memory on Linux systems.
This was due to the inherent difficulty in guaranteeing and accounting for pod memory utilization
when swap memory was involved. As a result, swap support was deemed out of scope in the initial
design of Kubernetes, and the default behavior of a kubelet was to fail to start if swap memory
was detected on a node.&lt;/p&gt;
&lt;p&gt;In version 1.22, the swap feature for Linux was initially introduced in its Alpha stage.
This provided Linux users the opportunity to experiment with the swap feature for the first time.
However, as an Alpha version, it was not fully developed and worked only partially, in limited environments.&lt;/p&gt;
&lt;p&gt;In version 1.28 swap support on Linux nodes was promoted to Beta.
The Beta version was a drastic leap forward.
Not only did it fix a large number of bugs and make swap work in a stable way,
but it also brought cgroup v2 support and introduced a wide variety of tests,
including complex scenarios such as node-level pressure, and more.
It also brought many exciting new capabilities such as the &lt;code&gt;LimitedSwap&lt;/code&gt; behavior
which sets an auto-calculated swap limit to containers, OpenMetrics instrumentation
support (through the &lt;code&gt;/metrics/resource&lt;/code&gt; endpoint) and Summary API for
VerticalPodAutoscalers (through the &lt;code&gt;/stats/summary&lt;/code&gt; endpoint), and more.&lt;/p&gt;
&lt;p&gt;Today we are working on more improvements, paving the way for GA.
Currently, the focus is especially towards ensuring node stability,
enhanced debug abilities, addressing user feedback,
polishing the feature and making it stable.
For example, in order to increase stability, containers in high-priority pods
cannot access swap, which ensures the memory they need is ready to use.
In addition, the &lt;code&gt;UnlimitedSwap&lt;/code&gt; behavior was removed since it might compromise
the node&#39;s health.
Secret content protection against swapping has also been introduced
(see relevant &lt;a href=&#34;#memory-backed-volumes&#34;&gt;security-risk section&lt;/a&gt; for more info).&lt;/p&gt;
&lt;p&gt;To conclude, compared to previous releases, the kubelet&#39;s support for running with swap enabled
is more stable and robust, more user-friendly, and addresses many known shortcomings.
That said, the NodeSwap feature introduces basic swap support, and this is just the beginning.
In the near future, additional features are planned to enhance swap functionality in various ways,
such as improving evictions, extending the API, increasing customizability, and more!&lt;/p&gt;
&lt;h2 id=&#34;how-do-i-use-it&#34;&gt;How do I use it?&lt;/h2&gt;
&lt;p&gt;In order for the kubelet to initialize on a swap-enabled node, the &lt;code&gt;failSwapOn&lt;/code&gt;
field must be set to &lt;code&gt;false&lt;/code&gt; in the kubelet&#39;s configuration, or the deprecated
&lt;code&gt;--fail-swap-on&lt;/code&gt; command line flag must be set to &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It is possible to configure the &lt;code&gt;memorySwap.swapBehavior&lt;/code&gt; option to define the
manner in which a node utilizes swap memory.
For instance,&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# this fragment goes into the kubelet&amp;#39;s configuration file&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;memorySwap&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;swapBehavior&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;LimitedSwap&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The currently available configuration options for &lt;code&gt;swapBehavior&lt;/code&gt; are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NoSwap&lt;/code&gt; (default): Kubernetes workloads cannot use swap. However, processes
outside of Kubernetes&#39; scope, like system daemons (such as kubelet itself!) can utilize swap.
This behavior is beneficial for protecting the node from system-level memory spikes,
but it does not safeguard the workloads themselves from such spikes.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LimitedSwap&lt;/code&gt;: Kubernetes workloads can utilize swap memory, but with certain limitations.
The amount of swap available to a Pod is determined automatically,
based on the proportion of the memory requested relative to the node&#39;s total memory.
Only non-high-priority Pods under the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/pod-qos/#burstable&#34;&gt;Burstable&lt;/a&gt;
Quality of Service (QoS) tier are permitted to use swap.
For more details, see the &lt;a href=&#34;#how-is-the-swap-limit-being-determined-with-limitedswap&#34;&gt;section below&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If configuration for &lt;code&gt;memorySwap&lt;/code&gt; is not specified,
by default the kubelet will apply the same behaviour as the &lt;code&gt;NoSwap&lt;/code&gt; setting.&lt;/p&gt;
&lt;p&gt;On Linux nodes, Kubernetes only supports running with swap enabled for hosts that use cgroup v2.
On cgroup v1 systems, no Kubernetes workloads are allowed to use swap memory.&lt;/p&gt;
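&lt;p&gt;To check which cgroup version a node uses, you can inspect the filesystem type mounted at &lt;code&gt;/sys/fs/cgroup&lt;/code&gt;: on cgroup v2 hosts it is the unified &lt;code&gt;cgroup2fs&lt;/code&gt; hierarchy, while on cgroup v1 hosts it is typically a &lt;code&gt;tmpfs&lt;/code&gt; holding the per-controller hierarchies:&lt;/p&gt;

```shell
# Print the filesystem type mounted at /sys/fs/cgroup.
# "cgroup2fs" indicates cgroup v2 (required for swap support);
# "tmpfs" indicates a legacy cgroup v1 layout.
stat -fc %T /sys/fs/cgroup
```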
&lt;h2 id=&#34;install-a-swap-enabled-cluster-with-kubeadm&#34;&gt;Install a swap-enabled cluster with kubeadm&lt;/h2&gt;
&lt;h3 id=&#34;before-you-begin&#34;&gt;Before you begin&lt;/h3&gt;
&lt;p&gt;This demo requires the kubeadm tool to be installed, following the steps outlined in the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/&#34;&gt;kubeadm installation guide&lt;/a&gt;.
If swap is already enabled on the node, cluster creation may proceed.
If swap is not enabled, please refer to the provided instructions for enabling swap.&lt;/p&gt;
&lt;h3 id=&#34;create-a-swap-file-and-turn-swap-on&#34;&gt;Create a swap file and turn swap on&lt;/h3&gt;
&lt;p&gt;I&#39;ll demonstrate creating 4 GiB of swap, in both the encrypted and unencrypted cases.&lt;/p&gt;
&lt;h4 id=&#34;setting-up-unencrypted-swap&#34;&gt;Setting up unencrypted swap&lt;/h4&gt;
&lt;p&gt;An unencrypted swap file can be set up as follows.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Allocate storage and restrict access&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fallocate --length 4GiB /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;chmod &lt;span style=&#34;color:#666&#34;&gt;600&lt;/span&gt; /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Format the swap space&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;mkswap /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Activate the swap space for paging&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;swapon /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;setting-up-encrypted-swap&#34;&gt;Setting up encrypted swap&lt;/h4&gt;
&lt;p&gt;An encrypted swap file can be set up as follows.
Bear in mind that this example uses the &lt;code&gt;cryptsetup&lt;/code&gt; binary (which is available
on most Linux distributions).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Allocate storage and restrict access&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fallocate --length 4GiB /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;chmod &lt;span style=&#34;color:#666&#34;&gt;600&lt;/span&gt; /swapfile
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Create an encrypted device backed by the allocated storage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cryptsetup --type plain --cipher aes-xts-plain64 --key-size &lt;span style=&#34;color:#666&#34;&gt;256&lt;/span&gt; -d /dev/urandom open /swapfile cryptswap
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Format the swap space&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;mkswap /dev/mapper/cryptswap
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Activate the swap space for paging&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;swapon /dev/mapper/cryptswap
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;verify-that-swap-is-enabled&#34;&gt;Verify that swap is enabled&lt;/h4&gt;
&lt;p&gt;You can verify that swap is enabled with either the &lt;code&gt;swapon -s&lt;/code&gt; command or the &lt;code&gt;free&lt;/code&gt; command:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;gt; swapon -s
Filename				Type		Size		Used		Priority
/dev/dm-0                               partition	4194300		0		-2
&lt;/code&gt;&lt;/pre&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;gt; free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       1.3Gi       249Mi        25Mi       2.5Gi       2.5Gi
Swap:          4.0Gi          0B       4.0Gi
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&#34;enable-swap-on-boot&#34;&gt;Enable swap on boot&lt;/h4&gt;
&lt;p&gt;After setting up swap, to start the swap file at boot time,
you either set up a systemd unit to activate (encrypted) swap, or you
add a line similar to &lt;code&gt;/swapfile swap swap defaults 0 0&lt;/code&gt; into &lt;code&gt;/etc/fstab&lt;/code&gt;.&lt;/p&gt;
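&lt;p&gt;For the systemd route, a minimal swap unit could look like the following sketch; note that a swap unit&#39;s file name must match the systemd-escaped path of the device or file it activates (for &lt;code&gt;/swapfile&lt;/code&gt; that is &lt;code&gt;swapfile.swap&lt;/code&gt;):&lt;/p&gt;

```ini
# /etc/systemd/system/swapfile.swap
[Unit]
Description=Swap file for the node

[Swap]
What=/swapfile

[Install]
WantedBy=swap.target
```

&lt;p&gt;Enable it with &lt;code&gt;systemctl enable --now swapfile.swap&lt;/code&gt;.&lt;/p&gt;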
&lt;h3 id=&#34;set-up-a-kubernetes-cluster-that-uses-swap-enabled-nodes&#34;&gt;Set up a Kubernetes cluster that uses swap-enabled nodes&lt;/h3&gt;
&lt;p&gt;To make things clearer, here is an example kubeadm configuration file &lt;code&gt;kubeadm-config.yaml&lt;/code&gt; for the swap enabled cluster.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kubeadm.k8s.io/v1beta3&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;InitConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet.config.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;KubeletConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;failSwapOn&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;memorySwap&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;swapBehavior&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;LimitedSwap&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then create a single-node cluster using &lt;code&gt;kubeadm init --config kubeadm-config.yaml&lt;/code&gt;.
During init, a warning is shown because swap is enabled on the node, and another in case the kubelet&#39;s
&lt;code&gt;failSwapOn&lt;/code&gt; is set to &lt;code&gt;true&lt;/code&gt;. We plan to remove this warning in a future release.&lt;/p&gt;
&lt;h2 id=&#34;how-is-the-swap-limit-being-determined-with-limitedswap&#34;&gt;How is the swap limit being determined with LimitedSwap?&lt;/h2&gt;
&lt;p&gt;The configuration of swap memory, including its limitations, presents a significant
challenge. Not only is it prone to misconfiguration, but as a system-level property, any
misconfiguration could potentially compromise the entire node rather than just a specific
workload. To mitigate this risk and ensure the health of the node, we have implemented
swap support with automatically configured limitations.&lt;/p&gt;
&lt;p&gt;With &lt;code&gt;LimitedSwap&lt;/code&gt;, Pods that do not fall under the Burstable QoS classification (i.e.
&lt;code&gt;BestEffort&lt;/code&gt;/&lt;code&gt;Guaranteed&lt;/code&gt; QoS Pods) are prohibited from utilizing swap memory.
&lt;code&gt;BestEffort&lt;/code&gt; QoS Pods exhibit unpredictable memory consumption patterns and lack
information regarding their memory usage, making it difficult to determine a safe
allocation of swap memory.
Conversely, &lt;code&gt;Guaranteed&lt;/code&gt; QoS Pods are typically employed for applications that rely on the
precise allocation of resources specified by the workload, with memory being immediately available.
To maintain the aforementioned security and node health guarantees,
these Pods are not permitted to use swap memory when &lt;code&gt;LimitedSwap&lt;/code&gt; is in effect.
In addition, high-priority pods are not permitted to use swap in order to ensure that the memory
they consume always stays resident in RAM, hence ready to use.&lt;/p&gt;
&lt;p&gt;Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;nodeTotalMemory&lt;/code&gt;: The total amount of physical memory available on the node.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;totalPodsSwapAvailable&lt;/code&gt;: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;containerMemoryRequest&lt;/code&gt;: The container&#39;s memory request.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Swap limitation is configured as:
&lt;code&gt;(containerMemoryRequest / nodeTotalMemory) × totalPodsSwapAvailable&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In other words, the amount of swap that a container is able to use is proportionate to its
memory request, the node&#39;s total physical memory and the total amount of swap memory on
the node that is available for use by Pods.&lt;/p&gt;
&lt;p&gt;It is important to note that, for containers within Burstable QoS Pods, it is possible to
opt-out of swap usage by specifying memory requests that are equal to memory limits.
Containers configured in this manner will not have access to swap memory.&lt;/p&gt;
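&lt;p&gt;As a worked example of the formula above (a plain arithmetic sketch, not kubelet code; the function and variable names are descriptive only):&lt;/p&gt;

```python
# Per-container swap limit under LimitedSwap:
# (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable

def swap_limit(container_memory_request: int,
               node_total_memory: int,
               total_pods_swap_available: int) -> int:
    # Integer arithmetic in bytes; multiply first to avoid losing precision.
    return container_memory_request * total_pods_swap_available // node_total_memory

GiB = 1024 ** 3
# A container requesting 8 GiB on a 64 GiB node, with 16 GiB of swap
# available to Pods, may use 8/64 = 1/8 of that swap: 2 GiB.
print(swap_limit(8 * GiB, 64 * GiB, 16 * GiB) // GiB)  # 2
```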
&lt;h2 id=&#34;how-does-it-work&#34;&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;There are a number of possible ways that one could envision swap use on a node.
When swap is already provisioned and available on a node,
the kubelet can be configured so that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can start with swap on.&lt;/li&gt;
&lt;li&gt;It will direct the Container Runtime Interface to allocate zero swap memory
to Kubernetes workloads by default.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Swap configuration on a node is exposed to a cluster admin via the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/config-api/kubelet-config.v1/&#34;&gt;&lt;code&gt;memorySwap&lt;/code&gt; in the KubeletConfiguration&lt;/a&gt;.
As a cluster administrator, you can specify the node&#39;s behaviour in the
presence of swap memory by setting &lt;code&gt;memorySwap.swapBehavior&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The kubelet employs the &lt;a href=&#34;https://kubernetes.io/docs/concepts/architecture/cri/&#34;&gt;CRI&lt;/a&gt;
(container runtime interface) API, and directs the container runtime to
configure specific cgroup v2 parameters (such as &lt;code&gt;memory.swap.max&lt;/code&gt;) in a manner that will
enable the desired swap configuration for a container. For runtimes that use control groups,
the container runtime is then responsible for writing these settings to the container-level cgroup.&lt;/p&gt;
&lt;h2 id=&#34;how-can-i-monitor-swap&#34;&gt;How can I monitor swap?&lt;/h2&gt;
&lt;h3 id=&#34;node-and-container-level-metric-statistics&#34;&gt;Node and container level metric statistics&lt;/h3&gt;
&lt;p&gt;Kubelet now collects node and container level metric statistics,
which can be accessed at the &lt;code&gt;/metrics/resource&lt;/code&gt; (which is used mainly by monitoring
tools like Prometheus) and &lt;code&gt;/stats/summary&lt;/code&gt; (which is used mainly by Autoscalers) kubelet HTTP endpoints.
This allows clients who can directly interrogate the kubelet to
monitor swap usage and remaining swap memory when using &lt;code&gt;LimitedSwap&lt;/code&gt;.
Additionally, a &lt;code&gt;machine_swap_bytes&lt;/code&gt; metric has been added to cadvisor to show
the total physical swap capacity of the machine.
See &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/instrumentation/node-metrics/&#34;&gt;this page&lt;/a&gt; for more info.&lt;/p&gt;
&lt;h3 id=&#34;node-feature-discovery&#34;&gt;Node Feature Discovery (NFD)&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/node-feature-discovery&#34;&gt;Node Feature Discovery&lt;/a&gt;
is a Kubernetes addon for detecting hardware features and configuration.
It can be utilized to discover which nodes are provisioned with swap.&lt;/p&gt;
&lt;p&gt;As an example, to figure out which nodes are provisioned with swap,
use the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl get nodes -o &lt;span style=&#34;color:#b8860b&#34;&gt;jsonpath&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{range .items[?(@.metadata.labels.feature\.node\.kubernetes\.io/memory-swap)]}{.metadata.name}{&amp;#34;\t&amp;#34;}{.metadata.labels.feature\.node\.kubernetes\.io/memory-swap}{&amp;#34;\n&amp;#34;}{end}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will result in an output similar to:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;k8s-worker1: true
k8s-worker2: true
k8s-worker3: false
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In this example, swap is provisioned on nodes &lt;code&gt;k8s-worker1&lt;/code&gt; and &lt;code&gt;k8s-worker2&lt;/code&gt;, but not on &lt;code&gt;k8s-worker3&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;caveats&#34;&gt;Caveats&lt;/h2&gt;
&lt;p&gt;Having swap available on a system reduces predictability.
While swap can enhance performance by making more RAM available, swapping data
back to memory is a heavy operation, sometimes slower by many orders of magnitude,
which can cause unexpected performance regressions.
Furthermore, swap changes a system&#39;s behaviour under memory pressure.
Enabling swap increases the risk of noisy neighbors,
where Pods that frequently use their RAM may cause other Pods to swap.
In addition, since swap allows workloads in Kubernetes to use memory that cannot be predictably
accounted for, and because of the unexpected packing configurations this can produce,
the scheduler currently does not account for swap memory usage.
This further heightens the risk of noisy neighbors.&lt;/p&gt;
&lt;p&gt;The performance of a node with swap memory enabled depends on the underlying physical storage.
When swap memory is in use, performance will be significantly worse in an I/O
operations per second (IOPS) constrained environment, such as a cloud VM with
I/O throttling, when compared to faster storage mediums like solid-state drives
or NVMe.
As swap might cause I/O pressure, it is recommended to give a higher I/O latency
priority to system-critical daemons. See the relevant part of the
&lt;a href=&#34;#good-practice-for-using-swap-in-a-kubernetes-cluster&#34;&gt;recommended practices&lt;/a&gt; section below.&lt;/p&gt;
&lt;h3 id=&#34;memory-backed-volumes&#34;&gt;Memory-backed volumes&lt;/h3&gt;
&lt;p&gt;On Linux nodes, memory-backed volumes (such as &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/configuration/secret/&#34;&gt;&lt;code&gt;secret&lt;/code&gt;&lt;/a&gt;
volume mounts, or &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volumes/#emptydir&#34;&gt;&lt;code&gt;emptyDir&lt;/code&gt;&lt;/a&gt; with &lt;code&gt;medium: Memory&lt;/code&gt;)
are implemented with a &lt;code&gt;tmpfs&lt;/code&gt; filesystem.
The contents of such volumes should remain in memory at all times, hence should
not be swapped to disk.
To ensure the contents of such volumes remain in memory, the &lt;code&gt;noswap&lt;/code&gt; tmpfs option
is used.&lt;/p&gt;
&lt;p&gt;The Linux kernel officially supports the &lt;code&gt;noswap&lt;/code&gt; option from version 6.3 (more info
can be found in &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/node/kernel-version-requirements/#requirements-other&#34;&gt;Linux Kernel Version Requirements&lt;/a&gt;).
However, different distributions often backport this mount option to older
Linux versions as well.&lt;/p&gt;
&lt;p&gt;In order to verify whether the node supports the &lt;code&gt;noswap&lt;/code&gt; option, the kubelet will do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the kernel&#39;s version is above 6.3 then the &lt;code&gt;noswap&lt;/code&gt; option will be assumed to be supported.&lt;/li&gt;
&lt;li&gt;Otherwise, the kubelet will try to mount a dummy tmpfs with the &lt;code&gt;noswap&lt;/code&gt; option at startup.
If the mount fails with an error indicating an unknown option, &lt;code&gt;noswap&lt;/code&gt; will be assumed
to be unsupported, and hence will not be used.
If the mount succeeds, the dummy tmpfs will be deleted and the &lt;code&gt;noswap&lt;/code&gt; option will be used.
&lt;ul&gt;
&lt;li&gt;If the &lt;code&gt;noswap&lt;/code&gt; option is not supported, kubelet will emit a warning log entry,
then continue its execution.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
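&lt;p&gt;You can perform a similar probe manually on a node to check whether its kernel accepts the
&lt;code&gt;noswap&lt;/code&gt; option (the mount point below is arbitrary; requires root):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Probe whether this kernel accepts the noswap tmpfs mount option.
mkdir -p /tmp/noswap-probe
if mount -t tmpfs -o noswap tmpfs /tmp/noswap-probe; then
  echo noswap is supported
  umount /tmp/noswap-probe
else
  echo noswap is not supported
fi
&lt;/code&gt;&lt;/pre&gt;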
&lt;p&gt;It is strongly encouraged to encrypt the swap space.
See the &lt;a href=&#34;#setting-up-encrypted-swap&#34;&gt;section above&lt;/a&gt; for an example of setting up encrypted swap.
However, handling encrypted swap is not within the scope of the kubelet;
rather, it is a general OS configuration concern and should be addressed at that level.
It is the administrator&#39;s responsibility to provision encrypted swap to mitigate this risk.&lt;/p&gt;
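&lt;p&gt;As an illustration only (the device name is an example; adapt the commands to your
distribution), an ephemeral encrypted swap device can be set up with &lt;code&gt;cryptsetup&lt;/code&gt;,
using a random key that is discarded on reboot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Assumes /dev/sdb is a dedicated swap disk -- adjust for your system.
# Map the disk through dm-crypt with a random, reboot-ephemeral key.
cryptsetup open --type plain --key-file /dev/urandom /dev/sdb cryptswap
mkswap /dev/mapper/cryptswap   # format the mapped device as swap
swapon /dev/mapper/cryptswap   # enable it
&lt;/code&gt;&lt;/pre&gt;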
&lt;h2 id=&#34;good-practice-for-using-swap-in-a-kubernetes-cluster&#34;&gt;Good practice for using swap in a Kubernetes cluster&lt;/h2&gt;
&lt;h3 id=&#34;disable-swap-for-system-critical-daemons&#34;&gt;Disable swap for system-critical daemons&lt;/h3&gt;
&lt;p&gt;During the testing phase and based on user feedback, it was observed that the performance
of system-critical daemons and services might degrade when they are swapped out.
This implies that system daemons, including the kubelet, could operate slower than usual.
If this issue is encountered, it is advisable to configure the cgroup of the system slice
to prevent swapping (i.e., set &lt;code&gt;memory.swap.max=0&lt;/code&gt;).&lt;/p&gt;
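&lt;p&gt;For systemd-based nodes, one way to achieve this is a drop-in for the system slice
(the drop-in path below is an example; &lt;code&gt;MemorySwapMax=&lt;/code&gt; requires cgroup v2):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# /etc/systemd/system/system.slice.d/99-noswap.conf (example path)
# Sets memory.swap.max=0 for the system slice via systemd.
[Slice]
MemorySwapMax=0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run &lt;code&gt;systemctl daemon-reload&lt;/code&gt; afterwards for the change to take effect.&lt;/p&gt;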
&lt;h3 id=&#34;protect-system-critical-daemons-for-i-o-latency&#34;&gt;Protect system-critical daemons from I/O latency&lt;/h3&gt;
&lt;p&gt;Swap can increase the I/O load on a node.
When memory pressure causes the kernel to rapidly swap pages in and out,
system-critical daemons and services that rely on I/O operations may
experience performance degradation.&lt;/p&gt;
&lt;p&gt;To mitigate this, it is recommended for systemd users to prioritize the system slice in terms of I/O latency.
For non-systemd users,
setting up a dedicated cgroup for system daemons and processes and prioritizing I/O latency in the same way is advised.
This can be achieved by setting &lt;code&gt;io.latency&lt;/code&gt; for the system slice,
thereby granting it higher I/O priority.
See &lt;a href=&#34;https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst&#34;&gt;cgroup&#39;s documentation&lt;/a&gt; for more info.&lt;/p&gt;
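&lt;p&gt;With systemd, for example, the &lt;code&gt;io.latency&lt;/code&gt; target can be set through
&lt;code&gt;IODeviceLatencyTargetSec=&lt;/code&gt; (the device path and target value below are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# /etc/systemd/system/system.slice.d/99-io-latency.conf (example path)
# Grants the system slice an I/O latency target on the given device,
# which systemd translates into the cgroup v2 io.latency controller.
[Slice]
IODeviceLatencyTargetSec=/dev/sda 25ms
&lt;/code&gt;&lt;/pre&gt;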
&lt;h3 id=&#34;swap-and-control-plane-nodes&#34;&gt;Swap and control plane nodes&lt;/h3&gt;
&lt;p&gt;The Kubernetes project recommends running control plane nodes without any swap space configured.
The control plane primarily hosts Guaranteed QoS Pods, so swap can generally be disabled.
The main concern is that swapping critical services on the control plane could negatively impact performance.&lt;/p&gt;
&lt;h3 id=&#34;use-of-a-dedicated-disk-for-swap&#34;&gt;Use of a dedicated disk for swap&lt;/h3&gt;
&lt;p&gt;It is recommended to use a separate, encrypted disk for the swap partition.
If swap resides on a partition or the root filesystem, workloads may interfere
with system processes that need to write to disk.
When they share the same disk, processes can overwhelm swap,
disrupting the I/O of the kubelet, the container runtime, and systemd, which would impact other workloads.
Since swap space is located on a disk, it is crucial to ensure the disk is fast enough for the intended use cases.
Alternatively, one can configure I/O priorities between different mapped areas of a single backing device.&lt;/p&gt;
&lt;h2 id=&#34;looking-ahead&#34;&gt;Looking ahead&lt;/h2&gt;
&lt;p&gt;As you can see, the swap feature has been dramatically improved recently,
paving the way for its graduation to GA.
However, this is just the beginning:
it&#39;s a foundational implementation that marks the start of enhanced swap functionality.&lt;/p&gt;
&lt;p&gt;In the near future, additional features are planned to further improve swap capabilities,
including better eviction mechanisms, extended API support, increased customizability,
better debugging capabilities, and more!&lt;/p&gt;
&lt;h2 id=&#34;how-can-i-learn-more&#34;&gt;How can I learn more?&lt;/h2&gt;
&lt;p&gt;You can review the current &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/architecture/nodes/#swap-memory&#34;&gt;documentation&lt;/a&gt;
for using swap with Kubernetes.&lt;/p&gt;
&lt;p&gt;For more information, please see &lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4128&#34;&gt;KEP-2400&lt;/a&gt; and its
&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md&#34;&gt;design proposal&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;how-do-i-get-involved&#34;&gt;How do I get involved?&lt;/h2&gt;
&lt;p&gt;Your feedback is always welcome! SIG Node &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-node#meetings&#34;&gt;meets regularly&lt;/a&gt;
and &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-node#contact&#34;&gt;can be reached&lt;/a&gt;
via &lt;a href=&#34;https://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt; (channel &lt;strong&gt;#sig-node&lt;/strong&gt;), or the SIG&#39;s
&lt;a href=&#34;https://groups.google.com/forum/#!forum/kubernetes-sig-node&#34;&gt;mailing list&lt;/a&gt;. A Slack
channel dedicated to swap is also available at &lt;strong&gt;#sig-node-swap&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Feel free to reach out to me, Itamar Holder (&lt;strong&gt;@iholder101&lt;/strong&gt; on Slack and GitHub)
if you&#39;d like to help or ask further questions.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Ingress-nginx CVE-2025-1974: What You Need to Know</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/24/ingress-nginx-cve-2025-1974/</link>
      <pubDate>Mon, 24 Mar 2025 12:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/24/ingress-nginx-cve-2025-1974/</guid>
      <description>
        
        
        &lt;p&gt;Today, the ingress-nginx maintainers have released patches for a batch of critical vulnerabilities that could make it easy for attackers to take over your Kubernetes cluster: &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.12.1&#34;&gt;ingress-nginx v1.12.1&lt;/a&gt; and &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.11.5&#34;&gt;ingress-nginx v1.11.5&lt;/a&gt;. If you are among the over 40% of Kubernetes administrators using &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/&#34;&gt;ingress-nginx&lt;/a&gt;, you should take action immediately to protect your users and data.&lt;/p&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/ingress/&#34;&gt;Ingress&lt;/a&gt; is the traditional Kubernetes feature for exposing your workload Pods to the world so that they can be useful. In an implementation-agnostic way, Kubernetes users can define how their applications should be made available on the network. Then, an &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/ingress-controllers/&#34;&gt;ingress controller&lt;/a&gt; uses that definition to set up local or cloud resources as required for the user’s particular situation and needs.&lt;/p&gt;
&lt;p&gt;Many different ingress controllers are available, to suit users of different cloud providers or brands of load balancers. Ingress-nginx is a software-only ingress controller provided by the Kubernetes project. Because of its versatility and ease of use, ingress-nginx is quite popular: it is deployed in over 40% of Kubernetes clusters!&lt;/p&gt;
&lt;p&gt;Ingress-nginx translates the requirements from Ingress objects into configuration for nginx, a powerful open source webserver daemon. Then, nginx uses that configuration to accept and route requests to the various applications running within a Kubernetes cluster. Proper handling of these nginx configuration parameters is crucial, because ingress-nginx needs to allow users significant flexibility while preventing them from accidentally or intentionally tricking nginx into doing things it shouldn’t.&lt;/p&gt;
&lt;h2 id=&#34;vulnerabilities-patched-today&#34;&gt;Vulnerabilities Patched Today&lt;/h2&gt;
&lt;p&gt;Four of today’s ingress-nginx vulnerabilities are improvements to how ingress-nginx handles particular bits of nginx config. Without these fixes, a specially-crafted Ingress object can cause nginx to misbehave in various ways, including revealing the values of &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/configuration/secret/&#34;&gt;Secrets&lt;/a&gt; that are accessible to ingress-nginx. By default, ingress-nginx has access to all Secrets cluster-wide, so this can often lead to complete cluster takeover by any user or entity that has permission to create an Ingress.&lt;/p&gt;
&lt;p&gt;The most serious of today’s vulnerabilities, &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131009&#34;&gt;CVE-2025-1974&lt;/a&gt;, rated &lt;a href=&#34;https://www.first.org/cvss/calculator/3-1#CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H&#34;&gt;9.8 CVSS&lt;/a&gt;, allows anything on the Pod network to exploit configuration injection vulnerabilities via the Validating Admission Controller feature of ingress-nginx. This makes such vulnerabilities far more dangerous: ordinarily one would need to be able to create an Ingress object in the cluster, which is a fairly privileged action. When combined with today’s other vulnerabilities, &lt;strong&gt;CVE-2025-1974 means that anything on the Pod network has a good chance of taking over your Kubernetes cluster, with no credentials or administrative access required&lt;/strong&gt;. In many common scenarios, the Pod network is accessible to all workloads in your cloud VPC, or even anyone connected to your corporate network! This is a very serious situation.&lt;/p&gt;
&lt;p&gt;Today, we have released &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.12.1&#34;&gt;ingress-nginx v1.12.1&lt;/a&gt; and &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.11.5&#34;&gt;ingress-nginx v1.11.5&lt;/a&gt;, which have fixes for all five of these vulnerabilities.&lt;/p&gt;
&lt;h2 id=&#34;your-next-steps&#34;&gt;Your next steps&lt;/h2&gt;
&lt;p&gt;First, determine if your clusters are using ingress-nginx. In most cases, you can check this by running &lt;code&gt;kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx&lt;/code&gt; with cluster administrator permissions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you are using ingress-nginx, make a plan to remediate these vulnerabilities immediately.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The best and easiest remedy is to &lt;a href=&#34;https://kubernetes.github.io/ingress-nginx/deploy/upgrade/&#34;&gt;upgrade to the new patch release of ingress-nginx&lt;/a&gt;.&lt;/strong&gt; All five of today’s vulnerabilities are fixed by installing today’s patches.&lt;/p&gt;
&lt;p&gt;If you can’t upgrade right away, you can significantly reduce your risk by turning off the Validating Admission Controller feature of ingress-nginx.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you have installed ingress-nginx using Helm
&lt;ul&gt;
&lt;li&gt;Reinstall, setting the Helm value &lt;code&gt;controller.admissionWebhooks.enabled=false&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If you have installed ingress-nginx manually
&lt;ul&gt;
&lt;li&gt;delete the ValidatingWebhookConfiguration called &lt;code&gt;ingress-nginx-admission&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;edit the &lt;code&gt;ingress-nginx-controller&lt;/code&gt; Deployment or DaemonSet, removing &lt;code&gt;--validating-webhook&lt;/code&gt; from the controller container’s argument list&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
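&lt;p&gt;As a sketch (the release name and namespace below are assumptions; match them to your
installation), these mitigations look like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Helm install: disable the admission webhooks
# (assumes a release named ingress-nginx in the ingress-nginx namespace)
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --reuse-values \
  --set controller.admissionWebhooks.enabled=false

# Manual install: remove the webhook configuration and the controller flag
kubectl delete validatingwebhookconfiguration ingress-nginx-admission
kubectl edit deployment ingress-nginx-controller   # remove --validating-webhook from the args
&lt;/code&gt;&lt;/pre&gt;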
&lt;p&gt;If you turn off the Validating Admission Controller feature as a mitigation for CVE-2025-1974, remember to turn it back on after you upgrade. This feature provides important quality of life improvements for your users, warning them about incorrect Ingress configurations before they can take effect.&lt;/p&gt;
&lt;h2 id=&#34;conclusion-thanks-and-further-reading&#34;&gt;Conclusion, thanks, and further reading&lt;/h2&gt;
&lt;p&gt;The ingress-nginx vulnerabilities announced today, including CVE-2025-1974, present a serious risk to many Kubernetes users and their data. If you use ingress-nginx, you should take action immediately to keep yourself safe.&lt;/p&gt;
&lt;p&gt;Thanks go out to Nir Ohfeld, Sagi Tzadik, Ronen Shustin, and Hillai Ben-Sasson from Wiz for responsibly disclosing these vulnerabilities, and for working with the Kubernetes SRC members and ingress-nginx maintainers (Marco Ebert and James Strong) to ensure we fixed them effectively.&lt;/p&gt;
&lt;p&gt;For further information about the maintenance and future of ingress-nginx, please see this &lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/issues/13002&#34;&gt;GitHub issue&lt;/a&gt; and/or attend &lt;a href=&#34;https://kccnceu2025.sched.com/event/1tcyc/&#34;&gt;James and Marco’s KubeCon/CloudNativeCon EU 2025 presentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information about the specific vulnerabilities discussed in this article, please see the appropriate GitHub issue: &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131005&#34;&gt;CVE-2025-24513&lt;/a&gt;, &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131006&#34;&gt;CVE-2025-24514&lt;/a&gt;, &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131007&#34;&gt;CVE-2025-1097&lt;/a&gt;, &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131008&#34;&gt;CVE-2025-1098&lt;/a&gt;, or &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131009&#34;&gt;CVE-2025-1974&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This blog post was revised in May 2025 to update the hyperlinks.&lt;/em&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Introducing JobSet</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/23/introducing-jobset/</link>
      <pubDate>Sun, 23 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/23/introducing-jobset/</guid>
      <description>
        
        
        &lt;p&gt;&lt;strong&gt;Authors&lt;/strong&gt;: Daniel Vega-Myhre (Google), Abdullah Gharaibeh (Google), Kevin Hannon (Red Hat)&lt;/p&gt;
&lt;p&gt;In this article, we introduce &lt;a href=&#34;https://jobset.sigs.k8s.io/&#34;&gt;JobSet&lt;/a&gt;, an open source API for
representing distributed jobs. The goal of JobSet is to provide a unified API for distributed ML
training and HPC workloads on Kubernetes.&lt;/p&gt;
&lt;h2 id=&#34;why-jobset&#34;&gt;Why JobSet?&lt;/h2&gt;
&lt;p&gt;The Kubernetes community’s recent enhancements to the batch ecosystem on Kubernetes have attracted ML
engineers who have found it to be a natural fit for the requirements of running distributed training
workloads.&lt;/p&gt;
&lt;p&gt;Large ML models (particularly LLMs) which cannot fit into the memory of the GPU or TPU chips on a
single host are often distributed across tens of thousands of accelerator chips, which in turn may
span thousands of hosts.&lt;/p&gt;
&lt;p&gt;As such, the model training code is often containerized and executed simultaneously on all these
hosts, performing distributed computations which often shard both the model parameters and/or the
training dataset across the target accelerator chips, using communication collective primitives like
all-gather and all-reduce to perform distributed computations and synchronize gradients between
hosts.&lt;/p&gt;
&lt;p&gt;These workload characteristics make Kubernetes a great fit for this type of workload, as efficiently
scheduling and managing the lifecycle of containerized applications across a cluster of compute
resources is an area where it shines.&lt;/p&gt;
&lt;p&gt;It is also very extensible, allowing developers to define their own Kubernetes APIs, objects, and
controllers which manage the behavior and life cycle of these objects, allowing engineers to develop
custom distributed training orchestration solutions to fit their needs.&lt;/p&gt;
&lt;p&gt;However, as distributed ML training techniques continue to evolve, existing Kubernetes primitives no
longer adequately model them on their own.&lt;/p&gt;
&lt;p&gt;Furthermore, the landscape of Kubernetes distributed training orchestration APIs has become
fragmented, and each of the existing solutions in this fragmented landscape has certain limitations
that make it non-optimal for distributed ML training.&lt;/p&gt;
&lt;p&gt;For example, the Kubeflow training operator defines custom APIs for different frameworks (e.g.
PyTorchJob, TFJob, MPIJob, etc.); however, each of these job types is in fact a solution fitted
specifically to the target framework, each with different semantics and behavior.&lt;/p&gt;
&lt;p&gt;On the other hand, the Job API fixed many gaps for running batch workloads, including Indexed
completion mode, higher scalability, Pod failure policies and Pod backoff policy to mention a few of
the most recent enhancements. However, running ML training and HPC workloads using the upstream Job
API requires extra orchestration to fill the following gaps:&lt;/p&gt;
&lt;p&gt;Multi-template Pods : Most HPC or ML training jobs include more than one type of Pod. The different
Pods are part of the same workload, but they need to run a different container, request different
resources, or have different failure policies. A common example is the driver-worker pattern.&lt;/p&gt;
&lt;p&gt;Job groups : Large scale training workloads span multiple network topologies, running across
multiple racks for example. Such workloads are network latency sensitive, and aim to localize
communication and minimize traffic crossing the higher-latency network links. To facilitate this,
the workload needs to be split into groups of Pods each assigned to a network topology.&lt;/p&gt;
&lt;p&gt;Inter-Pod communication : Create and manage the resources (e.g. &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/services-networking/service/#headless-services&#34;&gt;headless
Services&lt;/a&gt;) necessary to establish
communication between the Pods of a job.&lt;/p&gt;
&lt;p&gt;Startup sequencing : Some jobs require a specific start sequence of pods; sometimes the driver is
expected to start first (like Ray or Spark), in other cases the workers are expected to be ready
before starting the driver (like MPI).&lt;/p&gt;
&lt;p&gt;JobSet aims to address those gaps using the Job API as a building block to build a richer API for
large-scale distributed HPC and ML use cases.&lt;/p&gt;
&lt;h2 id=&#34;how-jobset-works&#34;&gt;How JobSet Works&lt;/h2&gt;
&lt;p&gt;JobSet models a distributed batch workload as a group of Kubernetes Jobs. This allows a user to
easily specify different pod templates for different distinct groups of pods (e.g. a leader,
workers, parameter servers, etc.).&lt;/p&gt;
&lt;p&gt;It uses the abstraction of a ReplicatedJob to manage child Jobs, where a ReplicatedJob is
essentially a Job Template with some desired number of Job replicas specified. This provides a
declarative way to easily create identical child-jobs to run on different islands of accelerators,
without resorting to scripting or Helm charts to generate many versions of the same job but with
different names.&lt;/p&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/23/introducing-jobset/jobset_diagram.svg&#34;
         alt=&#34;JobSet Architecture&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;Some other key JobSet features which address the problems described above include:&lt;/p&gt;
&lt;p&gt;Replicated Jobs : In modern data centers, hardware accelerators like GPUs and TPUs are allocated in
islands of homogeneous accelerators connected via specialized, high-bandwidth network links. For
example, a user might provision nodes containing a group of hosts co-located on a rack, each with
H100 GPUs, where GPU chips within each host are connected via NVLink, with a NVLink Switch
connecting the multiple NVLinks. TPU Pods are another example of this: TPU ViperLitePods consist of
64 hosts, each with 4 TPU v5e chips attached, all connected via ICI mesh. When running a distributed
training job across multiple of these islands, we often want to partition the workload into a group
of smaller identical jobs, 1 per island, where each pod primarily communicates with the pods within
the same island to do segments of distributed computation, while keeping the gradient synchronization
over DCN (data center network, which is lower bandwidth than ICI) to a bare minimum.&lt;/p&gt;
&lt;p&gt;Automatic headless service creation, configuration, and lifecycle management : Pod-to-pod
communication via pod hostname is enabled by default, with automatic configuration and lifecycle
management of the headless service enabling this.&lt;/p&gt;
&lt;p&gt;Configurable success policies : JobSet has configurable success policies which target specific
ReplicatedJobs, with operators to target “Any” or “All” of their child jobs. For example, you can
configure the JobSet to be marked complete if and only if all pods that are part of the “worker”
ReplicatedJob are completed.&lt;/p&gt;
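&lt;p&gt;For instance, a success policy targeting a &lt;code&gt;worker&lt;/code&gt; ReplicatedJob might look like the
following sketch (field names per the JobSet &lt;code&gt;v1alpha2&lt;/code&gt; API; the ReplicatedJob name is an example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# JobSet spec fragment: mark the JobSet complete only when
# all child jobs of the worker ReplicatedJob have completed.
spec:
  successPolicy:
    operator: All
    targetReplicatedJobs:
    - worker
&lt;/code&gt;&lt;/pre&gt;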
&lt;p&gt;Configurable failure policies : JobSet has configurable failure policies which allow the user to
specify a maximum number of times the JobSet should be restarted in the event of a failure. If any
job is marked failed, the entire JobSet will be recreated, allowing the workload to resume from the
last checkpoint. When no failure policy is specified, if any job fails, the JobSet simply fails.&lt;/p&gt;
&lt;p&gt;Exclusive placement per topology domain : JobSet allows users to express that child jobs have 1:1
exclusive assignment to a topology domain, typically an accelerator island like a rack. For example,
if the JobSet creates two child jobs, then this feature will enforce that the pods of each child job
will be co-located on the same island, and that only one child job is allowed to schedule per
island. This is useful for scenarios where we want to use a distributed data parallel (DDP) training
strategy to train a model using multiple islands of compute resources (GPU racks or TPU slices),
running 1 model replica in each accelerator island. It ensures that the forward and backward passes
within each model replica occur over the high-bandwidth interconnect linking the accelerator chips
within the island, and that only the gradient synchronization between model replicas crosses
accelerator islands over the lower-bandwidth data center network.&lt;/p&gt;
&lt;p&gt;Integration with Kueue : Users can submit JobSets via &lt;a href=&#34;https://kueue.sigs.k8s.io/&#34;&gt;Kueue&lt;/a&gt; to
oversubscribe their clusters, queue workloads to run as capacity becomes available, prevent partial
scheduling and deadlocks, enable multi-tenancy, and more.&lt;/p&gt;
&lt;h2 id=&#34;example-use-case&#34;&gt;Example use case&lt;/h2&gt;
&lt;h3 id=&#34;distributed-ml-training-on-multiple-tpu-slices-with-jax&#34;&gt;Distributed ML training on multiple TPU slices with Jax&lt;/h3&gt;
&lt;p&gt;The following example is a JobSet spec for running a TPU Multislice workload on 4 TPU v5e
&lt;a href=&#34;https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#slices&#34;&gt;slices&lt;/a&gt;. To learn more about
TPU concepts and terminology, please refer to these
&lt;a href=&#34;https://cloud.google.com/tpu/docs/system-architecture-tpu-vm&#34;&gt;docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This example uses &lt;a href=&#34;https://jax.readthedocs.io/en/latest/quickstart.html&#34;&gt;Jax&lt;/a&gt;, an ML framework with
native support for Just-In-Time (JIT) compilation targeting TPU chips via
&lt;a href=&#34;https://github.com/openxla&#34;&gt;OpenXLA&lt;/a&gt;. However, you can also use
&lt;a href=&#34;https://pytorch.org/xla/release/2.3/index.html&#34;&gt;PyTorch/XLA&lt;/a&gt; to do ML training on TPUs.&lt;/p&gt;
&lt;p&gt;This example makes use of several JobSet features (both explicitly and implicitly) to support the
unique scheduling requirements of TPU multislice training out-of-the-box with very little
configuration required by the user.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Run a simple Jax workload on &lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;jobset.x-k8s.io/v1alpha2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;JobSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;multislice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Give each child Job exclusive usage of a TPU slice &lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;alpha.jobset.sigs.k8s.io/exclusive-topology&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud.google.com/gke-nodepool&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;failurePolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;maxRestarts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicatedJobs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;workers&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;4&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Set to number of TPU slices&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Set to number of VMs per TPU slice&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Set to number of VMs per TPU slice&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backoffLimit&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostNetwork&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;dnsPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterFirstWithHostNet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeSelector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cloud.google.com/gke-tpu-accelerator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;tpu-v5-lite-podslice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cloud.google.com/gke-tpu-topology&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;2x4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;jax-tpu&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;python:3.8&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8471&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;privileged&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- bash&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- |&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                pip install &amp;#34;jax[tpu]&amp;#34; -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                python -c &amp;#39;import jax; print(&amp;#34;Global device count:&amp;#34;, jax.device_count())&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                sleep 60&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;limits&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;google.com/tpu&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;4&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;future-work-and-getting-involved&#34;&gt;Future work and getting involved&lt;/h2&gt;
&lt;p&gt;We have a number of features on the JobSet roadmap planned for development this year, which can be
found in the &lt;a href=&#34;https://github.com/kubernetes-sigs/jobset?tab=readme-ov-file#roadmap&#34;&gt;JobSet roadmap&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please feel free to reach out with feedback of any kind. We’re also open to additional contributors,
whether that’s fixing or reporting bugs, adding new features, or writing documentation.&lt;/p&gt;
&lt;p&gt;You can get in touch with us via our &lt;a href=&#34;http://sigs.k8s.io/jobset&#34;&gt;repo&lt;/a&gt;, &lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/wg-batch&#34;&gt;mailing
list&lt;/a&gt; or on
&lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Last but not least, thanks to all &lt;a href=&#34;https://github.com/kubernetes-sigs/jobset/graphs/contributors&#34;&gt;our
contributors&lt;/a&gt; who made this project
possible!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Spotlight on SIG Apps</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/12/sig-apps-spotlight-2025/</link>
      <pubDate>Wed, 12 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/12/sig-apps-spotlight-2025/</guid>
      <description>
        
        
        &lt;p&gt;In our ongoing SIG Spotlight series, we dive into the heart of the Kubernetes project by talking to
the leaders of its various Special Interest Groups (SIGs). This time, we focus on
&lt;strong&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#apps-special-interest-group&#34;&gt;SIG Apps&lt;/a&gt;&lt;/strong&gt;,
the group responsible for everything related to developing, deploying, and operating applications on
Kubernetes. &lt;a href=&#34;https://www.linkedin.com/in/sandipanpanda&#34;&gt;Sandipan Panda&lt;/a&gt;
(&lt;a href=&#34;https://www.devzero.io/&#34;&gt;DevZero&lt;/a&gt;) had the opportunity to interview &lt;a href=&#34;https://github.com/soltysh&#34;&gt;Maciej
Szulik&lt;/a&gt; (&lt;a href=&#34;https://defenseunicorns.com/&#34;&gt;Defense Unicorns&lt;/a&gt;) and &lt;a href=&#34;https://github.com/janetkuo&#34;&gt;Janet
Kuo&lt;/a&gt; (&lt;a href=&#34;https://about.google/&#34;&gt;Google&lt;/a&gt;), the chairs and tech leads of
SIG Apps. They shared their experiences, challenges, and visions for the future of application
management within the Kubernetes ecosystem.&lt;/p&gt;
&lt;h2 id=&#34;introductions&#34;&gt;Introductions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: Hello, could you start by telling us a bit about yourself, your role, and your journey
within the Kubernetes community that led to your current roles in SIG Apps?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maciej&lt;/strong&gt;: Hey, my name is Maciej, and I’m one of the leads for SIG Apps. Aside from this role, you
can also find me helping
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-cli#readme&#34;&gt;SIG CLI&lt;/a&gt; and also being one of
the Steering Committee members. I’ve been contributing to Kubernetes since late 2014 in various
areas, including controllers, apiserver, and kubectl.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Janet&lt;/strong&gt;: Certainly! I&#39;m Janet, a Staff Software Engineer at Google, and I&#39;ve been deeply involved
with the Kubernetes project since its early days, even before the 1.0 launch in 2015.  It&#39;s been an
amazing journey!&lt;/p&gt;
&lt;p&gt;My current role within the Kubernetes community is one of the chairs and tech leads of SIG Apps. My
journey with SIG Apps started organically. I started with building the Deployment API and adding
rolling update functionalities. I naturally gravitated towards SIG Apps and became increasingly
involved. Over time, I took on more responsibilities, culminating in my current leadership roles.&lt;/p&gt;
&lt;h2 id=&#34;about-sig-apps&#34;&gt;About SIG Apps&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;All following answers were jointly provided by Maciej and Janet.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: For those unfamiliar, could you provide an overview of SIG Apps&#39; mission and objectives?
What key problems does it aim to solve within the Kubernetes ecosystem?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As described in our
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-apps/charter.md#scope&#34;&gt;charter&lt;/a&gt;, we cover a
broad area related to developing, deploying, and operating applications on Kubernetes. That, in
short, means we’re open to each and everyone showing up at our bi-weekly meetings and discussing the
ups and downs of writing and deploying various applications on Kubernetes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: What are some of the most significant projects or initiatives currently being undertaken
by SIG Apps?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At this point in time, the main factors driving the development of our controllers are the
challenges coming from running various AI-related workloads. It’s worth giving credit here to two
working groups we’ve sponsored over the past years:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;The Batch Working Group&lt;/a&gt;, which is
looking at running HPC, AI/ML, and data analytics jobs on top of Kubernetes.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-serving&#34;&gt;The Serving Working Group&lt;/a&gt;, which
is focusing on hardware-accelerated AI/ML inference.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;best-practices-and-challenges&#34;&gt;Best practices and challenges&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: SIG Apps plays a crucial role in developing application management best practices for
Kubernetes. Can you share some of these best practices and how they help improve application
lifecycle management?&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Implementing &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/&#34;&gt;health checks and readiness probes&lt;/a&gt;
ensures that your applications are healthy and ready to serve traffic, leading to improved
reliability and uptime. The above, combined with comprehensive logging, monitoring, and tracing
solutions, will provide insights into your application&#39;s behavior, enabling you to identify and
resolve issues quickly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/autoscaling/&#34;&gt;Auto-scale your application&lt;/a&gt; based
on resource utilization or custom metrics, optimizing resource usage and ensuring your
application can handle varying loads.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use Deployment for stateless applications, StatefulSet for stateful applications, Job
and CronJob for batch workloads, and DaemonSet for running a daemon on each node. Use
Operators and CRDs to extend the Kubernetes API to automate the deployment, management, and
lifecycle of complex applications, making them easier to operate and reducing manual
intervention.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
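&lt;p&gt;As a minimal illustration of the first practice, a container might declare liveness and readiness probes along these lines (the endpoint paths, port, and timings here are hypothetical placeholders, not from any specific application):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;containers:
- name: my-app
  image: my-app:1.0
  ports:
  - containerPort: 8080
  livenessProbe:      # kubelet restarts the container if this fails
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:     # Pod is removed from Service endpoints while this fails
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Separating the two checks matters: a failing liveness probe triggers a container restart, while a failing readiness probe only stops traffic from being routed to the Pod.&lt;/p&gt;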
&lt;p&gt;&lt;strong&gt;Sandipan: What are some of the common challenges SIG Apps faces, and how do you address them?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The biggest challenge we’re facing all the time is the need to reject a lot of features, ideas, and
improvements. This requires a lot of discipline and patience to be able to explain the reasons
behind those decisions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: How has the evolution of Kubernetes influenced the work of SIG Apps? Are there any
recent changes or upcoming features in Kubernetes that you find particularly relevant or beneficial
for SIG Apps?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The main benefit for both us and the whole community around SIG Apps is the ability to extend
Kubernetes with &lt;a href=&#34;https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/&#34;&gt;Custom Resource Definitions&lt;/a&gt;
and the fact that users can build their own custom controllers leveraging the built-in ones to
achieve whatever sophisticated use cases they might have and we, as the core maintainers, haven’t
considered or weren’t able to efficiently resolve inside Kubernetes.&lt;/p&gt;
&lt;h2 id=&#34;contributing-to-sig-apps&#34;&gt;Contributing to SIG Apps&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: What opportunities are available for new contributors who want to get involved with SIG
Apps, and what advice would you give them?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We get the question, &amp;quot;What good first issue might you recommend we start with?&amp;quot; a lot :-) But
unfortunately, there’s no easy answer to it. We always tell everyone that the best option to start
contributing to core controllers is to find one you are willing to spend some time with. Read
through the code, then try running unit tests and integration tests focusing on that
controller. Once you grasp the general idea, try breaking it and running the tests again to verify your
breakage. Once you start feeling confident you understand that particular controller, you may want
to search through open issues affecting that controller and either provide suggestions, explaining
the problem users have, or maybe attempt your first fix.&lt;/p&gt;
&lt;p&gt;Like we said, there are no shortcuts on that road; you need to spend the time with the codebase to
understand all the edge cases we’ve slowly built up to get to the point where we are. Once you’re
successful with one controller, you’ll need to repeat that same process with others all over again.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: How does SIG Apps gather feedback from the community, and how is this feedback
integrated into your work?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We always encourage everyone to show up and present their problems and solutions during our
bi-weekly &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#meetings&#34;&gt;meetings&lt;/a&gt;. As long
as you’re solving an interesting problem on top of Kubernetes and you can provide valuable feedback
about any of the core controllers, we’re always happy to hear from everyone.&lt;/p&gt;
&lt;h2 id=&#34;looking-ahead&#34;&gt;Looking ahead&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: Looking ahead, what are the key focus areas or upcoming trends in application management
within Kubernetes that SIG Apps is excited about? How is the SIG adapting to these trends?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Definitely the current AI hype is the major driving factor; as mentioned above, we have two working
groups, each covering a different aspect of it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan: What are some of your favorite things about this SIG?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Without a doubt, the people that participate in our meetings and on
&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-apps&#34;&gt;Slack&lt;/a&gt;, who tirelessly help triage issues, pull
requests, and invest a lot of their time (very frequently their private time) into making Kubernetes
great!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;SIG Apps is an essential part of the Kubernetes community, helping to shape how applications are
deployed and managed at scale. From its work on improving Kubernetes&#39; workload APIs to driving
innovation in AI/ML application management, SIG Apps is continually adapting to meet the needs of
modern application developers and operators. Whether you’re a new contributor or an experienced
developer, there’s always an opportunity to get involved and make an impact.&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more or contributing to SIG Apps, be sure to check out their &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG
README&lt;/a&gt; and join their bi-weekly &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#meetings&#34;&gt;meetings&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/sig-apps&#34;&gt;SIG Apps Mailing List&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-apps&#34;&gt;SIG Apps on Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Spotlight on SIG etcd</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/04/sig-etcd-spotlight/</link>
      <pubDate>Tue, 04 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/04/sig-etcd-spotlight/</guid>
      <description>
        
        
        &lt;p&gt;In this SIG etcd spotlight we talked with &lt;a href=&#34;https://github.com/jmhbnz&#34;&gt;James Blair&lt;/a&gt;, &lt;a href=&#34;https://github.com/serathius&#34;&gt;Marek
Siarkowicz&lt;/a&gt;, &lt;a href=&#34;https://github.com/wenjiaswe&#34;&gt;Wenjia Zhang&lt;/a&gt;, and
&lt;a href=&#34;https://github.com/ahrtr&#34;&gt;Benjamin Wang&lt;/a&gt; to learn a bit more about this Kubernetes Special Interest
Group.&lt;/p&gt;
&lt;h2 id=&#34;introducing-sig-etcd&#34;&gt;Introducing SIG etcd&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico: Hello, thank you for the time! Let’s start with some introductions, could you tell us a
bit about yourself, your role and how you got involved in Kubernetes.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Benjamin:&lt;/strong&gt; Hello, I am Benjamin. I am a SIG etcd Tech Lead and one of the etcd maintainers. I
work for VMware, which is part of the Broadcom group. I got involved in Kubernetes &amp;amp; etcd &amp;amp; CSI
(&lt;a href=&#34;https://github.com/container-storage-interface/spec/blob/master/spec.md&#34;&gt;Container Storage Interface&lt;/a&gt;)
because of work and also a big passion for open source. I have been working on Kubernetes &amp;amp; etcd
(and also CSI) since 2020.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;James:&lt;/strong&gt; Hey team, I’m James, a co-chair for SIG etcd and etcd maintainer. I work at Red Hat as a
Specialist Architect helping people adopt cloud native technology. I got involved with the
Kubernetes ecosystem in 2019. Around the end of 2022 I noticed how the etcd community and project
needed help, so I started contributing as often as I could. There is a saying in our community that
&amp;quot;you come for the technology, and stay for the people&amp;quot;: for me this is absolutely real, it’s been a
wonderful journey so far and I’m excited to support our community moving forward.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek:&lt;/strong&gt; Hey everyone, I&#39;m Marek, the SIG etcd lead. At Google, I lead the GKE etcd team, ensuring
a stable and reliable experience for all GKE users. My Kubernetes journey began with &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-instrumentation&#34;&gt;SIG
Instrumentation&lt;/a&gt;, where I
created and led the &lt;a href=&#34;https://kubernetes.io/blog/2020/09/04/kubernetes-1-19-introducing-structured-logs/&#34;&gt;Kubernetes Structured Logging effort&lt;/a&gt;.&lt;br&gt;
I&#39;m still the main project lead for &lt;a href=&#34;https://kubernetes-sigs.github.io/metrics-server/&#34;&gt;Kubernetes Metrics Server&lt;/a&gt;,
providing crucial signals for autoscaling in Kubernetes. I started working on etcd 3 years ago,
right around the 3.5 release. We faced some challenges, but I&#39;m thrilled to see etcd now the most
scalable and reliable it&#39;s ever been, with the highest contribution numbers in the project&#39;s
history. I&#39;m passionate about distributed systems, extreme programming, and testing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wenjia:&lt;/strong&gt; Hi there, my name is Wenjia, I am the co-chair of SIG etcd and one of the etcd
maintainers. I work at Google as an Engineering Manager, working on GKE (Google Kubernetes Engine)
and GDC (Google Distributed Cloud).  I have been working in the area of open source Kubernetes and
etcd since the Kubernetes v1.10 and etcd v3.1 releases. I got involved in Kubernetes because of my
job, but what keeps me in the space is the charm of the container orchestration technology, and more
importantly, the awesome open source community.&lt;/p&gt;
&lt;h2 id=&#34;becoming-a-kubernetes-special-interest-group-sig&#34;&gt;Becoming a Kubernetes Special Interest Group (SIG)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico: Excellent, thank you. I&#39;d like to start with the origin of the SIG itself: SIG etcd is
a very recent SIG, could you quickly go through the history and reasons behind its creation?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: Absolutely! SIG etcd was formed because etcd is a critical component of Kubernetes,
serving as its data store. However, etcd was facing challenges like maintainer turnover and
reliability issues. &lt;a href=&#34;https://etcd.io/blog/2023/introducing-sig-etcd/&#34;&gt;Creating a dedicated SIG&lt;/a&gt;
allowed us to focus on addressing these problems, improving development and maintenance processes,
and ensuring etcd evolves in sync with the cloud-native landscape.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: And has becoming a SIG worked out as expected? Better yet, are the motivations you just
described being addressed, and to what extent?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: It&#39;s been a positive change overall. Becoming a SIG has brought more structure and
transparency to etcd&#39;s development. We&#39;ve adopted Kubernetes processes like KEPs
(&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/README.md&#34;&gt;Kubernetes Enhancement Proposals&lt;/a&gt;)
and PRRs (&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-architecture/production-readiness.md&#34;&gt;Production Readiness Reviews&lt;/a&gt;),
which has improved our feature development and release cycle.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: On top of those, what would you single out as the major benefit that has resulted from
becoming a SIG?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: The biggest benefits for me was adopting Kubernetes testing infrastructure, tools like
&lt;a href=&#34;https://docs.prow.k8s.io/&#34;&gt;Prow&lt;/a&gt; and &lt;a href=&#34;https://testgrid.k8s.io/&#34;&gt;TestGrid&lt;/a&gt;. For large projects like
etcd there is just no comparison to the default GitHub tooling. Having known, easy to use, clear
tools is a major boost to etcd, as it makes it much easier for Kubernetes contributors to also
help etcd.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wenjia&lt;/strong&gt;: Totally agree, while challenges remain, the SIG structure provides a solid foundation
for addressing them and ensuring etcd&#39;s continued success as a critical component of the Kubernetes
ecosystem.&lt;/p&gt;
&lt;p&gt;The positive impact on the community is another crucial aspect of SIG etcd&#39;s success that I’d like
to highlight. The Kubernetes SIG structure has created a welcoming environment for etcd
contributors, leading to increased participation from the broader Kubernetes community.  We have had
greater collaboration with other SIGs like &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-api-machinery/README.md&#34;&gt;SIG API
Machinery&lt;/a&gt;,
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-scalability&#34;&gt;SIG Scalability&lt;/a&gt;,
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-testing&#34;&gt;SIG Testing&lt;/a&gt;,
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-cluster-lifecycle&#34;&gt;SIG Cluster Lifecycle&lt;/a&gt;, etc.&lt;/p&gt;
&lt;p&gt;This collaboration helps ensure etcd&#39;s development aligns with the needs of the wider Kubernetes
ecosystem. The formation of the &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/wg-etcd-operator/README.md&#34;&gt;etcd Operator Working Group&lt;/a&gt;
under the joint effort between SIG etcd and SIG Cluster Lifecycle exemplifies this successful
collaboration, demonstrating a shared commitment to improving etcd&#39;s operational aspects within
Kubernetes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: Since you mentioned collaboration, have you seen changes in terms of contributors and
community involvement in recent months?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;James&lt;/strong&gt;: Yes -- as shown in our
&lt;a href=&#34;https://etcd.devstats.cncf.io/d/23/prs-authors-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All&amp;from=1422748800000&amp;to=1738454399000&#34;&gt;unique PR author data&lt;/a&gt;
we recently hit an all-time high in March and are trending in a positive direction:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/04/sig-etcd-spotlight/stats.png&#34;
         alt=&#34;Unique PR author data stats&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;Additionally, looking at our
&lt;a href=&#34;https://etcd.devstats.cncf.io/d/74/contributions-chart?orgId=1&amp;from=1422748800000&amp;to=1738454399000&amp;var-period=m&amp;var-metric=contributions&amp;var-repogroup_name=All&amp;var-country_name=All&amp;var-company_name=All&amp;var-company=all&#34;&gt;overall contributions across all etcd project repositories&lt;/a&gt;
we are also observing a positive trend showing a resurgence in etcd project activity:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/03/04/sig-etcd-spotlight/stats2.png&#34;
         alt=&#34;Overall contributions stats&#34;/&gt; 
&lt;/figure&gt;
&lt;h2 id=&#34;the-road-ahead&#34;&gt;The road ahead&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico: That&#39;s quite telling, thank you. In terms of the near future, what are the current
priorities for SIG etcd?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: Reliability is always top of mind -– we need to make sure etcd is rock-solid. We&#39;re also
working on making etcd easier to use and manage for operators. And we have our sights set on making
etcd a viable standalone solution for infrastructure management, not just for Kubernetes. Oh, and of
course, scaling -– we need to ensure etcd can handle the growing demands of the cloud-native world.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Benjamin&lt;/strong&gt;: I agree that reliability should always be our top guiding principle. We need to ensure
not only correctness but also compatibility. Additionally, we should continuously strive to improve
the understandability and maintainability of etcd. Our focus should be on addressing the pain points
that the community cares about the most.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: Are there any specific SIGs that you work closely with?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: SIG API Machinery, for sure – they own the structure of the data etcd stores, so we&#39;re
constantly working together. And SIG Cluster Lifecycle – etcd is a key part of Kubernetes clusters,
so we collaborate on the newly created etcd Operator Working Group.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wenjia&lt;/strong&gt;: Other than SIG API Machinery and SIG Cluster Lifecycle that Marek mentioned above, SIG
Scalability and SIG Testing are two other groups that we work closely with.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: In a more general sense, how would you list the key challenges for SIG etcd in the
evolving cloud native landscape?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: Well, reliability is always a challenge when you&#39;re dealing with critical data. The
cloud-native world is evolving so fast that scaling to meet those demands is a constant effort.&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico: We&#39;re almost at the end of our conversation, but for those interested in etcd, how
can they get involved?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: We&#39;d love to have them! The best way to start is to join our
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-etcd/README.md#meetings&#34;&gt;SIG etcd meetings&lt;/a&gt;,
follow discussions on the &lt;a href=&#34;https://groups.google.com/g/etcd-dev&#34;&gt;etcd-dev mailing list&lt;/a&gt;, and check
out our &lt;a href=&#34;https://github.com/etcd-io/etcd/issues&#34;&gt;GitHub issues&lt;/a&gt;. We&#39;re always looking for people to
review proposals, test code, and contribute to documentation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wenjia&lt;/strong&gt;: I love this question 😀. There are numerous ways for people interested in contributing
to SIG etcd to get involved and make a difference. Here are some key areas where you can help:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code Contributions&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Bug Fixes&lt;/em&gt;: Tackle existing issues in the etcd codebase. Start with issues labeled &amp;quot;good first
issue&amp;quot; or &amp;quot;help wanted&amp;quot; to find tasks that are suitable for newcomers.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Feature Development&lt;/em&gt;: Contribute to the development of new features and enhancements. Check the
etcd roadmap and discussions to see what&#39;s being planned and where your skills might fit in.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Testing and Code Reviews&lt;/em&gt;: Help ensure the quality of etcd by writing tests, reviewing code
changes, and providing feedback.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Documentation&lt;/em&gt;: Improve &lt;a href=&#34;https://etcd.io/docs/&#34;&gt;etcd&#39;s documentation&lt;/a&gt; by adding new content,
clarifying existing information, or fixing errors. Clear and comprehensive documentation is
essential for users and contributors.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Community Support&lt;/em&gt;: Answer questions on forums, mailing lists, or &lt;a href=&#34;https://kubernetes.slack.com/archives/C3HD8ARJ5&#34;&gt;Slack channels&lt;/a&gt;.
Helping others understand and use etcd is a valuable contribution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Join the community&lt;/em&gt;: Start by joining the etcd community on Slack,
attending SIG meetings, and following the mailing lists. This will
help you get familiar with the project, its processes, and the
people involved.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Find a mentor&lt;/em&gt;: If you&#39;re new to open source or etcd, consider
finding a mentor who can guide you and provide support. Stay tuned!
The first cohort of our mentorship program was very successful, and a
new round is coming up.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Start small&lt;/em&gt;: Don&#39;t be afraid to start with small contributions. Even
fixing a typo in the documentation or submitting a simple bug fix
can be a great way to get involved.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By contributing to etcd, you&#39;ll not only be helping to improve a
critical piece of the cloud-native ecosystem but also gaining valuable
experience and skills. So, jump in and start contributing!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Frederico: Excellent, thank you. Lastly, one piece of advice that
you&#39;d like to give to other newly formed SIGs?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marek&lt;/strong&gt;: Absolutely! My advice would be to embrace the established
processes of the larger community, prioritize collaboration with other
SIGs, and focus on building a strong community.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wenjia&lt;/strong&gt;: Here are some tips I myself found very helpful in my OSS
journey:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Be patient&lt;/em&gt;: Open source development can take time. Don&#39;t get
discouraged if your contributions aren&#39;t accepted immediately or if
you encounter challenges.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Be respectful&lt;/em&gt;: The etcd community values collaboration and
respect. Be mindful of others&#39; opinions and work together to achieve
common goals.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Have fun&lt;/em&gt;: Contributing to open source should be
enjoyable. Find areas that interest you and contribute in ways that
you find fulfilling.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Frederico: A great way to end this spotlight, thank you all!&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;For more information and resources, please take a look at:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;etcd website: &lt;a href=&#34;https://etcd.io/&#34;&gt;https://etcd.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;etcd GitHub repository: &lt;a href=&#34;https://github.com/etcd-io/etcd&#34;&gt;https://github.com/etcd-io/etcd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;etcd community: &lt;a href=&#34;https://etcd.io/community/&#34;&gt;https://etcd.io/community/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

      </description>
    </item>
    
    <item>
      <title>NFTables mode for kube-proxy</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/</link>
      <pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/</guid>
      <description>
        
        
        &lt;p&gt;A new nftables mode for kube-proxy was introduced as an alpha feature
in Kubernetes 1.29. Currently in beta, it is expected to be GA as of
1.33. The new mode fixes long-standing performance problems with the
iptables mode and all users running on systems with reasonably-recent
kernels are encouraged to try it out. (For compatibility reasons, even
once nftables becomes GA, iptables will still be the &lt;em&gt;default&lt;/em&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;why-nftables-part-1-data-plane-latency&#34;&gt;Why nftables? Part 1: data plane latency&lt;/h2&gt;
&lt;p&gt;The iptables API was designed for implementing simple firewalls, and
has problems scaling up to support Service proxying in a large
Kubernetes cluster with tens of thousands of Services.&lt;/p&gt;
&lt;p&gt;In general, the ruleset generated by kube-proxy in iptables mode has a
number of iptables rules proportional to the sum of the number of
Services and the total number of endpoints. In particular, at the top
level of the ruleset, there is one rule to test each possible Service
IP (and port) that a packet might be addressed to:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# If the packet is addressed to 172.30.0.41:80, then jump to the chain
# KUBE-SVC-XPGD46QRK7WJZT7O for further processing
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace1/service1:p80 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.41 --dport 80 -j KUBE-SVC-XPGD46QRK7WJZT7O

# If the packet is addressed to 172.30.0.42:443, then...
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace2/service2:p443 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.42 --dport 443 -j KUBE-SVC-GNZBNJ2PO5MGZ6GT

# etc...
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace3/service3:p80 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.43 --dport 80 -j KUBE-SVC-X27LE4BHSL4DOUIK
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This means that when a packet comes in, the time it takes the kernel
to check it against all of the Service rules is &lt;strong&gt;O(n)&lt;/strong&gt; in the number
of Services. As the number of Services increases, both the average and
the worst-case latency for the first packet of a new connection
increases (with the difference between best-case, average, and
worst-case being mostly determined by whether a given Service IP
address appears earlier or later in the &lt;code&gt;KUBE-SERVICES&lt;/code&gt; chain).&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/iptables-only.svg&#34;
         alt=&#34;kube-proxy iptables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;By contrast, with nftables, the normal way to write a ruleset like
this is to have a &lt;em&gt;single&lt;/em&gt; rule, using a &amp;quot;verdict map&amp;quot; to do the
dispatch:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;table ip kube-proxy {

        # The service-ips verdict map indicates the action to take for each matching packet.
	map service-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment &amp;#34;ClusterIP, ExternalIP and LoadBalancer IP traffic&amp;#34;
		elements = { 172.30.0.41 . tcp . 80 : goto service-ULMVA6XW-namespace1/service1/tcp/p80,
                             172.30.0.42 . tcp . 443 : goto service-42NFTM6N-namespace2/service2/tcp/p443,
                             172.30.0.43 . tcp . 80 : goto service-4AT6LBPK-namespace3/service3/tcp/p80,
                             ... }
        }

        # Now we just need a single rule to process all packets matching an
        # element in the map. (This rule says, &amp;#34;construct a tuple from the
        # destination IP address, layer 4 protocol, and destination port; look
        # that tuple up in &amp;#34;service-ips&amp;#34;; and if there&amp;#39;s a match, execute the
        # associated verdict.)
	chain services {
		ip daddr . meta l4proto . th dport vmap @service-ips
	}

        ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Since there&#39;s only a single rule, with a roughly &lt;strong&gt;O(1)&lt;/strong&gt; map lookup,
packet processing time is more or less constant regardless of cluster
size, and the best/average/worst cases are very similar:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/nftables-only.svg&#34;
         alt=&#34;kube-proxy nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;But note the huge difference in the vertical scale between the
iptables and nftables graphs! In the clusters with 5000 and 10,000
Services, the p50 (median) latency for nftables is about the same as
the p01 (approximately best-case) latency for iptables. In the 30,000
Service cluster, the p99 (approximately worst-case) latency for
nftables manages to beat out the p01 latency for iptables by a few
microseconds! Here are both sets of data together, but you may have
to squint to see the nftables results:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/iptables-vs-nftables.svg&#34;
         alt=&#34;kube-proxy iptables-vs-nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
&lt;h2 id=&#34;why-nftables-part-2-control-plane-latency&#34;&gt;Why nftables? Part 2: control plane latency&lt;/h2&gt;
&lt;p&gt;While the improvements to data plane latency in large clusters are
great, there&#39;s another problem with iptables kube-proxy that often
keeps users from even being able to grow their clusters to that size:
the time it takes kube-proxy to program new iptables rules when
Services and their endpoints change.&lt;/p&gt;
&lt;p&gt;With both iptables and nftables, the total size of the ruleset as a
whole (actual rules, plus associated data) is &lt;strong&gt;O(n)&lt;/strong&gt; in the combined
number of Services and their endpoints. Originally, the iptables
backend would rewrite every rule on every update, and with tens of
thousands of Services, this could grow to be hundreds of thousands of
iptables rules. Starting in Kubernetes 1.26, we began improving
kube-proxy so that it could skip updating &lt;em&gt;most&lt;/em&gt; of the unchanged
rules in each update, but the limitations of &lt;code&gt;iptables-restore&lt;/code&gt; as an
API meant that it was still always necessary to send an update that&#39;s
&lt;strong&gt;O(n)&lt;/strong&gt; in the number of Services (though with a noticeably smaller
constant than it used to be). Even with those optimizations, it can
still be necessary to make use of kube-proxy&#39;s &lt;code&gt;minSyncPeriod&lt;/code&gt; config
option to ensure that it doesn&#39;t spend every waking second trying to
push iptables updates.&lt;/p&gt;
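&lt;p&gt;As an illustration, a &lt;code&gt;KubeProxyConfiguration&lt;/code&gt; fragment along these lines (a minimal sketch; check the reference documentation for your Kubernetes version) caps how often the iptables backend re-syncs:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
iptables:
  # Do not re-sync more often than once every 10 seconds, even if
  # Services and endpoints are changing continuously.
  minSyncPeriod: 10s
&lt;/code&gt;&lt;/pre&gt;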
&lt;p&gt;The nftables APIs allow for doing much more incremental updates, and
when kube-proxy in nftables mode does an update, the size of the
update is only &lt;strong&gt;O(n)&lt;/strong&gt; in the number of Services and endpoints that
have changed since the last sync, regardless of the total number of
Services and endpoints. The fact that the nftables API allows each
nftables-using component to have its own private table also means that
there is no global lock contention between components like with
iptables. As a result, kube-proxy&#39;s nftables updates can be done much
more efficiently than with iptables.&lt;/p&gt;
&lt;p&gt;(Unfortunately I don&#39;t have cool graphs for this part.)&lt;/p&gt;
&lt;h2 id=&#34;why-not-nftables&#34;&gt;Why &lt;em&gt;not&lt;/em&gt; nftables?&lt;/h2&gt;
&lt;p&gt;All that said, there are a few reasons why you might not want to jump
right into using the nftables backend for now.&lt;/p&gt;
&lt;p&gt;First, the code is still fairly new. While it has plenty of unit
tests, performs correctly in our CI system, and has now been used in
the real world by multiple users, it has not seen anything close to as
much real-world usage as the iptables backend has, so we can&#39;t promise
that it is as stable and bug-free.&lt;/p&gt;
&lt;p&gt;Second, the nftables mode will not work on older Linux distributions;
currently it requires a 5.13 or newer kernel. Additionally, because of
bugs in early versions of the &lt;code&gt;nft&lt;/code&gt; command line tool, you should not
run kube-proxy in nftables mode on nodes that have an old (earlier
than 1.0.0) version of &lt;code&gt;nft&lt;/code&gt; in the host filesystem (or else
kube-proxy&#39;s use of nftables may interfere with other uses of nftables
on the system).&lt;/p&gt;
&lt;p&gt;Third, you may have other networking components in your cluster, such
as the pod network or NetworkPolicy implementation, that do not yet
support kube-proxy in nftables mode. You should consult the
documentation (or forums, bug tracker, etc.) for any such components
to see if they have problems with nftables mode. (In many cases they
will not; as long as they don&#39;t try to directly interact with or
override kube-proxy&#39;s iptables rules, they shouldn&#39;t care whether
kube-proxy is using iptables or nftables.) Additionally, observability
and monitoring tools that have not been updated may report less data
for kube-proxy in nftables mode than they do for kube-proxy in
iptables mode.&lt;/p&gt;
&lt;p&gt;Finally, kube-proxy in nftables mode is intentionally not 100%
compatible with kube-proxy in iptables mode. There are a few old
kube-proxy features whose default behaviors are less secure, less
performant, or less intuitive than we&#39;d like, but where we felt that
changing the default would be a compatibility break. Since the
nftables mode is opt-in, this gave us a chance to fix those bad
defaults without breaking users who weren&#39;t expecting changes. (In
particular, with nftables mode, NodePort Services are now only
reachable on their nodes&#39; default IPs, as opposed to being reachable
on all IPs, including &lt;code&gt;127.0.0.1&lt;/code&gt;, with iptables mode.) The
&lt;a href=&#34;https://kubernetes.io/docs/reference/networking/virtual-ips/#migrating-from-iptables-mode-to-nftables&#34;&gt;kube-proxy documentation&lt;/a&gt; has more information about this, including
information about metrics you can look at to determine if you are
relying on any of the changed functionality, and what configuration
options are available to get more backward-compatible behavior.&lt;/p&gt;
&lt;h2 id=&#34;trying-out-nftables-mode&#34;&gt;Trying out nftables mode&lt;/h2&gt;
&lt;p&gt;Ready to try it out? In Kubernetes 1.31 and later, you just need to
pass &lt;code&gt;--proxy-mode nftables&lt;/code&gt; to kube-proxy (or set &lt;code&gt;mode: nftables&lt;/code&gt; in
your kube-proxy config file).&lt;/p&gt;
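&lt;p&gt;If you use a configuration file, the relevant fragment looks like this (a minimal sketch; merge it into your existing &lt;code&gt;KubeProxyConfiguration&lt;/code&gt;):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Equivalent to passing --proxy-mode nftables on the command line.
mode: nftables
&lt;/code&gt;&lt;/pre&gt;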
&lt;p&gt;If you are using kubeadm to set up your cluster, the kubeadm
documentation explains &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#customizing-kube-proxy&#34;&gt;how to pass a &lt;code&gt;KubeProxyConfiguration&lt;/code&gt; to
&lt;code&gt;kubeadm init&lt;/code&gt;&lt;/a&gt;. You can also &lt;a href=&#34;https://kind.sigs.k8s.io/docs/user/configuration/#kube-proxy-mode&#34;&gt;deploy nftables-based clusters with
&lt;code&gt;kind&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can also convert existing clusters from iptables (or ipvs) mode to
nftables by updating the kube-proxy configuration and restarting the
kube-proxy pods. (You do not need to reboot the nodes: when restarting
in nftables mode, kube-proxy will delete any existing iptables or ipvs
rules, and likewise, if you later revert back to iptables or ipvs
mode, it will delete any existing nftables rules.)&lt;/p&gt;
&lt;h2 id=&#34;future-plans&#34;&gt;Future plans&lt;/h2&gt;
&lt;p&gt;As mentioned above, while nftables is now the &lt;em&gt;best&lt;/em&gt; kube-proxy mode,
it is not the &lt;em&gt;default&lt;/em&gt;, and we do not yet have a plan for changing
that. We will continue to support the iptables mode for a long time.&lt;/p&gt;
&lt;p&gt;The future of the IPVS mode of kube-proxy is less certain: its main
advantage over iptables was that it was faster, but certain aspects of
the IPVS architecture and APIs were awkward for kube-proxy&#39;s purposes
(for example, the fact that the &lt;code&gt;kube-ipvs0&lt;/code&gt; device needs to have
&lt;em&gt;every&lt;/em&gt; Service IP address assigned to it), and some parts of
Kubernetes Service proxying semantics were difficult to implement
using IPVS (particularly the fact that some Services had to have
different endpoints depending on whether you connected to them from a
local or remote client). And now, the nftables mode has the same
performance as IPVS mode (actually, slightly better), without any of
the downsides:&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/28/nftables-kube-proxy/ipvs-vs-nftables.svg&#34;
         alt=&#34;kube-proxy ipvs-vs-nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;(In theory the IPVS mode also has the advantage of being able to use
various other IPVS functionality, like alternative &amp;quot;schedulers&amp;quot; for
balancing endpoints. In practice, this ended up not being very useful,
because kube-proxy runs independently on every node, and the IPVS
schedulers on each node had no way of sharing their state with the
proxies on other nodes, thus thwarting the effort to balance traffic
more cleverly.)&lt;/p&gt;
&lt;p&gt;While the Kubernetes project does not have an immediate plan to drop
the IPVS backend, it is probably doomed in the long run, and people
who are currently using IPVS mode should try out the nftables mode
instead (and file bugs if you think there is missing functionality in
nftables mode that you can&#39;t work around).&lt;/p&gt;
&lt;h2 id=&#34;learn-more&#34;&gt;Learn more&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/3866-nftables-proxy/README.md&#34;&gt;KEP-3866: Add an nftables-based kube-proxy backend&lt;/a&gt;&amp;quot; has the
history of the new feature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://youtu.be/yOGHb2HjslY?si=6O4PVJu7fGpReo1U&#34;&gt;How the Tables Have Turned: Kubernetes Says Goodbye to IPTables&lt;/a&gt;&amp;quot;,
from KubeCon/CloudNativeCon North America 2024, talks about porting
kube-proxy and Calico from iptables to nftables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://youtu.be/uYo2O3jbJLk?si=py2AXzMJZ4PuhxNg&#34;&gt;From Observability to Performance&lt;/a&gt;&amp;quot;, from KubeCon/CloudNativeCon
North America 2024. (This is where the kube-proxy latency data came
from; the &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1-ryDNc6gZocnMHEXC7mNtqknKSOv5uhXFKDx8Hu3AYA/edit&#34;&gt;raw data for the charts&lt;/a&gt; is also available.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>The Cloud Controller Manager Chicken and Egg Problem</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/</link>
      <pubDate>Fri, 14 Feb 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/</guid>
      <description>
        
        
        &lt;p&gt;Kubernetes 1.31
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/05/20/completing-cloud-provider-migration/&#34;&gt;completed the largest migration in Kubernetes history&lt;/a&gt;, removing the in-tree
cloud provider. While the component migration is now done, it leaves some additional
complexity for users and installer projects (for example, kOps or Cluster API). We will go
over those additional steps and failure points and make recommendations for cluster owners.
The migration was complex: some logic had to be extracted from the core components into
four new subsystems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Cloud controller manager&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md&#34;&gt;KEP-2392&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API server network proxy&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy&#34;&gt;KEP-1281&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;kubelet credential provider plugins&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers&#34;&gt;KEP-2133&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage migration to use &lt;a href=&#34;https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-&#34;&gt;CSI&lt;/a&gt;&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md&#34;&gt;KEP-625&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/architecture/cloud-controller/&#34;&gt;cloud controller manager is part of the control plane&lt;/a&gt;. It is a critical component
that replaces some functionality that existed previously in the kube-controller-manager and the
kubelet.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/images/docs/components-of-kubernetes.svg&#34;
         alt=&#34;Components of Kubernetes&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Components of Kubernetes&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;One of the most critical functionalities of the cloud controller manager is the node controller,
which is responsible for the initialization of the nodes.&lt;/p&gt;
&lt;p&gt;As you can see in the following diagram, when the &lt;strong&gt;kubelet&lt;/strong&gt; starts, it registers the Node
object with the API server and taints the node so that it is processed first by the
cloud-controller-manager. The initial Node object is missing the cloud-provider-specific
information, such as the node addresses and the labels with cloud-provider-specific details
like the region and instance type.&lt;/p&gt;


&lt;figure class=&#34;diagram-medium &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/ccm-chicken-egg-problem-sequence-diagram.svg&#34;
         alt=&#34;Chicken and egg problem sequence diagram&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Chicken and egg problem sequence diagram&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This new initialization process adds some latency to node readiness. Previously, the kubelet
was able to initialize the node at the same time it created it. Since the logic has moved
to the cloud-controller-manager, this can cause a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg&#34;&gt;chicken and egg problem&lt;/a&gt;
during cluster bootstrapping for Kubernetes architectures that do not deploy the
cloud-controller-manager in the same way as the other control plane components, commonly as
static pods, standalone binaries, or DaemonSets/Deployments with tolerations for the taints
and using &lt;code&gt;hostNetwork&lt;/code&gt; (more on this below).&lt;/p&gt;
&lt;h2 id=&#34;examples-of-the-dependency-problem&#34;&gt;Examples of the dependency problem&lt;/h2&gt;
&lt;p&gt;As noted above, it is possible during bootstrapping for the cloud-controller-manager to be
unschedulable and as such the cluster will not initialize properly. The following are a few
concrete examples of how this problem can be expressed and the root causes for why they might
occur.&lt;/p&gt;
&lt;p&gt;These examples assume you are running your cloud-controller-manager using a Kubernetes resource
(e.g. Deployment, DaemonSet, or similar) to control its lifecycle. Because these methods
rely on Kubernetes to schedule the cloud-controller-manager, care must be taken to ensure it
will schedule properly.&lt;/p&gt;
&lt;h3 id=&#34;example-cloud-controller-manager-not-scheduling-due-to-uninitialized-taint&#34;&gt;Example: Cloud controller manager not scheduling due to uninitialized taint&lt;/h3&gt;
&lt;p&gt;As &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager&#34;&gt;noted in the Kubernetes documentation&lt;/a&gt;, when the kubelet is started with the command line
flag &lt;code&gt;--cloud-provider=external&lt;/code&gt;, its corresponding &lt;code&gt;Node&lt;/code&gt; object will have a no schedule taint
named &lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; added. Because the cloud-controller-manager
is responsible for removing the no schedule taint, this can create a situation where a
cloud-controller-manager that is being managed by a Kubernetes resource, such as a &lt;code&gt;Deployment&lt;/code&gt;
or &lt;code&gt;DaemonSet&lt;/code&gt;, may not be able to schedule.&lt;/p&gt;
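&lt;p&gt;On an affected cluster, the taint appears in the Node spec roughly like this (an illustrative, abbreviated manifest):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: Node
spec:
  taints:
  # Added by the kubelet when started with --cloud-provider=external;
  # only the cloud-controller-manager removes it.
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: &amp;#34;true&amp;#34;
    effect: NoSchedule
&lt;/code&gt;&lt;/pre&gt;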
&lt;p&gt;If the cloud-controller-manager is not able to be scheduled during the initialization of the
control plane, then the resulting &lt;code&gt;Node&lt;/code&gt; objects will all have the
&lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; no schedule taint. It also means that this taint
will not be removed as the cloud-controller-manager is responsible for its removal. If the no
schedule taint is not removed, then critical workloads, such as the container network interface
controllers, will not be able to schedule, and the cluster will be left in an unhealthy state.&lt;/p&gt;
&lt;h3 id=&#34;example-cloud-controller-manager-not-scheduling-due-to-not-ready-taint&#34;&gt;Example: Cloud controller manager not scheduling due to not-ready taint&lt;/h3&gt;
&lt;p&gt;The next example would be possible in situations where the container network interface (CNI) is
waiting for IP address information from the cloud-controller-manager (CCM), and the CCM has not
tolerated the taint which would be removed by the CNI.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready&#34;&gt;Kubernetes documentation describes&lt;/a&gt; the &lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; taint as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;The Node controller detects whether a Node is ready by monitoring its health and adds or removes this taint accordingly.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of the conditions that can lead to a Node resource having this taint is when the container
network has not yet been initialized on that node. As the cloud-controller-manager is responsible
for adding the IP addresses to a Node resource, and the IP addresses are needed by the container
network controllers to properly configure the container network, it is possible in some
circumstances for a node to become stuck as not ready and uninitialized permanently.&lt;/p&gt;
&lt;p&gt;This situation occurs for a similar reason as the first example, although in this case, the
&lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; taint is used with the no execute effect and thus will cause the
cloud-controller-manager not to run on the node with the taint. If the cloud-controller-manager is
not able to execute, then it will not initialize the node. This cascades into the container
network controllers not being able to run properly, and the node will end up carrying both the
&lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; and &lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; taints,
leaving the cluster in an unhealthy state.&lt;/p&gt;
&lt;h2 id=&#34;our-recommendations&#34;&gt;Our Recommendations&lt;/h2&gt;
&lt;p&gt;There is no one “correct way” to run a cloud-controller-manager. The details will depend on the
specific needs of the cluster administrators and users. When planning your clusters and the
lifecycle of the cloud-controller-managers please consider the following guidance:&lt;/p&gt;
&lt;p&gt;For cloud-controller-managers running in the same cluster they are managing:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use host network mode, rather than the pod network: in most cases, a cloud controller manager
will need to communicate with an API service endpoint associated with the infrastructure.
Setting “hostNetwork” to true will ensure that the cloud controller is using the host
networking instead of the container network and, as such, will have the same network access as
the host operating system. It will also remove the dependency on the networking plugin. This
will ensure that the cloud controller has access to the infrastructure endpoint (always check
your networking configuration against your infrastructure provider’s instructions).&lt;/li&gt;
&lt;li&gt;Use a scalable resource type. &lt;code&gt;Deployments&lt;/code&gt; and &lt;code&gt;DaemonSets&lt;/code&gt; are useful for controlling the
lifecycle of a cloud controller. They allow easy access to running multiple copies for redundancy
as well as using the Kubernetes scheduling to ensure proper placement in the cluster. When using
these primitives to control the lifecycle of your cloud controllers and running multiple
replicas, you must remember to enable leader election, or else your controllers will collide
with each other which could lead to nodes not being initialized in the cluster.&lt;/li&gt;
&lt;li&gt;Target the controller manager containers to the control plane. There might exist other
controllers which need to run outside the control plane (for example, Azure’s node manager
controller). Still, the controller managers themselves should be deployed to the control plane.
Use a node selector or affinity stanza to direct the scheduling of cloud controllers to the
control plane to ensure that they are running in a protected space. Cloud controllers are vital
to adding and removing nodes in a cluster as they form a link between Kubernetes and the
physical infrastructure. Running them on the control plane will help to ensure that they run
with a similar priority as other core cluster controllers and that they have some separation
from non-privileged user workloads.
&lt;ol&gt;
&lt;li&gt;It is worth noting that an anti-affinity stanza to prevent cloud controllers from running
on the same host is also very useful to ensure that a single node failure will not degrade
the cloud controller performance.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Ensure that the tolerations allow operation. Use tolerations on the manifest for the cloud
controller container to ensure that it will schedule to the correct nodes and that it can run
in situations where a node is initializing. This means that cloud controllers should tolerate
the &lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; taint, and they should also tolerate any
taints associated with the control plane (for example, &lt;code&gt;node-role.kubernetes.io/control-plane&lt;/code&gt;
or &lt;code&gt;node-role.kubernetes.io/master&lt;/code&gt;). It can also be useful to tolerate the
&lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; taint to ensure that the cloud controller can run even when the
node is not yet available for health monitoring.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For cloud-controller-managers that will not be running on the cluster they manage (for example,
in a hosted control plane on a separate cluster), the rules are much more constrained by the
dependencies of the environment of the cluster running the cloud-controller-manager. The advice
for running on a self-managed cluster may not be appropriate as the types of conflicts and network
constraints will be different. Please consult the architecture and requirements of your topology
for these scenarios.&lt;/p&gt;
&lt;h3 id=&#34;example&#34;&gt;Example&lt;/h3&gt;
&lt;p&gt;This is an example of a Kubernetes Deployment highlighting the guidance shown above. It is
important to note that this is for demonstration purposes only; for production use, please
consult your cloud provider’s documentation.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cloud-controller-manager
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloud-controller-manager
      annotations:
        kubernetes.io/description: Cloud controller manager for my infrastructure
    spec:
      containers: # the container details will depend on your specific cloud controller manager
      - name: cloud-controller-manager
        command:
        - /bin/my-infrastructure-cloud-controller-manager
        - --leader-elect=true
        - -v=1
        image: registry/my-infrastructure-cloud-controller-manager:latest
        resources:
          requests:
            cpu: 200m
            memory: 50Mi
      hostNetwork: true # these Pods are part of the control plane
      nodeSelector:
        node-role.kubernetes.io/control-plane: &amp;#34;&amp;#34;
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: &amp;#34;kubernetes.io/hostname&amp;#34;
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
        operator: Exists
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When deciding how to deploy your cloud controller manager it is worth noting that
cluster-proportional, or resource-based, pod autoscaling is not recommended. Running multiple
replicas of a cloud controller manager is good practice for ensuring high availability and
redundancy, but does not contribute to better performance. In general, only a single instance
of a cloud controller manager will be reconciling a cluster at any given time.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Spotlight on SIG Architecture: Enhancements</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/01/21/sig-architecture-enhancements/</link>
      <pubDate>Tue, 21 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2025/01/21/sig-architecture-enhancements/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;This is the fourth interview of a SIG Architecture Spotlight series that will cover the different
subprojects, and we will be covering &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements&#34;&gt;SIG Architecture:
Enhancements&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this SIG Architecture spotlight we talked with &lt;a href=&#34;https://github.com/kikisdeliveryservice&#34;&gt;Kirsten
Garrison&lt;/a&gt;, lead of the Enhancements subproject.&lt;/p&gt;
&lt;h2 id=&#34;the-enhancements-subproject&#34;&gt;The Enhancements subproject&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico (FSM): Hi Kirsten, very happy to have the opportunity to talk about the Enhancements
subproject. Let&#39;s start with some quick information about yourself and your role.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kirsten Garrison (KG)&lt;/strong&gt;: I’m a lead of the Enhancements subproject of SIG-Architecture and
currently work at Google. I first got involved by contributing to the service-catalog project with
the help of &lt;a href=&#34;https://github.com/carolynvs&#34;&gt;Carolyn Van Slyck&lt;/a&gt;. With time, &lt;a href=&#34;https://github.com/kubernetes/sig-release/blob/master/releases/release-1.17/release_team.md&#34;&gt;I joined the Release
team&lt;/a&gt;,
eventually becoming the Enhancements Lead and a Release Lead shadow. While on the release team, I
worked on some ideas to make the process better for the SIGs and Enhancements team (the opt-in
process) based on my team’s experiences. Eventually, I started attending Subproject meetings and
contributing to the Subproject’s work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM: You mentioned the Enhancements subproject: how would you describe its main goals and areas of
intervention?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: The &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements&#34;&gt;Enhancements
Subproject&lt;/a&gt;
primarily concerns itself with the &lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/0000-kep-process/README.md&#34;&gt;Kubernetes Enhancement
Proposal&lt;/a&gt;
(&lt;em&gt;KEP&lt;/em&gt; for short)—the &amp;quot;design&amp;quot; documents required for all features and significant changes
to the Kubernetes project.&lt;/p&gt;
&lt;h2 id=&#34;the-kep-and-its-impact&#34;&gt;The KEP and its impact&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM: The improvement of the KEP process was (and is) one in which SIG Architecture was heavily
involved. Could you explain the process to those that aren’t aware of it?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: &lt;a href=&#34;https://kubernetes.io/releases/release/#the-release-cycle&#34;&gt;Every release&lt;/a&gt;, the SIGs let the
Release Team know which features they intend to work on to be put into the release. As mentioned
above, the prerequisite for these changes is a KEP - a standardized design document that all authors
must fill out and approve in the first weeks of the release cycle. Most features &lt;a href=&#34;https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-stages&#34;&gt;will move
through 3
phases&lt;/a&gt;:
alpha, beta, and finally GA, so approving a feature represents a significant commitment for the SIG.&lt;/p&gt;
&lt;p&gt;The KEP serves as the full source of truth of a feature. The &lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md&#34;&gt;KEP
template&lt;/a&gt;
has different requirements based on what stage a feature is in, but it generally requires a detailed
discussion of the design and the impact as well as providing artifacts of stability and
performance. The KEP takes quite a bit of iterative work between authors, SIG reviewers, the API review
team and the Production Readiness Review team&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; before it is approved. Each set of reviewers is
looking to make sure that the proposal meets their standards in order to have a stable and
performant Kubernetes release. Only after all approvals are secured can an author go forth and
merge their feature in the Kubernetes code base.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM: I see, quite a bit of additional structure was added. Looking back, what were the most
significant improvements of that approach?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: In general, I think that the improvements with the most impact had to do with focusing on
the core intent of the KEP. KEPs exist not just to memorialize designs, but to provide a structured way
to discuss and come to an agreement about different facets of the change. At the core of the KEP
process is communication and consideration.&lt;/p&gt;
&lt;p&gt;To that end, some of the significant changes revolve around a more detailed and accessible KEP
template. A significant amount of work was put in over time to get the
&lt;a href=&#34;https://github.com/kubernetes/enhancements&#34;&gt;k/enhancements&lt;/a&gt; repo into its current form -- a
directory structure organized by SIG with the contours of the modern KEP template (with
Proposal/Motivation/Design Details subsections). We might take that basic structure for granted
today, but it really represents the work of many people trying to get the foundation of this process
in place over time.&lt;/p&gt;
&lt;p&gt;As Kubernetes matures, we’ve needed to think about more than just the end goal of getting a single
feature merged. We need to think about things like: stability, performance, setting and meeting user
expectations. And as we’ve thought about those things the template has grown more detailed. The
addition of the Production Readiness Review was major as well as the enhanced testing requirements
(varying at different stages of a KEP’s lifecycle).&lt;/p&gt;
&lt;h2 id=&#34;current-areas-of-focus&#34;&gt;Current areas of focus&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM: Speaking of maturing, we’ve &lt;a href=&#34;https://kubernetes.io/blog/2024/08/13/kubernetes-v1-31-release/&#34;&gt;recently released Kubernetes
v1.31&lt;/a&gt;, and work on v1.32 &lt;a href=&#34;https://github.com/fsmunoz/sig-release/tree/release-1.32/releases/release-1.32&#34;&gt;has
started&lt;/a&gt;. Are there
any areas that the Enhancements sub-project is currently addressing that might change the way things
are done?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: We’re currently working on two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Creating a Process KEP template.&lt;/em&gt; Sometimes people want to harness the KEP process for
significant changes that are more process oriented rather than feature oriented. We want to
support this because memorializing changes is important and giving people a better tool to do so
will only encourage more discussion and transparency.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;KEP versioning.&lt;/em&gt; While our template changes aim to be as non-disruptive as possible, we
believe that it will be easier to track and communicate those changes to the community with
a versioned KEP template and the policies that go alongside such versioning.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both features will take some time to get right and fully roll out (just like a KEP feature) but we
believe that they will both provide improvements that will benefit the community at large.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM: You mentioned improvements: I remember when project boards for Enhancement tracking were
introduced in recent releases, to great effect and unanimous applause from release team members. Was
this a particular area of focus for the subproject?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: The Subproject provided support to the Release Team’s Enhancement team in the migration away
from using the spreadsheet to a project board. The collection and tracking of enhancements has
always been a logistical challenge. During my time on the Release Team, I helped with the transition
to an opt-in system of enhancements, whereby the SIG leads &amp;quot;opt-in&amp;quot; KEPs for release tracking. This
helped to enhance communication between authors and SIGs before any significant work was undertaken
on a KEP and removed toil from the Enhancements team. This change used the existing tools to avoid
introducing too many changes at once to the community. Later, the Release Team approached the
Subproject with an idea of leveraging GitHub Project Boards to further improve the collection
process. This was to be a move away from the use of complicated spreadsheets to using repo-native
labels on &lt;a href=&#34;https://github.com/kubernetes/enhancements&#34;&gt;k/enhancement&lt;/a&gt; issues and project boards.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM: That surely has an impact on simplifying the workflow...&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: Removing sources of friction and promoting clear communication is very important to the
Enhancements Subproject. At the same time, it’s important to give careful consideration to
decisions that impact the community as a whole. We want to make sure that changes are balanced to
give an upside while not causing any regressions or pain in the rollout. We supported the
Release Team in ideation as well as through the actual migration to the project boards. It was a
great success and exciting to see the team make high impact changes that helped everyone involved in
the KEP process!&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM: For those reading who might be curious and interested in helping, how would you describe the
required skills for participating in the sub-project?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: Familiarity with KEPs either via experience or taking time to look through the
kubernetes/enhancements repo is helpful. All are welcome to participate if interested - we can take
it from there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM: Excellent! Many thanks for your time and insight -- any final comments you would like to
share with our readers?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;: The Enhancements process is one of the most important parts of Kubernetes and requires
enormous amounts of coordination and collaboration of people and teams across the project to make it
successful. I’m thankful and inspired by everyone’s continued hard work and dedication to making the
project great. This is truly a wonderful community.&lt;/p&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;For more information, check the &lt;a href=&#34;https://kubernetes.io/blog/2023/11/02/sig-architecture-production-readiness-spotlight-2023/&#34;&gt;Production Readiness Review spotlight
interview&lt;/a&gt;
in this series.&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.32: Moving Volume Group Snapshots to Beta</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/</link>
      <pubDate>Wed, 18 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/</guid>
      <description>
        
        
        &lt;p&gt;Volume group snapshots were &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/&#34;&gt;introduced&lt;/a&gt;
as an Alpha feature with the Kubernetes 1.27 release.
The recent release of Kubernetes v1.32 moved that support to &lt;strong&gt;beta&lt;/strong&gt;.
The support for volume group snapshots relies on a set of
&lt;a href=&#34;https://kubernetes-csi.github.io/docs/group-snapshot-restore-feature.html#volume-group-snapshot-apis&#34;&gt;extension APIs for group snapshots&lt;/a&gt;.
These APIs allow users to take crash consistent snapshots for a set of volumes.
Behind the scenes, Kubernetes uses a label selector to group multiple PersistentVolumeClaims
for snapshotting.
A key aim is to allow you to restore that set of snapshots to new volumes and
recover your workload based on a crash consistent recovery point.&lt;/p&gt;
&lt;p&gt;This new feature is only supported for &lt;a href=&#34;https://kubernetes-csi.github.io/docs/&#34;&gt;CSI&lt;/a&gt; volume drivers.&lt;/p&gt;
&lt;h2 id=&#34;an-overview-of-volume-group-snapshots&#34;&gt;An overview of volume group snapshots&lt;/h2&gt;
&lt;p&gt;Some storage systems provide the ability to create a crash consistent snapshot of
multiple volumes. A group snapshot represents &lt;em&gt;copies&lt;/em&gt; made from multiple volumes that
are taken at the same point in time. A group snapshot can be used either to rehydrate
new volumes (pre-populated with the snapshot data) or to restore existing volumes to
a previous state (represented by the snapshots).&lt;/p&gt;
&lt;h2 id=&#34;why-add-volume-group-snapshots-to-kubernetes&#34;&gt;Why add volume group snapshots to Kubernetes?&lt;/h2&gt;
&lt;p&gt;The Kubernetes volume plugin system already provides a powerful abstraction that
automates the provisioning, attaching, mounting, resizing, and snapshotting of block
and file storage.&lt;/p&gt;
&lt;p&gt;Underpinning all these features is the Kubernetes goal of workload portability:
Kubernetes aims to create an abstraction layer between distributed applications and
underlying clusters so that applications can be agnostic to the specifics of the
cluster they run on and application deployment requires no cluster specific knowledge.&lt;/p&gt;
&lt;p&gt;There was already a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/storage/volume-snapshots/&#34;&gt;VolumeSnapshot&lt;/a&gt; API
that provides the ability to take a snapshot of a persistent volume to protect against
data loss or data corruption. However, there are other snapshotting functionalities
not covered by the VolumeSnapshot API.&lt;/p&gt;
&lt;p&gt;Some storage systems support consistent group snapshots that allow a snapshot to be
taken from multiple volumes at the same point in time to achieve write order consistency.
This can be useful for applications that contain multiple volumes. For example,
an application may have data stored in one volume and logs stored in another volume.
If snapshots for the data volume and the logs volume are taken at different times,
the application will not be consistent and will not function properly if it is restored
from those snapshots when a disaster strikes.&lt;/p&gt;
&lt;p&gt;It is true that you can quiesce the application first, take an individual snapshot from
each volume that is part of the application one after the other, and then unquiesce the
application after all the individual snapshots are taken. This way, you would get
application consistent snapshots.&lt;/p&gt;
&lt;p&gt;However, sometimes the application quiesce can be so time consuming that you want to do it less frequently,
or it may not be possible to quiesce an application at all.
For example, a user may want to run weekly backups with application quiesce
and nightly backups without application quiesce but with consistent group support which
provides crash consistency across all volumes in the group.&lt;/p&gt;
&lt;h2 id=&#34;kubernetes-apis-for-volume-group-snapshots&#34;&gt;Kubernetes APIs for volume group snapshots&lt;/h2&gt;
&lt;p&gt;Kubernetes&#39; support for &lt;em&gt;volume group snapshots&lt;/em&gt; relies on three API kinds that
are used for managing snapshots:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;VolumeGroupSnapshot&lt;/dt&gt;
&lt;dd&gt;Created by a Kubernetes user (or perhaps by your own automation) to request
creation of a volume group snapshot for multiple persistent volume claims.
It contains information about the volume group snapshot operation such as the
timestamp when the volume group snapshot was taken and whether it is ready to use.
The creation and deletion of this object represents a desire to create or delete a
cluster resource (a group snapshot).&lt;/dd&gt;
&lt;dt&gt;VolumeGroupSnapshotContent&lt;/dt&gt;
&lt;dd&gt;Created by the snapshot controller for a dynamically created VolumeGroupSnapshot.
It contains information about the volume group snapshot including the volume group
snapshot ID.
This object represents a provisioned resource on the cluster (a group snapshot).
The VolumeGroupSnapshotContent object binds to the VolumeGroupSnapshot for which it
was created with a one-to-one mapping.&lt;/dd&gt;
&lt;dt&gt;VolumeGroupSnapshotClass&lt;/dt&gt;
&lt;dd&gt;Created by cluster administrators to describe how volume group snapshots should be
created, including the driver information, the deletion policy, etc.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;These three API kinds are defined as
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/extend-kubernetes/api-extension/custom-resources/&#34;&gt;CustomResourceDefinitions&lt;/a&gt;
(CRDs).
These CRDs must be installed in a Kubernetes cluster for a CSI Driver to support
volume group snapshots.&lt;/p&gt;
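&lt;p&gt;For illustration, a VolumeGroupSnapshotClass might look like the following sketch. The
driver name here is a placeholder; check your CSI driver&#39;s documentation for the driver name
and any parameters it supports.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-groupSnapclass
driver: hostpath.csi.k8s.io # placeholder: the name of your CSI driver
deletionPolicy: Delete
&lt;/code&gt;&lt;/pre&gt;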
&lt;h2 id=&#34;what-components-are-needed-to-support-volume-group-snapshots&#34;&gt;What components are needed to support volume group snapshots&lt;/h2&gt;
&lt;p&gt;Volume group snapshots are implemented in the
&lt;a href=&#34;https://github.com/kubernetes-csi/external-snapshotter&#34;&gt;external-snapshotter&lt;/a&gt; repository.
Implementing volume group snapshots meant adding or changing several components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New CustomResourceDefinitions for VolumeGroupSnapshot and two supporting APIs were added.&lt;/li&gt;
&lt;li&gt;Volume group snapshot controller logic was added to the common snapshot controller.&lt;/li&gt;
&lt;li&gt;Logic to make CSI calls was added to the snapshotter sidecar controller.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The volume snapshot controller and CRDs are deployed once per
cluster, while the sidecar is bundled with each CSI driver.&lt;/p&gt;
&lt;p&gt;Therefore, it makes sense to deploy the volume snapshot controller and CRDs as a cluster addon.&lt;/p&gt;
&lt;p&gt;The Kubernetes project recommends that Kubernetes distributors
bundle and deploy the volume snapshot controller and CRDs as part
of their Kubernetes cluster management process (independent of any CSI Driver).&lt;/p&gt;
&lt;h2 id=&#34;what-s-new-in-beta&#34;&gt;What&#39;s new in Beta?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The VolumeGroupSnapshot feature in CSI spec moved to GA in the &lt;a href=&#34;https://github.com/container-storage-interface/spec/releases/tag/v1.11.0&#34;&gt;v1.11.0 release&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The snapshot validation webhook was deprecated in external-snapshotter v8.0.0 and has now been removed.
Most of the validation webhook logic was added as validation rules into the CRDs.
The minimum required Kubernetes version for these validation rules is 1.25.
One check from the validation webhook that was not moved to the CRDs is the prevention of creating
multiple default volume snapshot classes and multiple default volume group snapshot classes
for the same CSI driver.
With the removal of the validation webhook, an error will still be raised when dynamically
provisioning a VolumeSnapshot or VolumeGroupSnapshot when multiple default volume snapshot
classes or multiple default volume group snapshot classes for the same CSI driver exist.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;enable-volumegroup-snapshot&lt;/code&gt; flag in the snapshot-controller and the CSI snapshotter
sidecar has been replaced by a feature gate.
Since VolumeGroupSnapshot is a new API, the feature moves to Beta, but the feature gate is
disabled by default.
To use this feature, enable the feature gate by adding the flag &lt;code&gt;--feature-gates=CSIVolumeGroupSnapshot=true&lt;/code&gt;
when starting the snapshot-controller and the CSI snapshotter sidecar.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The logic to dynamically create the VolumeGroupSnapshot and its corresponding individual
VolumeSnapshot and VolumeSnapshotContent objects is moved from the CSI snapshotter to the common
snapshot-controller.
New RBAC rules are added to the common snapshot-controller and some RBAC rules are removed from
the CSI snapshotter sidecar accordingly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
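&lt;p&gt;As a sketch, the feature gate mentioned above is enabled by adding the flag to the
container arguments of the snapshot-controller (and likewise for the CSI snapshotter sidecar).
The image tag and the other arguments below are placeholders; consult the external-snapshotter
documentation for your deployment.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# fragment of a snapshot-controller Deployment pod spec
containers:
- name: snapshot-controller
  image: registry.k8s.io/sig-storage/snapshot-controller:v8.2.0 # placeholder tag
  args:
  - --leader-election=true
  - --feature-gates=CSIVolumeGroupSnapshot=true
&lt;/code&gt;&lt;/pre&gt;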
&lt;h2 id=&#34;how-do-i-use-kubernetes-volume-group-snapshots&#34;&gt;How do I use Kubernetes volume group snapshots&lt;/h2&gt;
&lt;h3 id=&#34;creating-a-new-group-snapshot-with-kubernetes&#34;&gt;Creating a new group snapshot with Kubernetes&lt;/h3&gt;
&lt;p&gt;Once a VolumeGroupSnapshotClass object is defined and you have volumes you want to
snapshot together, you may request a new group snapshot by creating a VolumeGroupSnapshot
object.&lt;/p&gt;
&lt;p&gt;The source of the group snapshot specifies whether the underlying group snapshot
should be dynamically created or if a pre-existing VolumeGroupSnapshotContent
should be used.&lt;/p&gt;
&lt;p&gt;A pre-existing VolumeGroupSnapshotContent is created by a cluster administrator.
It contains the details of the real volume group snapshot on the storage system which
is available for use by cluster users.&lt;/p&gt;
&lt;p&gt;One of the following members in the source of the group snapshot must be set.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;selector&lt;/code&gt; - a label query over PersistentVolumeClaims that are to be grouped
together for snapshotting. This selector will be used to match the label
added to a PVC.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;volumeGroupSnapshotContentName&lt;/code&gt; - specifies the name of a pre-existing
VolumeGroupSnapshotContent object representing an existing volume group snapshot.&lt;/li&gt;
&lt;/ul&gt;
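&lt;p&gt;For example, a VolumeGroupSnapshot that refers to a pre-existing
VolumeGroupSnapshotContent, rather than requesting dynamic provisioning, could look like the
following (the object names are illustrative):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: restored-group-snapshot
  namespace: demo-namespace
spec:
  source:
    volumeGroupSnapshotContentName: pre-provisioned-group-snapshot-content
&lt;/code&gt;&lt;/pre&gt;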
&lt;h4 id=&#34;dynamically-provision-a-group-snapshot&#34;&gt;Dynamically provision a group snapshot&lt;/h4&gt;
&lt;p&gt;In the following example, there are two PVCs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;pvc-0   Bound    pvc-6e1f7d34-a5c5-4548-b104-01e72c72b9f2   100Mi      RWO            csi-hostpath-sc   &amp;lt;unset&amp;gt;                 2m15s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;pvc-1   Bound    pvc-abc640b3-2cc1-4c56-ad0c-4f0f0e636efa   100Mi      RWO            csi-hostpath-sc   &amp;lt;unset&amp;gt;                 2m7s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Label the PVCs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;%&lt;/span&gt; kubectl label pvc pvc-0 &lt;span style=&#34;color:#b8860b&#34;&gt;group&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;myGroup
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;persistentvolumeclaim/pvc-0 labeled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;%&lt;/span&gt; kubectl label pvc pvc-1 &lt;span style=&#34;color:#b8860b&#34;&gt;group&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;myGroup
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;persistentvolumeclaim/pvc-1 labeled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For dynamic provisioning, a selector must be set so that the snapshot controller can find PVCs
with the matching labels to be snapshotted together.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;groupsnapshot.storage.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeGroupSnapshot&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;snapshot-daily-20241217&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;demo-namespace&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeGroupSnapshotClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi-groupSnapclass&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;source&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;group&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myGroup&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the VolumeGroupSnapshot spec, a user can specify the VolumeGroupSnapshotClass, which
identifies the CSI driver to be used for creating the group snapshot.
A VolumeGroupSnapshotClass is required for dynamic provisioning.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;groupsnapshot.storage.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeGroupSnapshotClass&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi-groupSnapclass&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kubernetes.io/description&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Example group snapshot class&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;driver&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example.csi.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;deletionPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Delete&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As a result of the volume group snapshot creation, a corresponding VolumeGroupSnapshotContent
object will be created with a volumeGroupSnapshotHandle pointing to a resource on the storage
system.&lt;/p&gt;
&lt;p&gt;Two individual volume snapshots will be created as part of the volume group snapshot creation.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME                                                                        READYTOUSE   SOURCEPVC   RESTORESIZE   SNAPSHOTCONTENT                                                                AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;snapshot-0962a745b2bf930bb385b7b50c9b08af471f1a16780726de19429dd9c94eaca0   true         pvc-0       100Mi         snapcontent-0962a745b2bf930bb385b7b50c9b08af471f1a16780726de19429dd9c94eaca0   16m
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;snapshot-da577d76bd2106c410616b346b2e72440f6ec7b12a75156263b989192b78caff   true         pvc-1       100Mi         snapcontent-da577d76bd2106c410616b346b2e72440f6ec7b12a75156263b989192b78caff   16m
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;importing-an-existing-group-snapshot-with-kubernetes&#34;&gt;Importing an existing group snapshot with Kubernetes&lt;/h4&gt;
&lt;p&gt;To import a pre-existing volume group snapshot into Kubernetes, you must also import
the corresponding individual volume snapshots.&lt;/p&gt;
&lt;p&gt;Identify the individual volume snapshot handles, manually construct a
VolumeSnapshotContent object first, then create a VolumeSnapshot object pointing to
the VolumeSnapshotContent object. Repeat this for every individual volume snapshot.&lt;/p&gt;
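&lt;p&gt;As a sketch, a statically provisioned pair for one member snapshot might look like the following (the driver name and snapshot handle below are illustrative; substitute the values from your storage system):&lt;/p&gt;

```yaml
# Illustrative static provisioning of one individual snapshot.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: static-snapshot-content-0
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    # Handle of a snapshot that already exists on the storage system.
    snapshotHandle: e8779147-a93e-11ef-9549-66940726f2fd
  volumeSnapshotRef:
    name: static-snapshot-0
    namespace: demo-namespace
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: static-snapshot-0
  namespace: demo-namespace
spec:
  source:
    volumeSnapshotContentName: static-snapshot-content-0
```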
&lt;p&gt;Then manually create a VolumeGroupSnapshotContent object, specifying the
volumeGroupSnapshotHandle and individual volumeSnapshotHandles already existing
on the storage system.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;groupsnapshot.storage.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeGroupSnapshotContent&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static-group-content&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;deletionPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Delete&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;driver&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;hostpath.csi.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;source&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;groupSnapshotHandles&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeGroupSnapshotHandle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;e8779136-a93e-11ef-9549-66940726f2fd&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeSnapshotHandles&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- e8779147-a93e-11ef-9549-66940726f2fd&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- e8783cd0-a93e-11ef-9549-66940726f2fd&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeGroupSnapshotRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static-group-snapshot&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;demo-namespace&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After that, create a VolumeGroupSnapshot object that points to the VolumeGroupSnapshotContent
object.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;groupsnapshot.storage.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeGroupSnapshot&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static-group-snapshot&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;demo-namespace&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;source&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeGroupSnapshotContentName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static-group-content&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;how-to-use-group-snapshot-for-restore-in-kubernetes&#34;&gt;How to use group snapshot for restore in Kubernetes&lt;/h3&gt;
&lt;p&gt;At restore time, the user can request a new PersistentVolumeClaim to be created from
a VolumeSnapshot object that is part of a VolumeGroupSnapshot. This will trigger
provisioning of a new volume that is pre-populated with data from the specified
snapshot. The user should repeat this until all volumes are created from all the
snapshots that are part of a group snapshot.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;examplepvc-restored-2024-12-17&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;demo-namespace&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storageClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example-foo-nearline&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;dataSource&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;snapshot-0962a745b2bf930bb385b7b50c9b08af471f1a16780726de19429dd9c94eaca0&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeSnapshot&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;snapshot.storage.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;accessModes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- ReadWriteOncePod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requests&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;100Mi&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# must be enough storage to fit the existing snapshot&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;as-a-storage-vendor-how-do-i-add-support-for-group-snapshots-to-my-csi-driver&#34;&gt;As a storage vendor, how do I add support for group snapshots to my CSI driver?&lt;/h2&gt;
&lt;p&gt;To implement the volume group snapshot feature, a CSI driver &lt;strong&gt;must&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement a new group controller service.&lt;/li&gt;
&lt;li&gt;Implement group controller RPCs: &lt;code&gt;CreateVolumeGroupSnapshot&lt;/code&gt;, &lt;code&gt;DeleteVolumeGroupSnapshot&lt;/code&gt;, and &lt;code&gt;GetVolumeGroupSnapshot&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Add group controller capability &lt;code&gt;CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a href=&#34;https://github.com/container-storage-interface/spec/blob/master/spec.md&#34;&gt;CSI spec&lt;/a&gt;
and the &lt;a href=&#34;https://kubernetes-csi.github.io/docs/&#34;&gt;Kubernetes-CSI Driver Developer Guide&lt;/a&gt;
for more details.&lt;/p&gt;
&lt;p&gt;As mentioned earlier, it is strongly recommended that Kubernetes distributors
bundle and deploy the volume snapshot controller and CRDs as part
of their Kubernetes cluster management process (independent of any CSI Driver).&lt;/p&gt;
&lt;p&gt;As part of this recommended deployment process, the Kubernetes team provides a number of
sidecar (helper) containers, including the
&lt;a href=&#34;https://kubernetes-csi.github.io/docs/external-snapshotter.html&#34;&gt;external-snapshotter sidecar container&lt;/a&gt;
which has been updated to support volume group snapshot.&lt;/p&gt;
&lt;p&gt;The external-snapshotter watches the Kubernetes API server for
VolumeGroupSnapshotContent objects, and triggers &lt;code&gt;CreateVolumeGroupSnapshot&lt;/code&gt; and
&lt;code&gt;DeleteVolumeGroupSnapshot&lt;/code&gt; operations against a CSI endpoint.&lt;/p&gt;
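&lt;p&gt;As an illustration, deploying the sidecar with group snapshot support typically means passing a feature gate in its container args. The fragment below is a sketch: the exact feature-gate name and image tag are assumptions, so confirm them against the external-snapshotter release you deploy.&lt;/p&gt;

```yaml
# Sketch of the external-snapshotter sidecar container spec.
# The feature-gate name and image tag are assumptions; verify them
# against your external-snapshotter release notes.
- name: csi-snapshotter
  image: registry.k8s.io/sig-storage/csi-snapshotter:v8.2.0
  args:
    - "--csi-address=$(ADDRESS)"
    - "--feature-gates=CSIVolumeGroupSnapshot=true"
```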
&lt;h2 id=&#34;what-are-the-limitations&#34;&gt;What are the limitations?&lt;/h2&gt;
&lt;p&gt;The beta implementation of volume group snapshots for Kubernetes has the following limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does not support reverting an existing PVC to an earlier state represented by
a snapshot (only supports provisioning a new volume from a snapshot).&lt;/li&gt;
&lt;li&gt;No application consistency guarantees beyond any guarantees provided by the storage system
(e.g. crash consistency). See this &lt;a href=&#34;https://github.com/kubernetes/community/blob/30d06f49fba22273f31b3c616b74cf8745c19b3d/wg-data-protection/data-protection-workflows-white-paper.md#quiesce-and-unquiesce-hooks&#34;&gt;doc&lt;/a&gt;
for more discussions on application consistency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What’s next?&lt;/h2&gt;
&lt;p&gt;Depending on feedback and adoption, the Kubernetes project plans to push the volume
group snapshot implementation to general availability (GA) in a future release.&lt;/p&gt;
&lt;h2 id=&#34;how-can-i-learn-more&#34;&gt;How can I learn more?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot&#34;&gt;design spec&lt;/a&gt;
for the volume group snapshot feature.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://github.com/kubernetes-csi/external-snapshotter&#34;&gt;code repository&lt;/a&gt; for volume group
snapshot APIs and controller.&lt;/li&gt;
&lt;li&gt;CSI &lt;a href=&#34;https://kubernetes-csi.github.io/docs/&#34;&gt;documentation&lt;/a&gt; on the group snapshot feature.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-do-i-get-involved&#34;&gt;How do I get involved?&lt;/h2&gt;
&lt;p&gt;This project, like all of Kubernetes, is the result of hard work by many contributors
from diverse backgrounds working together. On behalf of SIG Storage, I would like to
offer a huge thank you to the contributors who stepped up these last few quarters
to help the project reach beta:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ben Swartzlander (&lt;a href=&#34;https://github.com/bswartz&#34;&gt;bswartz&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Cici Huang (&lt;a href=&#34;https://github.com/cici37&#34;&gt;cici37&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Hemant Kumar (&lt;a href=&#34;https://github.com/gnufied&#34;&gt;gnufied&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;James Defelice (&lt;a href=&#34;https://github.com/jdef&#34;&gt;jdef&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Jan Šafránek (&lt;a href=&#34;https://github.com/jsafrane&#34;&gt;jsafrane&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Madhu Rajanna (&lt;a href=&#34;https://github.com/Madhu-1&#34;&gt;Madhu-1&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Manish M Yathnalli (&lt;a href=&#34;https://github.com/manishym&#34;&gt;manishym&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Michelle Au (&lt;a href=&#34;https://github.com/msau42&#34;&gt;msau42&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Niels de Vos (&lt;a href=&#34;https://github.com/nixpanic&#34;&gt;nixpanic&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Leonardo Cecchi (&lt;a href=&#34;https://github.com/leonardoce&#34;&gt;leonardoce&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Rakshith R (&lt;a href=&#34;https://github.com/Rakshith-R&#34;&gt;Rakshith-R&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Raunak Shah (&lt;a href=&#34;https://github.com/RaunakShah&#34;&gt;RaunakShah&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Saad Ali (&lt;a href=&#34;https://github.com/saad-ali&#34;&gt;saad-ali&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Xing Yang (&lt;a href=&#34;https://github.com/xing-yang&#34;&gt;xing-yang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Yati Padia (&lt;a href=&#34;https://github.com/yati1998&#34;&gt;yati1998&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For those interested in getting involved with the design and development of CSI or
any part of the Kubernetes Storage system, join the
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes Storage Special Interest Group&lt;/a&gt; (SIG).
We always welcome new contributors.&lt;/p&gt;
&lt;p&gt;We also hold regular &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-data-protection&#34;&gt;Data Protection Working Group meetings&lt;/a&gt;.
New attendees are welcome to join our discussions.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Enhancing Kubernetes API Server Efficiency with API Streaming</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/17/kube-apiserver-api-streaming/</link>
      <pubDate>Tue, 17 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/17/kube-apiserver-api-streaming/</guid>
      <description>
        
        
&lt;p&gt;Managing Kubernetes clusters efficiently is critical, especially as they grow in size.
A significant challenge with large clusters is the memory overhead caused by &lt;strong&gt;list&lt;/strong&gt; requests.&lt;/p&gt;
&lt;p&gt;In the existing implementation, the kube-apiserver processes &lt;strong&gt;list&lt;/strong&gt; requests by assembling the entire response in-memory before transmitting any data to the client.
But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple &lt;strong&gt;list&lt;/strong&gt; requests flood in simultaneously, perhaps after a brief network outage.
While &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/cluster-administration/flow-control/&#34;&gt;API Priority and Fairness&lt;/a&gt; has proven to reasonably protect kube-apiserver from CPU overload, its impact is visibly smaller for memory protection.
This can be explained by the differing nature of resource consumption by a single API request - the CPU usage at any given time is capped by a constant, whereas memory, being uncompressible, can grow proportionally with the number of processed objects and is unbounded.
This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let&#39;s consider the below graph.&lt;/p&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/17/kube-apiserver-api-streaming/kube-apiserver-memory_usage.png&#34;
         alt=&#34;Monitoring graph showing kube-apiserver memory usage&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;The graph shows the memory usage of a kube-apiserver during a synthetic test.
(see the &lt;a href=&#34;#the-synthetic-test&#34;&gt;synthetic test&lt;/a&gt; section for more details).
The results clearly show that increasing the number of informers significantly boosts the server&#39;s memory consumption.
Notably, at approximately 16:40, the server crashed when serving only 16 informers.&lt;/p&gt;
&lt;h2 id=&#34;why-does-kube-apiserver-allocate-so-much-memory-for-list-requests&#34;&gt;Why does kube-apiserver allocate so much memory for list requests?&lt;/h2&gt;
&lt;p&gt;Our investigation revealed that this substantial memory allocation occurs because, before sending the first byte to the client, the server must:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fetch data from the database,&lt;/li&gt;
&lt;li&gt;deserialize the data from its stored format,&lt;/li&gt;
&lt;li&gt;and finally construct the final response by converting and serializing the data into the client-requested format&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This sequence results in significant temporary memory consumption.
The actual usage depends on many factors like the page size, applied filters (e.g. label selectors), query parameters, and sizes of individual objects.&lt;/p&gt;
&lt;p&gt;Unfortunately, neither &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/cluster-administration/flow-control/&#34;&gt;API Priority and Fairness&lt;/a&gt; nor Golang&#39;s garbage collection or Golang memory limits can prevent the system from exhausting memory under these conditions.
The memory is allocated suddenly and rapidly, and just a few requests can quickly deplete the available memory, leading to resource exhaustion.&lt;/p&gt;
&lt;p&gt;Depending on how the API server is run on the node, it might either be OOM-killed by the kernel when it exceeds the configured memory limits during these uncontrolled spikes, or, if no limits are configured, it might have an even worse impact on the control plane node.
Even worse, after the first API server failure, the same requests will likely hit another control plane node in an HA setup, probably with the same impact.
This is a situation that is potentially hard to diagnose and hard to recover from.&lt;/p&gt;
&lt;h2 id=&#34;streaming-list-requests&#34;&gt;Streaming list requests&lt;/h2&gt;
&lt;p&gt;Today, we&#39;re excited to announce a major improvement.
With the graduation of the &lt;em&gt;watch list&lt;/em&gt; feature to beta in Kubernetes 1.32, client-go users can opt in (after explicitly enabling the &lt;code&gt;WatchListClient&lt;/code&gt; feature gate)
to streaming lists by switching from &lt;strong&gt;list&lt;/strong&gt; to (a special kind of) &lt;strong&gt;watch&lt;/strong&gt; requests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Watch&lt;/strong&gt; requests are served from the &lt;em&gt;watch cache&lt;/em&gt;, an in-memory cache designed to improve scalability of read operations.
By streaming each item individually instead of returning the entire collection, the new method maintains constant memory overhead.
The API server is bound by the maximum allowed size of an object in etcd plus a few additional allocations.
This approach drastically reduces the temporary memory usage compared to traditional &lt;strong&gt;list&lt;/strong&gt; requests, ensuring a more efficient and stable system,
especially in clusters with a large number of objects of a given type, or with large average object sizes, where memory consumption used to be high despite paging.&lt;/p&gt;
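&lt;p&gt;As an illustrative sketch (not the actual API server code), the memory difference between buffering a whole collection and streaming it item by item looks like this:&lt;/p&gt;

```go
package main

import (
	"bytes"
	"fmt"
)

// serializeAllAtOnce models the traditional list path: the entire
// response is assembled in memory before the first byte is sent.
func serializeAllAtOnce(items [][]byte) []byte {
	var buf bytes.Buffer
	for _, it := range items {
		buf.Write(it)
	}
	return buf.Bytes()
}

// streamItems models the watch-list path: each item leaves the server
// before the next one is encoded, so peak memory stays near one item.
func streamItems(items [][]byte, send func([]byte)) {
	for _, it := range items {
		send(it)
	}
}

func main() {
	items := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	fmt.Println(len(serializeAllAtOnce(items))) // prints: 3

	sent := 0
	streamItems(items, func(b []byte) { sent += len(b) })
	fmt.Println(sent) // prints: 3
}
```

&lt;p&gt;Both paths deliver the same bytes; the difference is only how long the server must hold them all at once.&lt;/p&gt;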
&lt;p&gt;Building on the insight gained from the synthetic test (see the &lt;a href=&#34;#the-synthetic-test&#34;&gt;synthetic test&lt;/a&gt;), we developed an automated performance test to systematically evaluate the impact of the &lt;em&gt;watch list&lt;/em&gt; feature.
This test replicates the same scenario, generating a large number of Secrets with a large payload, and scaling the number of informers to simulate heavy &lt;strong&gt;list&lt;/strong&gt; request patterns.
The automated test is executed periodically to monitor memory usage of the server with the feature enabled and disabled.&lt;/p&gt;
&lt;p&gt;The results showed significant improvements with the &lt;em&gt;watch list&lt;/em&gt; feature enabled.
With the feature turned on, the kube-apiserver’s memory consumption stabilized at approximately &lt;strong&gt;2 GB&lt;/strong&gt;.
By contrast, with the feature disabled, memory usage increased to approximately &lt;strong&gt;20 GB&lt;/strong&gt;, a &lt;strong&gt;10x&lt;/strong&gt; increase!
These results confirm the effectiveness of the new streaming API, which reduces the temporary memory footprint.&lt;/p&gt;
&lt;h2 id=&#34;enabling-api-streaming-for-your-component&#34;&gt;Enabling API Streaming for your component&lt;/h2&gt;
&lt;p&gt;Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
Change your client software to use watch lists. If your client code is written in Golang, you&#39;ll want to enable &lt;code&gt;WatchListClient&lt;/code&gt; for client-go.
For details on enabling that feature, read &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/08/12/feature-gates-in-client-go&#34;&gt;Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;what-s-next&#34;&gt;What&#39;s next?&lt;/h2&gt;
&lt;p&gt;In Kubernetes 1.32, the feature is enabled by default in the kube-controller-manager despite its beta state.
This will eventually be expanded to other core components, such as the kube-scheduler or the kubelet, once the feature becomes generally available, if not earlier.
Third-party components are encouraged to opt in to the feature during the beta phase, especially when they are at risk of accessing a large number of resources, or kinds with potentially large object sizes.&lt;/p&gt;
&lt;p&gt;For the time being, &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/cluster-administration/flow-control/&#34;&gt;API Priority and Fairness&lt;/a&gt; assigns a reasonably small cost to &lt;strong&gt;list&lt;/strong&gt; requests.
This is necessary to allow enough parallelism for the average case, where &lt;strong&gt;list&lt;/strong&gt; requests are cheap enough.
But it does not match the exceptional, spiky situation of many large objects.
Once the majority of the Kubernetes ecosystem has switched to &lt;em&gt;watch list&lt;/em&gt;, the &lt;strong&gt;list&lt;/strong&gt; cost estimation can be raised without risking degraded performance in the average case,
thereby increasing the protection against the kind of requests that can still hit the API server in the future.&lt;/p&gt;
&lt;h2 id=&#34;the-synthetic-test&#34;&gt;The synthetic test&lt;/h2&gt;
&lt;p&gt;In order to reproduce the issue, we conducted a manual test to understand the impact of &lt;strong&gt;list&lt;/strong&gt; requests on kube-apiserver memory usage.
In the test, we created 400 Secrets, each containing 1 MB of data, and used informers to retrieve all Secrets.&lt;/p&gt;
&lt;p&gt;The results were alarming: only 16 informers were needed to cause the test server to run out of memory and crash, demonstrating how quickly memory consumption can grow under such conditions.&lt;/p&gt;
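&lt;p&gt;A back-of-the-envelope calculation shows why: with 400 Secrets of 1 MiB each, every informer performing a full &lt;strong&gt;list&lt;/strong&gt; pulls roughly 400 MiB, so 16 concurrent informers account for over 6 GiB of raw payload alone, before any decoding and conversion overhead:&lt;/p&gt;

```go
package main

import "fmt"

func main() {
	const secretCount = 400
	const secretSizeMiB = 1
	const informerCount = 16
	// Raw payload alone; deserialization and conversion described
	// earlier multiply the real memory cost of each request.
	payloadMiB := secretCount * secretSizeMiB * informerCount
	fmt.Println(payloadMiB, "MiB of raw payload in flight") // prints: 6400 MiB of raw payload in flight
}
```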
&lt;p&gt;Special shout out to &lt;a href=&#34;https://github.com/deads2k&#34;&gt;@deads2k&lt;/a&gt; for his help in shaping this feature.&lt;/p&gt;
&lt;h2 id=&#34;kubernetes-1-33-update&#34;&gt;Kubernetes 1.33 update&lt;/h2&gt;
&lt;p&gt;Since work on this feature began, &lt;a href=&#34;https://github.com/serathius&#34;&gt;Marek Siarkowicz&lt;/a&gt; has integrated a new technology into the
Kubernetes API server: &lt;em&gt;streaming collection encoding&lt;/em&gt;.
Kubernetes v1.33 introduced two related feature gates, &lt;code&gt;StreamingCollectionEncodingToJSON&lt;/code&gt; and &lt;code&gt;StreamingCollectionEncodingToProtobuf&lt;/code&gt;.
These features encode the response as a stream and avoid allocating all the memory at once.
This functionality is bit-for-bit compatible with existing &lt;strong&gt;list&lt;/strong&gt; encodings, produces even greater server-side memory savings, and doesn&#39;t require any changes to client code.
In 1.33, the &lt;code&gt;WatchList&lt;/code&gt; feature gate is disabled by default.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.32 Adds A New CPU Manager Static Policy Option For Strict CPU Reservation</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/16/cpumanager-strict-cpu-reservation/</link>
      <pubDate>Mon, 16 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/16/cpumanager-strict-cpu-reservation/</guid>
      <description>
        
        
        &lt;p&gt;In Kubernetes v1.32, after years of community discussion, we are excited to introduce a
&lt;code&gt;strict-cpu-reservation&lt;/code&gt; option for the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/cpu-management-policies/#static-policy-options&#34;&gt;CPU Manager static policy&lt;/a&gt;.
This feature is currently in alpha, with the associated policy hidden by default. You can only use the
policy if you explicitly enable the alpha behavior in your cluster.&lt;/p&gt;
&lt;h2 id=&#34;understanding-the-feature&#34;&gt;Understanding the feature&lt;/h2&gt;
&lt;p&gt;The CPU Manager static policy is used to reduce latency or improve performance. The &lt;code&gt;reservedSystemCPUs&lt;/code&gt; option defines an explicit CPU set for OS system daemons and Kubernetes system daemons. This option is designed for Telco/NFV use cases where uncontrolled interrupts/timers may impact workload performance. You can use this option to define an explicit cpuset for the system/Kubernetes daemons as well as for interrupts/timers, so that the remaining CPUs on the system can be used exclusively for workloads, with less impact from uncontrolled interrupts/timers. More details on this parameter can be found on the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list&#34;&gt;Explicitly Reserved CPU List&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;If you want to protect your system daemons and interrupt processing, the obvious way is to use the &lt;code&gt;reservedSystemCPUs&lt;/code&gt; option.&lt;/p&gt;
&lt;p&gt;However, until the Kubernetes v1.32 release, this isolation was only implemented for guaranteed
pods that made requests for a whole number of CPUs. At pod admission time, the kubelet only
compares the CPU &lt;em&gt;requests&lt;/em&gt; against the allocatable CPUs. In Kubernetes, limits can be higher than
the requests; the previous implementation allowed burstable and best-effort pods to use up
the capacity of &lt;code&gt;reservedSystemCPUs&lt;/code&gt;, which could then starve host OS services of CPU - and we
know that people saw this in real-life deployments.
The existing behavior also made benchmarking results (for both infrastructure and workloads) inaccurate.&lt;/p&gt;
&lt;p&gt;When this new &lt;code&gt;strict-cpu-reservation&lt;/code&gt; policy option is enabled, the CPU Manager static policy will not allow any workload to use the reserved system CPU cores.&lt;/p&gt;
&lt;h2 id=&#34;enabling-the-feature&#34;&gt;Enabling the feature&lt;/h2&gt;
&lt;p&gt;To enable this feature, you need to turn on both the &lt;code&gt;CPUManagerPolicyAlphaOptions&lt;/code&gt; feature gate and the &lt;code&gt;strict-cpu-reservation&lt;/code&gt; policy option. You also need to remove the &lt;code&gt;/var/lib/kubelet/cpu_manager_state&lt;/code&gt; file if it exists, and restart the kubelet.&lt;/p&gt;
&lt;p&gt;With the following kubelet configuration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;KubeletConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet.config.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;featureGates&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;CPUManagerPolicyOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;CPUManagerPolicyAlphaOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cpuManagerPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cpuManagerPolicyOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;strict-cpu-reservation&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;reservedSystemCPUs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0,32,1,33,16,48&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When &lt;code&gt;strict-cpu-reservation&lt;/code&gt; is not set or set to false:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;#&lt;/span&gt; cat /var/lib/kubelet/cpu_manager_state
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;{&amp;#34;policyName&amp;#34;:&amp;#34;static&amp;#34;,&amp;#34;defaultCpuSet&amp;#34;:&amp;#34;0-63&amp;#34;,&amp;#34;checksum&amp;#34;:1058907510}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When &lt;code&gt;strict-cpu-reservation&lt;/code&gt; is set to true:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;#&lt;/span&gt; cat /var/lib/kubelet/cpu_manager_state
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;{&amp;#34;policyName&amp;#34;:&amp;#34;static&amp;#34;,&amp;#34;defaultCpuSet&amp;#34;:&amp;#34;2-15,17-31,34-47,49-63&amp;#34;,&amp;#34;checksum&amp;#34;:4141502832}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;monitoring-the-feature&#34;&gt;Monitoring the feature&lt;/h2&gt;
&lt;p&gt;You can monitor the feature impact by checking the following CPU Manager counters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cpu_manager_shared_pool_size_millicores&lt;/code&gt;: reports the shared pool size, in millicores (e.g. 13500m)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cpu_manager_exclusive_cpu_allocation_count&lt;/code&gt;: reports the number of exclusively allocated cores, counting full cores (e.g. 16)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your best-effort workloads may starve if the &lt;code&gt;cpu_manager_shared_pool_size_millicores&lt;/code&gt; count is zero for a prolonged time.&lt;/p&gt;
&lt;p&gt;We believe any pod that is required for operational purposes, like a log forwarder, should not run as best-effort; you can review and adjust the number of reserved CPU cores as needed.&lt;/p&gt;
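&lt;p&gt;As a cross-check of the &lt;code&gt;cpu_manager_state&lt;/code&gt; output shown earlier, the shared pool is simply the set of all online CPUs minus &lt;code&gt;reservedSystemCPUs&lt;/code&gt;. The following sketch (assuming the 64-CPU node from the example above) reproduces the &lt;code&gt;defaultCpuSet&lt;/code&gt; notation:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sharedPool returns every CPU id not present in reserved, mirroring
// what strict-cpu-reservation leaves for workloads.
func sharedPool(totalCPUs int, reserved []int) []int {
	res := make(map[int]bool, len(reserved))
	for _, c := range reserved {
		res[c] = true
	}
	var pool []int
	for c := 0; c != totalCPUs; c++ {
		if !res[c] {
			pool = append(pool, c)
		}
	}
	return pool
}

// ranges renders a sorted CPU list in the kubelet's "2-15,17-31" notation.
func ranges(cpus []int) string {
	sort.Ints(cpus)
	var parts []string
	i := 0
	for i != len(cpus) {
		j := i
		for j+1 != len(cpus) {
			if cpus[j+1] != cpus[j]+1 {
				break
			}
			j++
		}
		if i == j {
			parts = append(parts, strconv.Itoa(cpus[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", cpus[i], cpus[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

func main() {
	reserved := []int{0, 32, 1, 33, 16, 48} // from the example kubelet config
	fmt.Println(ranges(sharedPool(64, reserved)))
	// prints: 2-15,17-31,34-47,49-63
}
```

&lt;p&gt;The output matches the &lt;code&gt;defaultCpuSet&lt;/code&gt; recorded in &lt;code&gt;cpu_manager_state&lt;/code&gt; when &lt;code&gt;strict-cpu-reservation&lt;/code&gt; is set to true.&lt;/p&gt;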
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Strict CPU reservation is critical for Telco/NFV use cases. It is also a prerequisite for enabling the all-in-one type of deployments where workloads are placed on nodes serving combined control+worker+storage roles.&lt;/p&gt;
&lt;p&gt;We encourage you to start using the feature and look forward to your feedback.&lt;/p&gt;
&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;
&lt;p&gt;Please check out the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/cpu-management-policies/&#34;&gt;Control CPU Management Policies on the Node&lt;/a&gt;
task page to learn more about the CPU Manager, and how it fits in relation to the other node-level resource managers.&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;This feature is driven by the &lt;a href=&#34;https://github.com/Kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt;. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.32: Memory Manager Goes GA</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/13/memory-manager-goes-ga/</link>
      <pubDate>Fri, 13 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/13/memory-manager-goes-ga/</guid>
      <description>
        
        
        &lt;p&gt;With Kubernetes 1.32, the memory manager has officially graduated to General Availability (GA),
marking a significant milestone in the journey toward efficient and predictable memory allocation for containerized applications.
Since Kubernetes v1.22, where it graduated to beta, the memory manager has proved itself reliable, stable, and a good complement to the
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/tasks/administer-cluster/cpu-management-policies/&#34;&gt;CPU Manager&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As part of kubelet&#39;s workload admission process,
the memory manager provides topology hints
to optimize memory allocation and alignment.
This enables users to allocate exclusive
memory for Pods in the &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/pods/pod-qos/#guaranteed&#34;&gt;Guaranteed&lt;/a&gt; QoS class.
More details about the process can be found in the memory manager goes to beta &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2021/08/11/kubernetes-1-22-feature-memory-manager-moves-to-beta/&#34;&gt;blog&lt;/a&gt;.&lt;/p&gt;
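&lt;p&gt;As a rough sketch (the values are illustrative, not recommendations), opting a node in to this behavior means selecting the &lt;code&gt;Static&lt;/code&gt; memory manager policy and declaring reserved memory in the kubelet configuration:&lt;/p&gt;

```yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
memoryManagerPolicy: Static
# reservedMemory must add up to the node's kube-reserved, system-reserved
# and eviction-threshold memory; 1Gi here is purely illustrative.
reservedMemory:
  - numaNode: 0
    limits:
      memory: 1Gi
```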
&lt;p&gt;Most of the changes introduced since the Beta are bug fixes, internal refactoring and
observability improvements, such as metrics and better logging.&lt;/p&gt;
&lt;h2 id=&#34;observability-improvements&#34;&gt;Observability improvements&lt;/h2&gt;
&lt;p&gt;As part of the effort
to increase the observability of memory manager, new metrics have been added
to provide some statistics on memory allocation patterns.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;memory_manager_pinning_requests_total&lt;/strong&gt; -
tracks the number of times the pod spec required the memory manager to pin memory pages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;memory_manager_pinning_errors_total&lt;/strong&gt; -
tracks the number of times the pod spec required the memory manager
to pin memory pages, but the allocation failed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;improving-memory-manager-reliability-and-consistency&#34;&gt;Improving memory manager reliability and consistency&lt;/h2&gt;
&lt;p&gt;The kubelet does not guarantee pod ordering
when admitting pods after a restart or reboot.&lt;/p&gt;
&lt;p&gt;In certain edge cases, this behavior could cause
the memory manager to reject some pods,
and in more extreme cases, it may cause kubelet to fail upon restart.&lt;/p&gt;
&lt;p&gt;Previously, the beta implementation lacked certain checks and logic to prevent
these issues.&lt;/p&gt;
&lt;p&gt;To stabilize the memory manager for general availability (GA) readiness,
small but critical refinements have been
made to the algorithm, improving its robustness and handling of edge cases.&lt;/p&gt;
&lt;h2 id=&#34;future-development&#34;&gt;Future development&lt;/h2&gt;
&lt;p&gt;There is more to come for the Topology Manager in general,
and the memory manager in particular.
Notably, ongoing efforts are underway
to extend &lt;a href=&#34;https://github.com/kubernetes/kubernetes/pull/128560&#34;&gt;memory manager support to Windows&lt;/a&gt;,
enabling CPU and memory affinity on a Windows operating system.&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;This feature is driven by the &lt;a href=&#34;https://github.com/Kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt; community.
Please join us to connect with the community
and share your ideas and feedback around the above feature and
beyond.
We look forward to hearing from you!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.32: QueueingHint Brings a New Possibility to Optimize Pod Scheduling</title>
      <link>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/12/scheduler-queueinghint/</link>
      <pubDate>Thu, 12 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/12/scheduler-queueinghint/</guid>
      <description>
        
        
        &lt;p&gt;The Kubernetes &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/kube-scheduler/&#34;&gt;scheduler&lt;/a&gt; is the core
component that selects the nodes on which new Pods run. The scheduler processes
these new Pods &lt;strong&gt;one by one&lt;/strong&gt;. Therefore, the larger your clusters, the more important
the throughput of the scheduler becomes.&lt;/p&gt;
&lt;p&gt;Over the years, Kubernetes SIG Scheduling has improved the throughput
of the scheduler in multiple enhancements. This blog post describes a major improvement to the
scheduler in Kubernetes v1.32: a
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points&#34;&gt;scheduling context element&lt;/a&gt;
named &lt;em&gt;QueueingHint&lt;/em&gt;. This page provides background knowledge of the scheduler and explains how
QueueingHint improves scheduling throughput.&lt;/p&gt;
&lt;h2 id=&#34;scheduling-queue&#34;&gt;Scheduling queue&lt;/h2&gt;
&lt;p&gt;The scheduler stores all unscheduled Pods in an internal component called the &lt;em&gt;scheduling queue&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The scheduling queue consists of the following data structures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ActiveQ&lt;/strong&gt;: holds newly created Pods or Pods that are ready to be retried for scheduling.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BackoffQ&lt;/strong&gt;: holds Pods that are ready to be retried but are waiting for a backoff period to end. The
backoff period depends on the number of unsuccessful scheduling attempts performed by the scheduler on that Pod.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unschedulable Pod Pool&lt;/strong&gt;: holds Pods that the scheduler won&#39;t attempt to schedule for one of the
following reasons:
&lt;ul&gt;
&lt;li&gt;The scheduler previously attempted and was unable to schedule the Pods. Since that attempt, the cluster
hasn&#39;t changed in a way that could make those Pods schedulable.&lt;/li&gt;
&lt;li&gt;The Pods are blocked from entering the scheduling cycles by PreEnqueue Plugins,
for example, they have a &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/pod-scheduling-readiness/#configuring-pod-schedulinggates&#34;&gt;scheduling gate&lt;/a&gt;,
and get blocked by the scheduling gate plugin.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
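&lt;p&gt;The three components above can be pictured with a minimal stand-in (the real scheduler types are far more involved):&lt;/p&gt;

```go
package main

import "fmt"

// SchedulingQueue is a minimal stand-in for the three components
// described above, holding pod names only.
type SchedulingQueue struct {
	ActiveQ           []string // ready to be scheduled now
	BackoffQ          []string // ready to retry, waiting out a backoff
	UnschedulablePods []string // parked until a relevant cluster event
}

func main() {
	q := SchedulingQueue{ActiveQ: []string{"pod-a"}}
	// A pod that fails a scheduling cycle is parked in the pool rather
	// than being retried hotly on every cycle.
	q.UnschedulablePods = append(q.UnschedulablePods, q.ActiveQ[0])
	q.ActiveQ = q.ActiveQ[:0]
	fmt.Println(len(q.ActiveQ), len(q.UnschedulablePods)) // prints: 0 1
}
```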
&lt;h2 id=&#34;scheduling-framework-and-plugins&#34;&gt;Scheduling framework and plugins&lt;/h2&gt;
&lt;p&gt;The Kubernetes scheduler is implemented following the Kubernetes
&lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/scheduling-framework/&#34;&gt;scheduling framework&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All scheduling features are implemented as plugins
(e.g., &lt;a href=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity&#34;&gt;Pod affinity&lt;/a&gt;
is implemented in the &lt;code&gt;InterPodAffinity&lt;/code&gt; plugin).&lt;/p&gt;
&lt;p&gt;The scheduler processes pending Pods in phases called &lt;em&gt;cycles&lt;/em&gt; as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scheduling cycle&lt;/strong&gt;: the scheduler takes pending Pods from the activeQ component of the scheduling
queue  &lt;em&gt;one by one&lt;/em&gt;. For each Pod, the scheduler runs the filtering/scoring logic from every scheduling plugin. The
scheduler then decides on the best node for the Pod, or decides that the Pod can&#39;t be scheduled at that time.&lt;/p&gt;
&lt;p&gt;If the scheduler decides that a Pod can&#39;t be scheduled, that Pod enters the Unschedulable Pod Pool
component of the scheduling queue. However, if the scheduler decides to place the Pod on a node,
the Pod goes to the binding cycle.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Binding cycle&lt;/strong&gt;: the scheduler communicates the node placement decision to the Kubernetes API
server. This operation binds the Pod to the selected node.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Aside from some exceptions, most unscheduled Pods enter the unschedulable pod pool after each scheduling
cycle. The Unschedulable Pod Pool component is crucial because of how the scheduling cycle processes Pods one by one. If the scheduler had to constantly retry placing unschedulable Pods, instead of offloading those
Pods to the Unschedulable Pod Pool, multiple scheduling cycles would be wasted on those Pods.&lt;/p&gt;
&lt;h2 id=&#34;improvements-to-retrying-pod-scheduling-with-queuinghint&#34;&gt;Improvements to retrying Pod scheduling with QueuingHint&lt;/h2&gt;
&lt;p&gt;Unschedulable Pods only move back into the ActiveQ or BackoffQ components of the scheduling
queue if changes in the cluster might allow the scheduler to place those Pods on nodes.&lt;/p&gt;
&lt;p&gt;Prior to v1.32, each plugin registered which cluster changes could resolve its failures (an object creation, update, or deletion in the cluster, called &lt;em&gt;cluster events&lt;/em&gt;)
with &lt;code&gt;EnqueueExtensions&lt;/code&gt; (&lt;code&gt;EventsToRegister&lt;/code&gt;),
and the scheduling queue retried a Pod when an event occurred that was registered by a plugin that had rejected the Pod in a previous scheduling cycle.&lt;/p&gt;
&lt;p&gt;Additionally, there was an internal feature called &lt;code&gt;preCheck&lt;/code&gt;, which further filtered events for efficiency, based on Kubernetes core scheduling constraints;
for example, &lt;code&gt;preCheck&lt;/code&gt; could filter out node-related events when the node status was &lt;code&gt;NotReady&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;However, those approaches had two issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Requeueing based on cluster events alone was too broad and could lead to scheduling retries for no reason.
&lt;ul&gt;
&lt;li&gt;A newly scheduled Pod &lt;em&gt;might&lt;/em&gt; resolve the &lt;code&gt;InterPodAffinity&lt;/code&gt; plugin&#39;s failure, but not every one does.
For example, if a new Pod is created without a label matching the &lt;code&gt;InterPodAffinity&lt;/code&gt; term of the unschedulable Pod, that Pod still wouldn&#39;t be schedulable.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;preCheck&lt;/code&gt; relied on the logic of in-tree plugins and was not extensible to custom plugins,
like in issue &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/110175&#34;&gt;#110175&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where QueueingHints come into play:
a QueueingHint subscribes to a particular kind of cluster event and decides whether each incoming event could make the Pod schedulable.&lt;/p&gt;
&lt;p&gt;For example, consider a Pod named &lt;code&gt;pod-a&lt;/code&gt; that has a required Pod affinity. &lt;code&gt;pod-a&lt;/code&gt; was rejected in
the scheduling cycle by the &lt;code&gt;InterPodAffinity&lt;/code&gt; plugin because no node had an existing Pod that matched
the Pod affinity specification for &lt;code&gt;pod-a&lt;/code&gt;.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/12/scheduler-queueinghint/queueinghint1.svg&#34;
         alt=&#34;A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;code&gt;pod-a&lt;/code&gt; moves into the Unschedulable Pod Pool. The scheduling queue records which plugin caused
the scheduling failure for the Pod. For &lt;code&gt;pod-a&lt;/code&gt;, the scheduling queue records that the &lt;code&gt;InterPodAffinity&lt;/code&gt;
plugin rejected the Pod.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pod-a&lt;/code&gt; will never be schedulable until the InterPodAffinity failure is resolved.
There are some scenarios in which the failure could be resolved; one example is an existing running Pod getting a label update that now matches the Pod affinity.
For this scenario, the &lt;code&gt;InterPodAffinity&lt;/code&gt; plugin&#39;s &lt;code&gt;QueuingHint&lt;/code&gt; callback function checks every Pod label update that occurs in the cluster.
Then, if a Pod gets a label update that matches the Pod affinity requirement of &lt;code&gt;pod-a&lt;/code&gt;, the &lt;code&gt;InterPodAffinity&lt;/code&gt;
plugin&#39;s &lt;code&gt;QueuingHint&lt;/code&gt; prompts the scheduling queue to move &lt;code&gt;pod-a&lt;/code&gt; back into the ActiveQ or
the BackoffQ component.&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-55276--kubernetes-io-main-staging.netlify.app/blog/2024/12/12/scheduler-queueinghint/queueinghint2.svg&#34;
         alt=&#34;A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
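&lt;p&gt;The hint logic described above can be sketched as a self-contained Go example. The types here are local stand-ins, not the real scheduler framework API; the function models how an &lt;code&gt;InterPodAffinity&lt;/code&gt;-style hint examines a Pod label update and decides whether a retry is worthwhile:&lt;/p&gt;

```go
package main

import "fmt"

type QueueingHint int

const (
	QueueSkip QueueingHint = iota // event cannot make the pod schedulable
	Queue                         // event may make it schedulable; retry
)

type Pod struct {
	Name   string
	Labels map[string]string
	// RequiredAffinityLabel models pod-a's required Pod affinity term.
	RequiredAffinityLabel string
}

// isPodLabelUpdateRelevant mimics the hint pattern: only a label update
// that newly satisfies the pending pod's affinity warrants a retry.
func isPodLabelUpdateRelevant(pending Pod, oldPod, newPod Pod) QueueingHint {
	key := pending.RequiredAffinityLabel
	_, hadBefore := oldPod.Labels[key]
	_, hasNow := newPod.Labels[key]
	if hasNow {
		if !hadBefore {
			return Queue // the update may unblock pending: requeue it
		}
	}
	return QueueSkip // irrelevant update: leave the pod parked
}

func main() {
	pending := Pod{Name: "pod-a", RequiredAffinityLabel: "app"}
	old := Pod{Name: "pod-b", Labels: map[string]string{}}
	updated := Pod{Name: "pod-b", Labels: map[string]string{"app": "web"}}
	fmt.Println(isPodLabelUpdateRelevant(pending, old, updated) == Queue) // prints: true
}
```

&lt;p&gt;An update that does not touch the relevant label returns &lt;code&gt;QueueSkip&lt;/code&gt;, avoiding the wasted scheduling retries described above.&lt;/p&gt;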
&lt;h2 id=&#34;queueinghint-s-history-and-what-s-new-in-v1-32&#34;&gt;QueueingHint&#39;s history and what&#39;s new in v1.32&lt;/h2&gt;
&lt;p&gt;At SIG Scheduling, we have been working on the development of QueueingHint since
Kubernetes v1.28.&lt;/p&gt;
&lt;p&gt;While QueuingHint isn&#39;t user-facing, we implemented the &lt;code&gt;SchedulerQueueingHints&lt;/code&gt; feature gate as a
safety measure when we originally added this feature. In v1.28, we implemented QueueingHints with a
few in-tree plugins experimentally, and enabled the feature gate by default.&lt;/p&gt;
&lt;p&gt;However, users reported a memory leak, and consequently we disabled the feature gate in a
patch release of v1.28.  From v1.28 until v1.31, we kept working on the QueueingHint implementation
within the rest of the in-tree plugins and fixing bugs.&lt;/p&gt;
&lt;p&gt;In v1.32, we enabled this feature by default again. We finished implementing QueueingHints
in all plugins and also identified the cause of the memory leak!&lt;/p&gt;
&lt;p&gt;We thank all the contributors who participated in the development of this feature and those who reported and investigated the earlier issues.&lt;/p&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;These features are managed by Kubernetes &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-scheduling&#34;&gt;SIG Scheduling&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please join us and share your feedback.&lt;/p&gt;
&lt;h2 id=&#34;how-can-i-learn-more&#34;&gt;How can I learn more?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md&#34;&gt;KEP-4247: Per-plugin callback functions for efficient requeueing in the scheduling queue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
  </channel>
</rss>
