---
title: "Performance perseverance: Taming 1000 Kubernetes clusters"
weight: 12
tags:
  - platform
  - cloudnativecon
---

{{% button href="https://youtu.be/ZTT8M74RD1M" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/d5/kubecon_2025_v4.2.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}

## History

- They started with upstream Kubernetes, the hard way
- The environment grew to over 200 prod apps
- Pains: a single cluster is a single point of failure, plus complexity
- What worked: dev adoption and autonomy, no vendor

## Challenges

> Based on stakeholder expectations

- One tenant per cluster -> over 1000 clusters
- Release management
- Small team (3 engineers)

## Guiding principles

- Platform as a product
- Stability: trust
- Standardization -> scalability and inter-team collaboration
- Day 2 support
- Dogfooding

## Tenancy

- One cluster per product
- Own CLI (devs like CLIs)
- Custom operator and CRDs

## Stack

- Keopsctl? Pretty much their own cluster operator
- A simple Cluster CRD

## Migration

1. Build trust in the platform
2. Support with docs, onboarding, Q&A
3. Co-create with devs while keeping an eye on day 2 -> feature-flag-based rollout
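
The notes do not say how the feature-flag based rollout was implemented. As a rough illustration only, one common way to roll a feature out gradually across many clusters is to hash each cluster name into a stable bucket and enable the feature for buckets below a percentage threshold. Everything in this sketch (the function names, the cluster names, the `new-ingress` flag, the percentage approach itself) is a hypothetical assumption and not taken from the talk.

```python
import hashlib


def rollout_bucket(cluster_name: str, feature: str) -> int:
    """Map a (feature, cluster) pair to a stable bucket in the range 0..99.

    Hashing the feature name together with the cluster name means each
    feature gets its own ordering of clusters, so the same clusters are
    not always the first to receive every change.
    """
    digest = hashlib.sha256(f"{feature}:{cluster_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(cluster_name: str, feature: str, percent: int) -> bool:
    """Return True if the feature is on for this cluster at the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(cluster_name, feature) < percent


def enabled_clusters(clusters: list[str], feature: str, percent: int) -> list[str]:
    """Filter the clusters that would receive the feature at this rollout stage."""
    return [name for name in clusters if is_enabled(name, feature, percent)]


if __name__ == "__main__":
    # Hypothetical fleet: one cluster per product, as in the tenancy model above.
    fleet = [f"product-{i:04d}" for i in range(1000)]
    for stage in (1, 10, 50, 100):
        count = len(enabled_clusters(fleet, "new-ingress", stage))
        print(f"{stage:>3}% stage: {count} clusters enabled")
```

Because the bucket of a cluster never changes, raising the percentage only adds clusters and never removes ones that already have the feature, which keeps each rollout stage a superset of the previous one.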