---
title: "Performance perseverance: Taming 1000 Kubernetes clusters"
weight: 12
tags:
 - platform
 - cloudnativecon
---

{{% button href="https://youtu.be/ZTT8M74RD1M" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/d5/kubecon_2025_v4.2.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}

## History

- They started with upstream Kubernetes, the hard way
- The environment grew to over 200 production apps
- Pains: a single cluster (single point of failure) and complexity
- What worked: Dev adoption and autonomy, no vendor

## Challenges

> Based on stakeholder expectations

- One tenant per cluster -> Over 1000 Clusters
- Release management
- Small team (3 Engineers)

## Guiding principles

- Platform as a product
- Stability: trust
- Standardization -> Scalability and inter-team collaboration
- Day 2 support
- Dogfooding

## Tenancy

- One cluster per product
- Own CLI, since devs like CLIs
- Custom operator and CRDs

## Stack

- Keopsctl? Pretty much their own cluster operator
- A simple Cluster CRD
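
These notes do not capture the schema of the Cluster CRD, so the following is only a rough sketch of what a one-cluster-per-product custom resource might look like. The API group `platform.example.com`, the version `v1alpha1`, and every spec field (`product`, `kubernetesVersion`, `nodeCount`, `featureFlags`) are assumptions made for illustration, not the speakers' actual design.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Hypothetical spec fields; the real CRD schema was not shown in these notes."""
    product: str                  # one cluster per product (tenant)
    kubernetes_version: str       # assumed field
    node_count: int = 3           # assumed field and default
    feature_flags: dict = field(default_factory=dict)  # assumed field


def to_manifest(name: str, spec: ClusterSpec) -> dict:
    """Render the spec as a Kubernetes-style custom resource manifest (plain dict)."""
    return {
        "apiVersion": "platform.example.com/v1alpha1",  # hypothetical group/version
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "product": spec.product,
            "kubernetesVersion": spec.kubernetes_version,
            "nodeCount": spec.node_count,
            "featureFlags": dict(spec.feature_flags),
        },
    }


# Example: a cluster for a hypothetical "checkout" product.
manifest = to_manifest(
    "checkout-prod",
    ClusterSpec(product="checkout", kubernetes_version="1.31"),
)
```

Keeping the resource this small would leave most decisions to the operator, which fits the standardization principle above: tenants state what they need, and the platform team decides how it is built.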

## Migration

1. Build trust in platform
2. Support with docs, onboarding, Q&A
3. Co-create with devs while keeping an eye on day 2 -> Feature-flag-based rollout
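
The notes do not say how the feature flags are evaluated. One common approach, shown here purely as an assumption, is a percentage rollout with stable hashing: each cluster lands in a fixed bucket per flag, so raising the percentage only ever adds clusters and never flips one back. The function names, the flag name `new-ingress`, and the `opt_in` set are hypothetical.

```python
import hashlib


def bucket(cluster_name: str, flag: str) -> int:
    """Map a (flag, cluster) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{cluster_name}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    cluster_name: str,
    flag: str,
    rollout_percent: int,
    opt_in: frozenset = frozenset(),
) -> bool:
    """Return True if the flag is active for this cluster.

    Clusters in opt_in (e.g. teams co-creating the feature) always get it;
    all others get it once their bucket falls below the rollout percentage.
    """
    if cluster_name in opt_in:
        return True
    return bucket(cluster_name, flag) < rollout_percent


# Example: widen a rollout from 10% to 50% across 1000 hypothetical clusters.
clusters = [f"product-{i}" for i in range(1000)]
enabled_10 = {c for c in clusters if is_enabled(c, "new-ingress", 10)}
enabled_50 = {c for c in clusters if is_enabled(c, "new-ingress", 50)}
```

Because the bucket depends only on the flag and cluster name, every cluster enabled at 10% is still enabled at 50%, which keeps a staged rollout predictable for a small team.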