---
title: "Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb"
weight: 6
tags:
  - rejekts
  - multicluster
---

{{% button href="https://www.youtube.com/watch?v=w8rDxtrMGG8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Baseline Infra

- Multiple Clusters across cloud providers
- Cilium with Clustermesh
- Stretched CockroachDB and NATS

TODO: Steal overview from slides

## PDBs and limits

- PDB: Classic core component that requires a minimum number of pods with successful readiness probes per deployment
- Eviction: Can be stopped by a PDB that has not reached its minimum available
- Interruptions: Voluntary (new image, updated specs, ...) vs involuntary (eviction, deletion, node pressure, NoExecute, node deletion)

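The budget check behind a PDB boils down to simple arithmetic. Below is a minimal sketch, assuming a `minAvailable`-style budget; the field names loosely mirror the PDB status fields `currentHealthy`, `desiredHealthy` and `disruptionsAllowed`, but the class and functions are illustrative and not the controller's actual code.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    """Simplified, hypothetical view of a PDB status."""
    current_healthy: int  # pods currently passing their readiness probe
    desired_healthy: int  # minimum number of healthy pods the budget demands


def disruptions_allowed(status: BudgetStatus) -> int:
    """Number of pods that may be evicted right now without breaking the budget."""
    return max(0, status.current_healthy - status.desired_healthy)


def eviction_permitted(status: BudgetStatus) -> bool:
    """An eviction is only let through while at least one disruption is allowed."""
    return disruptions_allowed(status) > 0
```

With three healthy pods and a minimum of two, exactly one eviction is allowed; once only two pods are healthy, further evictions are rejected until a replacement becomes ready.
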
## Stateful across multiple clusters

- Baseline: PDBs only know about one cluster
- Problem: The master pod can fail (or get evicted) on 2/3 clusters at once, because each PDB only guards its own cluster
- Factors: Movement, maintenance, chaos experiments, secret rotation
- Workaround: Just manually check all systems before doing anything
- Idea: Multi-Cluster PDB
- Solution: A new hook on the eviction API that interacts with a new cluster-aware CRD

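To make the blind spot concrete, here is a small sketch that compares a per-cluster decision with a decision based on the sum over all clusters. The cluster names, replica counts and thresholds are made up for illustration and do not come from the talk.

```python
from typing import Dict


def local_pdb_allows(healthy_in_cluster: int, local_min_available: int) -> bool:
    """Per-cluster view: only the pods of this one cluster are counted."""
    return healthy_in_cluster - 1 >= local_min_available


def global_pdb_allows(healthy_per_cluster: Dict[str, int], global_min_available: int) -> bool:
    """Multi-cluster view: healthy pods are summed across all clusters."""
    return sum(healthy_per_cluster.values()) - 1 >= global_min_available


def drain_everywhere(healthy_per_cluster: Dict[str, int], local_min_available: int) -> Dict[str, int]:
    """Simulate one eviction per cluster, each approved only by the local PDB."""
    return {
        name: healthy - 1 if local_pdb_allows(healthy, local_min_available) else healthy
        for name, healthy in healthy_per_cluster.items()
    }


# Hypothetical stretched workload: 2 replicas in each of 3 clusters.
before = {"cluster-a": 2, "cluster-b": 2, "cluster-c": 2}
after = drain_everywhere(before, local_min_available=1)
```

Every local budget is satisfied, yet `after` only holds three healthy replicas in total. A global budget of four would have let the first two evictions through and rejected the third.
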
## How it actually works

1. Drain API gets called
2. Check replicas across clusters
3. Answer based on current state

Actually: There is a lease mechanism to prevent race conditions across clusters

TODO: Steal diagram from slides

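The three steps plus the lease can be sketched roughly as follows. This is a toy in-memory model under stated assumptions: `LeaseStore`, `handle_eviction`, the TTL and the "deny while someone else holds the lease" behaviour are all hypothetical and not taken from x-pdb itself.

```python
import time
from typing import Dict, Optional


class LeaseStore:
    """In-memory stand-in for a lease shared between clusters (hypothetical)."""

    def __init__(self) -> None:
        self._holder: Optional[str] = None
        self._expires_at: float = 0.0

    def try_acquire(self, holder: str, ttl_seconds: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        held_by_other = self._holder is not None and self._holder != holder
        if held_by_other and now < self._expires_at:
            return False  # someone else is deciding right now
        self._holder = holder
        self._expires_at = now + ttl_seconds
        return True

    def release(self, holder: str) -> None:
        if self._holder == holder:
            self._holder = None


def handle_eviction(cluster: str, lease: LeaseStore,
                    healthy_per_cluster: Dict[str, int], min_available: int) -> bool:
    """Decide on one eviction in `cluster` while holding the lease."""
    if not lease.try_acquire(cluster, ttl_seconds=5.0):
        return False  # deny; the caller is expected to retry later
    try:
        total = sum(healthy_per_cluster.values())  # step 2: replicas across clusters
        allowed = total - 1 >= min_available       # step 3: answer from current state
        if allowed:
            healthy_per_cluster[cluster] -= 1      # record the eviction before releasing
        return allowed
    finally:
        lease.release(cluster)
```

Because the count is updated while the lease is still held, two clusters cannot both approve "the last allowed" eviction from the same stale state.
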
## What works

- Voluntary: 100% supported
- Involuntary: Yes, they hooked into most of the deletion API calls (eviction, pressure, kubectl delete, admissions, node deletion)

## Demo

Pretty interesting, watch the video to find out

## Q&A

- Do you need a flat network: No, just expose the TCP LB
- Did you think about using etcd to implement the leases instead of objects: They use managed control planes and don't want another etcd
- Have you tried to commit upstream: Nope, pretty much not an option, since the managed control plane does not allow setting the appropriate flags