docs(day3): etcd talk
Nicolai Ort 2025-04-04 15:08:02 +02:00
parent 44a3653c84
commit 957bc94344

---
title: "Don't let your kubernetes cluster go wild: Ensuring etcd reliability"
weight: 3
tags:
- kubecon
- etcd
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Fair warning: This talk was very technical and pretty interesting, but don't even try to understand it if you're tired (or if it's the third-to-last session on the last day of a long conference).
## Baseline
- Standard example: Write and read key-value data, e.g. `put(A, 2) -> get(A)`
- Problem: Concurrency (what should `get(A)` return while other clients are writing `A` at the same time?)
TODO: Steal image from intuition of correctness
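
A minimal sketch of the concurrency problem, assuming a plain in-memory dict stands in for etcd. `run` and the op tuples are hypothetical and are not etcd's API. With one client, `put(A, 2)` followed by `get(A)` returns 2. With two concurrent writers, the later read depends on which write happens to land last:

```python
from itertools import permutations

def run(ops):
    """Apply ops to an empty key-value store in the given order.

    Each op is ("put", key, value) or ("get", key); returns the results of the gets.
    """
    store = {}
    reads = []
    for op in ops:
        if op[0] == "put":
            _, key, value = op
            store[key] = value
        else:
            _, key = op
            reads.append(store.get(key))
    return reads

# Two clients write A concurrently; a read happens after both writes.
client_writes = [("put", "A", 1), ("put", "A", 2)]
outcomes = set()
for order in permutations(client_writes):
    outcomes.add(tuple(run(list(order) + [("get", "A")])))

print(sorted(outcomes))  # [(1,), (2,)]
```

Both results are valid answers. The system has to commit to one order, and it must not make up a third answer.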
## Correctness
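- Correctness: Kinda funky when it comes to time, because concurrent requests overlap and have no obvious order
- Fix: Define a serialization that treats parallel requests as if they had been executed one after another, bringing them into a single order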
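The "bring them into an order" idea is usually called linearizability. That term is not used in these notes. Below is a brute-force sketch, not the checker etcd uses. A history passes if some order of its operations both respects real time (an op that finished before another started must come first) and replays correctly against a sequential key-value store. `Op` and its time fields are hypothetical, and real checkers are far more efficient than trying every permutation:

```python
from itertools import permutations
from typing import NamedTuple, Optional

class Op(NamedTuple):
    start: float          # time the client sent the request
    end: float            # time the client received the response
    kind: str             # "put" or "get"
    key: str
    value: Optional[int]  # value written by a put, or value returned by a get

def respects_real_time(order):
    # If op a finished before op b started, a must not be placed after b.
    for i, a in enumerate(order):
        for b in order[:i]:
            if a.end < b.start:
                return False
    return True

def replays_correctly(order):
    # Every get must return what a sequential store would return at that point.
    store = {}
    for op in order:
        if op.kind == "put":
            store[op.key] = op.value
        elif store.get(op.key) != op.value:
            return False
    return True

def is_linearizable(history):
    # Brute force: try every order (only feasible for a handful of ops).
    return any(respects_real_time(p) and replays_correctly(p)
               for p in permutations(history))

# put(A,2) overlaps a get(A) that returned nothing: fine, order the get first.
ok = [Op(0, 3, "put", "A", 2), Op(1, 2, "get", "A", None)]
# get(A) starts after put(A,2) finished but still returns nothing: stale read.
stale = [Op(0, 1, "put", "A", 2), Op(2, 3, "get", "A", None)]
print(is_linearizable(ok), is_linearizable(stale))  # True False
```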
## Failures
- What happens if connections between etcd nodes go down? -> Risk of serving stale data
- What happens if data gets corrupted? -> If enough members are online, the cluster can repair itself
- Many more failures can happen at random times -> Hard to test
TODO: Steal "in a concurrent world"
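
A toy model of why "enough members" matters, assuming simple majority agreement over replicated values. This is not etcd's actual Raft or corruption-check code, and the member names and values are hypothetical. A member cut off from the others can only offer its own possibly stale copy. With a majority reachable, diverging (stale or corrupted) copies can be overwritten:

```python
from collections import Counter

def quorum(cluster_size):
    """Smallest majority of a cluster, e.g. 2 of 3 or 3 of 5."""
    return cluster_size // 2 + 1

def majority_value(replicas, reachable):
    """Return the value a majority of *all* members agree on, or None.

    `replicas` maps member name -> stored value; `reachable` lists the
    members this node can currently talk to.
    """
    needed = quorum(len(replicas))
    if len(reachable) < needed:
        return None  # stuck in a minority: refuse rather than guess
    value, count = Counter(replicas[m] for m in reachable).most_common(1)[0]
    return value if count >= needed else None

def repair(replicas, reachable):
    """Overwrite diverging (stale or corrupted) copies with the majority value."""
    good = majority_value(replicas, reachable)
    if good is not None:
        for m in reachable:
            replicas[m] = good
    return good

# Hypothetical 5-member cluster: m4 missed the last write, m5 is corrupted.
replicas = {"m1": "v2", "m2": "v2", "m3": "v2", "m4": "v1", "m5": "#!?"}
print(replicas["m4"])                    # v1: a local read on m4 is stale
print(majority_value(replicas, ["m4"]))  # None: m4 alone has no quorum
print(repair(replicas, list(replicas)))  # v2
print(replicas["m4"], replicas["m5"])    # v2 v2
```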
## Robustness framework
- Automates testing under failure conditions
- Includes reliable reproductions of past (seemingly random) errors
- Currently a mixture of existing Go debugging tools
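One common way to make "seemingly random" failures reproducible is to drive failure injection from a seeded random number generator. Rerunning the same seed then replays the exact same schedule. The sketch below only illustrates that idea and is not etcd's framework, which the talk describes as built from existing Go tools. The failure names, `failure_schedule`, and the fake bug are all hypothetical:

```python
import random

# Hypothetical failure types a test harness might inject.
FAILURES = ["kill_member", "drop_network", "pause_disk", "restart_member"]

def failure_schedule(seed, steps=10, rate=0.3):
    """Decide, per step, which failure (if any) to inject, using a seeded RNG."""
    rng = random.Random(seed)  # same seed -> same "random" schedule
    schedule = []
    for step in range(steps):
        if rng.random() < rate:
            schedule.append((step, rng.choice(FAILURES)))
    return schedule

def buggy_cluster_survives(schedule):
    """Stand-in for a real cluster run: pretend a bug triggers whenever the
    injected failure right after a kill_member is a drop_network."""
    kinds = [kind for _, kind in schedule]
    return not any(a == "kill_member" and b == "drop_network"
                   for a, b in zip(kinds, kinds[1:]))

# Search seeds until the (hypothetical) bug shows up...
failing_seed = next(s for s in range(10_000)
                    if not buggy_cluster_survives(failure_schedule(s)))
# ...then replaying that seed reproduces the exact same failure schedule.
print(failing_seed, failure_schedule(failing_seed))
```

Storing just the seed alongside a bug report is enough to rerun the failing case, which is what turns a flaky "random" failure into a regression test.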
## Future
- Reproduce more bugs consistently
- Run additional consistency checks