diff --git a/content/day3/03_etcd-reliability.md b/content/day3/03_etcd-reliability.md
new file mode 100644
index 0000000..baa5d12
--- /dev/null
+++ b/content/day3/03_etcd-reliability.md
@@ -0,0 +1,43 @@
+---
+title: "Don't let your Kubernetes cluster go wild: Ensuring etcd reliability"
+weight: 3
+tags:
+  - kubecon
+  - etcd
+---
+
+
+
+
+Fair warning: This talk was very technical and pretty interesting, but don't even try to understand it if you're tired (or if it's the third-to-last session on the last day of a long conference).
+
+## Baseline
+
+- Standard example: Write and read KV data, `put(A, 2) -> get(A)`
+- Problem: Concurrency
+
+TODO: Steal image from intuition of correctness
+
+## Correctness
+
+- Correctness: Kinda funky when it comes to time
+- Fix: Define a serialization that executes parallel requests one after another to bring them into an order
+
+## Failures
+
+- What happens if connections between etcd nodes go down -> Serving stale data
+- What happens if data gets corrupted -> If enough members are online, the cluster can repair itself
+- And many more that can happen at random times -> Hard to test
+
+TODO: Steal "in a concurrent world"
+
+## Robustness framework
+
+- Automates tests for failures
+- Includes reliable reproductions of past (seemingly random) errors
+- Currently a mixture of existing Go debugging tools
+
+## Future
+
+- Reproduce more bugs consistently
+- Run additional consistency checks
\ No newline at end of file
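
To make the serialization idea from the Correctness section a bit more concrete, here is a minimal sketch of a brute-force check for a single key. It assumes every request is recorded with a start and end time, and it simply tries all sequential orders that respect real time. This is only an illustration, not how the etcd robustness tests are implemented; the names `Op`, `Linearizable`, `search` and `canGoNext` are made up for this example.

```go
package main

import "fmt"

// Op is one recorded request against a single key (hypothetical type).
type Op struct {
	IsPut      bool // true for put, false for get
	Value      int  // value written (put) or value observed (get)
	Start, End int  // time the request was sent / the response arrived
}

// Linearizable reports whether the ops can be arranged in one sequential
// order such that
//  1. real time is respected: if a finished before b started, a comes first
//  2. every get observes the latest preceding put (or initial if none).
func Linearizable(ops []Op, initial int) bool {
	used := make([]bool, len(ops))
	return search(ops, used, initial, 0)
}

// search tries every still-unplaced op as the next one in the order and
// backtracks when a get would observe the wrong value.
func search(ops []Op, used []bool, current, placed int) bool {
	if placed == len(ops) {
		return true
	}
	for i, op := range ops {
		if used[i] || !canGoNext(ops, used, i) {
			continue
		}
		if !op.IsPut && op.Value != current {
			continue // this get cannot sit at this position
		}
		next := current
		if op.IsPut {
			next = op.Value
		}
		used[i] = true
		if search(ops, used, next, placed+1) {
			return true
		}
		used[i] = false
	}
	return false
}

// canGoNext is true if no other unplaced op finished before op i started;
// otherwise that op would have to be ordered before op i.
func canGoNext(ops []Op, used []bool, i int) bool {
	for j, other := range ops {
		if j != i && !used[j] && other.End < ops[i].Start {
			return false
		}
	}
	return true
}

func main() {
	// put(A, 2) overlaps with get(A) == 2: the order "put, get" explains it.
	concurrent := []Op{
		{IsPut: true, Value: 2, Start: 0, End: 10},
		{IsPut: false, Value: 2, Start: 5, End: 15},
	}
	// get(A) == 0 starts after put(A, 2) finished: a stale read.
	stale := []Op{
		{IsPut: true, Value: 2, Start: 0, End: 10},
		{IsPut: false, Value: 0, Start: 20, End: 30},
	}
	fmt.Println(Linearizable(concurrent, 0)) // true
	fmt.Println(Linearizable(stale, 0))      // false
}
```

The search is exponential in the number of requests, so it only works for tiny histories. The second history is the kind of stale read mentioned under Failures: no sequential order can explain it.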