From 6931da118cbe1d8d2572fb39e5d95951c0384246 Mon Sep 17 00:00:00 2001
From: Nicolai Ort
Date: Sun, 30 Mar 2025 15:35:08 +0200
Subject: [PATCH] docs(day-2): New talk
---
 content/day-2/07_pushing-limits.md | 80 ++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 content/day-2/07_pushing-limits.md

diff --git a/content/day-2/07_pushing-limits.md b/content/day-2/07_pushing-limits.md
new file mode 100644
index 0000000..3a90203
--- /dev/null
+++ b/content/day-2/07_pushing-limits.md

---
title: "From us to ms: Pushing Kubernetes Workloads to the Limit"
weight: 7
tags:
  - rejekts
  - performance
---

There were more details in the talk than I copied into these notes.
Most of them were either too much to write down or application-specific.

## Why?

- We need it (product requirements)
- Cost efficiency

## Cross-Provider Networking

- Throughput:
  - Same-zone: 200 GB/s
  - Cross-zone: 5-10% penalty
- Latency:
  - Same-zone P99: 0.95 ms
  - Cross-zone P99: 1.95 ms
- Result: encourage services to always route within the same zone if possible
- How:
  - Topology-Aware Routing (older, a bit buggy)
  - `trafficDistribution: PreferClose`: routes to the same zone if possible (needs CNI support)
  - Set up the stack once in each zone
- Measurements: Kubezonnet can detect cross-zone traffic

## Disk latency

- Baseline: 660 MiB/s per SSD, i.e. roughly one SSD per 5 Gbit/s of networking
- Example: 100 Gbit/s needs a RAID0 across a bunch of SSDs

```mermaid
graph LR
  Querier-->|125ms|Cache
  Cache-->|200ms|S3
  Cache<-->SSD
```

## Memory management

- Garbage collection takes time and is a throughput-for-latency trade-off
- Idea: avoid allocations
  - Preallocate (e.g. arenas)
  - Allocation reuse (e.g.
in gRPC)
  - "Allocation schemes" (e.g. thread-per-core)
- Avoid memory pressure by:
  - Using GC-friendly types
  - Tuning your GC
- Idea: implement your own optimized data structures

## Optimization in Kubernetes

### Defaults

- Best effort
- No protection from consuming all node memory
- Critical services could get scheduled on the same node

### Requests and limits

- Requests: needed to be scheduled
- Limits: the pod is killed if it exceeds them
- Problem: enforcement is reactive; the kubelet only checks pods on a periodic interval (configurable via an API flag, but with a minimum)
- Downward API: you can reference the limits in your application (e.g. to let the app trigger GC before the pod gets killed)

### Taints and tolerations

- Pin your workload based on labels and annotations

### Static CPU manager

- Request a whole number of CPUs -> you get those cores guaranteed
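The `trafficDistribution: PreferClose` setting above lives on the Service spec. A minimal sketch (the `query-cache` name, selector, and port are placeholders invented for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: query-cache   # hypothetical service name
spec:
  selector:
    app: query-cache
  ports:
    - port: 6379
  # Prefer endpoints in the client's own zone when healthy ones exist,
  # falling back to other zones otherwise.
  trafficDistribution: PreferClose
```

Unlike the older Topology-Aware Routing annotation, this is a first-class field, but the dataplane (kube-proxy or a replacement) still has to support it.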
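The Downward API point can be sketched as a pod that feeds its own memory limit to the application, here via Go's `GOMEMLIMIT` so the runtime collects more aggressively before the kernel OOM-kills the container (pod name and image are placeholders; `resourceFieldRef` with the default divisor yields the limit in bytes, which `GOMEMLIMIT` accepts):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gc-aware-app   # hypothetical pod name
spec:
  containers:
    - name: app
      image: example.com/app:latest   # placeholder image
      resources:
        limits:
          memory: "512Mi"
      env:
        # Expose the container's own memory limit to the app,
        # so the Go runtime stays below it instead of getting killed.
        - name: GOMEMLIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
```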
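For the static CPU manager, the kubelet must run with `--cpu-manager-policy=static`, and the pod must be in the Guaranteed QoS class (requests equal to limits) with an integer CPU count. A sketch of the container resources (values are examples):

```yaml
resources:
  requests:
    cpu: "4"        # whole number -> eligible for exclusive cores
    memory: "8Gi"
  limits:
    cpu: "4"        # must equal the request for Guaranteed QoS
    memory: "8Gi"
```

A fractional request like `cpu: "3.5"` would keep the pod on the shared pool even under the static policy.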
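The "allocation reuse" idea from the memory-management section is what Go's standard-library `sync.Pool` provides: objects are returned to a pool instead of becoming garbage. A minimal sketch (the `formatRecord` helper is invented for illustration):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values, so a hot path
// does not allocate a fresh buffer (and GC work) per call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// formatRecord renders a key/value pair using a pooled buffer.
func formatRecord(key string, value int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // pooled buffers may still hold old data
	defer bufPool.Put(buf) // return the buffer for reuse
	fmt.Fprintf(buf, "%s=%d", key, value)
	return buf.String()
}

func main() {
	fmt.Println(formatRecord("latency_ms", 2))
}
```

Note the trade-off the talk mentions: pooling cuts allocation pressure but adds lifecycle rules (always `Reset` before use, never keep a reference after `Put`).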