---
title: "From us to ms: Pushing Kubernetes Workloads to the Limit"
weight: 7
tags:
- rejekts
- performance
---

{{% button href="https://www.youtube.com/watch?v=EYipC5y-8rM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}

<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

There were more details in the talk than I copied into these notes; most of them were too much to write down or too application-specific.

## Why?

- We need it (product requirements)
- Cost efficiency

## Cross-Provider Networking

- Throughput:
  - Same zone: 200 GB/s
  - Cross zone: 5-10% penalty
- Latency:
  - Same zone P99: 0.95 ms
  - Cross zone P99: 1.95 ms
- Result: encourage services to always route within the same zone if possible
- How:
  - Topology Aware Routing (older, a bit buggy)
  - `trafficDistribution: PreferClose`: routes to the same zone if possible (needs CNI support); see the Service sketch after this list
  - Set up the stack once in each zone
- Measurements: Kubezonnet can detect cross-zone traffic

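As a hedged sketch of the `trafficDistribution: PreferClose` option (the Service name, selector, and port below are made up, not from the talk), the setting lives directly on the Service spec:

```yaml
# Illustrative only: assumes Kubernetes v1.31+ where spec.trafficDistribution is available.
apiVersion: v1
kind: Service
metadata:
  name: querier            # hypothetical service name
spec:
  selector:
    app: querier
  ports:
    - port: 9090
      targetPort: 9090
  trafficDistribution: PreferClose   # prefer endpoints in the client's own zone when healthy ones exist
```
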
## Disk latency

- Baseline: 660 MiB/s per SSD, i.e. roughly one SSD per 5 Gbit/s of networking
- Example: 100 Gbit/s needs a RAID 0 across a bunch of SSDs (rough math below)

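Back-of-the-envelope version of that example, using the baseline figure above:

$$
100\ \text{Gbit/s} = 12.5\ \text{GB/s} \approx 11{,}920\ \text{MiB/s},
\qquad
\frac{11{,}920\ \text{MiB/s}}{660\ \text{MiB/s per SSD}} \approx 18\ \text{SSDs in RAID 0}
$$
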
```mermaid
graph LR
    Querier -->|125ms| Cache
    Cache -->|200ms| S3
    Cache <--> SSD
```

## Memory management

- Garbage collection takes time and is a throughput-versus-latency trade-off
- Idea: avoid allocations
  - Preallocate (e.g. arenas)
  - Allocation reuse (e.g. in gRPC)
  - "Allocation schemes" (thread per core)
- Avoid memory pressure by
  - using GC-friendly types
  - tuning your GC (see the sketch after this list)
- Idea: implement your own optimized data structure

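The talk did not show concrete settings here; as a hedged sketch, for a Go-based workload the GC can be tuned from the pod spec via the Go runtime's `GOGC` and `GOMEMLIMIT` environment variables (the names and values below are illustrative, not from the talk):

```yaml
# Sketch under the assumption of a Go service; names and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: querier              # hypothetical workload
spec:
  containers:
    - name: querier
      image: example.org/querier:latest   # placeholder image
      env:
        - name: GOGC
          value: "200"       # run GC less often: more memory used, fewer pauses per unit of work
        - name: GOMEMLIMIT
          value: "3500MiB"   # soft limit for the Go runtime, kept below the container memory limit
```
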
## Optimization in Kubernetes

### Defaults

- Best effort
- No protection from consuming all node memory
- Critical services could get scheduled on the same node

### Requests and limits

- Requests: needed to be scheduled
- Limits: the pod gets killed if it exceeds them
- Problem: reactive; pod usage is only checked on a periodic interval (configurable via an API flag, but with a minimum)
- Downward API: you can reference the limits in your application, e.g. to let the app trigger GC before the pod gets killed (see the sketch after this list)

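A minimal sketch of both points: requests/limits on the container, plus the Downward API exposing the memory limit to the process (all names and values are made up):

```yaml
# Sketch only: resource values, names, and image are illustrative, not from the talk.
apiVersion: v1
kind: Pod
metadata:
  name: querier
spec:
  containers:
    - name: querier
      image: example.org/querier:latest   # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          memory: 4Gi
      env:
        # Downward API: hand the container's memory limit to the application,
        # e.g. so it can trigger GC or shed load before the limit is hit.
        - name: MEMORY_LIMIT_BYTES
          valueFrom:
            resourceFieldRef:
              containerName: querier
              resource: limits.memory
```
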
### Taints and tolerations

- Pin your workload based on labels and annotations (see the sketch after this list)

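As a hedged sketch (the taint key, node label, and names are invented for illustration), a workload can be pinned to dedicated nodes by combining a node selector with a toleration for the matching taint:

```yaml
# Sketch only: label/taint names are invented; adapt to your own node setup.
apiVersion: v1
kind: Pod
metadata:
  name: querier
spec:
  nodeSelector:
    workload: latency-critical      # only schedule on nodes carrying this label
  tolerations:
    - key: workload                 # tolerate the taint that keeps other pods off these nodes
      operator: Equal
      value: latency-critical
      effect: NoSchedule
  containers:
    - name: querier
      image: example.org/querier:latest   # placeholder image
```

The node side would then carry the matching label plus a taint such as `workload=latency-critical:NoSchedule`.
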
### Static CPU manager

- Request a whole number of CPUs -> you get these cores guaranteed (see the sketch below)
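
A hedged sketch of the pod side (names and sizes are made up; the kubelet must additionally run with the static CPU manager policy): exclusive cores require Guaranteed QoS, i.e. requests equal to limits, with an integer CPU count:

```yaml
# Sketch only: assumes the kubelet is configured with cpuManagerPolicy: static.
apiVersion: v1
kind: Pod
metadata:
  name: querier
spec:
  containers:
    - name: querier
      image: example.org/querier:latest   # placeholder image
      resources:
        # Guaranteed QoS (requests == limits) with a whole number of CPUs
        # lets the static CPU manager pin dedicated cores to this container.
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
```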