Compare commits
No commits in common. "6931da118cbe1d8d2572fb39e5d95951c0384246" and "b61f31fdcf0c1bb9cd0bece0165cf6c8d5376dce" have entirely different histories.
6931da118c
...
b61f31fdcf
@ -1,80 +0,0 @@
---
title: "From us to ms: Pushing Kubernetes Workloads to the Limit"
weight: 7
tags:
  - rejekts
  - performance
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->

There were more details in the talk than I copied into these notes.
Most of them were just too much to write down or application-specific.

## Why?

- We need it (product requirements)
- Cost efficiency

## Cross Provider Networking

- Throughput:
  - Same-Zone 200GB/s
  - Cross-Zone 5-10% penalty
- Latency:
  - Same zone P99: 0.95ms
  - Cross zone P99: 1.95ms
- Result: Encourage Services to always route within the same zone if possible
- How:
  - Topology-Aware-Routing (older, a bit buggy)
  - `trafficDistribution: PreferClose`: Routes to the same zone if possible (needs CNI support)
  - Set up one instance of the stack in each zone
- Measurements: Kubezonnet can detect cross-zone traffic
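
The `trafficDistribution: PreferClose` option from the list above is a field on the Service spec. A minimal sketch, assuming a hypothetical Service called `querier` on port 8080 (both are placeholders, not from the talk) and a cluster whose Kubernetes version and dataplane support the field:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: querier              # hypothetical name
spec:
  selector:
    app: querier             # hypothetical label
  ports:
    - port: 8080             # hypothetical port
      targetPort: 8080
  # Prefer endpoints in the client's zone; traffic falls back to
  # other zones when no local endpoint is available.
  trafficDistribution: PreferClose
```

This is only a preference, so it pairs well with running at least one replica of each service per zone.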

## Disk latency

- Baseline: 660 MiB/s per SSD, i.e. roughly one SSD per 5 Gbit/s of network bandwidth
- Example: 100 Gbit/s needs a RAID 0 across a bunch of SSDs
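
The rule of thumb above can be checked with a quick back-of-the-envelope calculation. This is a rough sketch that assumes decimal Gbit/s, perfectly linear scaling across the RAID 0 members, and no protocol or filesystem overhead:

```latex
\begin{align*}
660\ \text{MiB/s} \times 2^{20}\ \tfrac{\text{B}}{\text{MiB}} \times 8\ \tfrac{\text{bit}}{\text{B}}
  &\approx 5.54\ \text{Gbit/s per SSD} \\
\frac{100\ \text{Gbit/s}}{5.54\ \text{Gbit/s per SSD}}
  &\approx 18.1 \;\Rightarrow\; \text{at least } 19 \text{ SSDs}
\end{align*}
```

With the rounder figure of 5 Gbit/s per SSD, the same calculation gives 20 SSDs.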

```mermaid
graph LR
    Querier-->|125ms|Cache
    Cache-->|200ms|S3
    direction TB
    Cache<-->SSD
```

## Memory management

- Garbage collection takes time and trades throughput against latency
- Idea: Avoid allocations
  - Preallocate (e.g. arenas)
  - Allocation reuse (e.g. in gRPC)
  - "Allocation Schemes" (thread per core)
- Avoid memory pressure by
  - Using GC-friendly types
  - Tuning your GC
- Idea: Implement your own optimized data structure

## Optimization in Kubernetes

### Defaults

- Best effort
- No protection from consuming all node memory
- Critical services could get scheduled on the same node

### Requests and limits

- Requests: Needed to be scheduled
- Limits: Kill if exceeded (for memory; CPU is throttled instead)
- Problem: Reactive, pods are only checked periodically, like a cron job (the interval can be set via a flag but has a minimum)
- Downward API: You can reference the limits in your application (to let the app trigger GC before the pod gets killed)
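
The Downward API idea above could look roughly like this. It is a sketch, not the speaker's setup: the pod name, image, resource values and the `MEMORY_LIMIT_BYTES` variable name are all hypothetical, and the application itself still has to read the variable and act on it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: querier                                # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/querier:1.0  # hypothetical image
      resources:
        requests:                              # used by the scheduler for placement
          cpu: 500m
          memory: 1Gi
        limits:                                # exceeding the memory limit gets the container killed
          memory: 2Gi
      env:
        # Expose the container's own memory limit (in bytes) to the process,
        # so it can trigger GC or shed load before reaching the limit.
        - name: MEMORY_LIMIT_BYTES             # hypothetical variable name
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.memory
```

Runtimes with a configurable soft memory limit can be pointed at a value derived from this variable instead of hard-coding it in the image.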

### Taints and tolerations

- Pin your workload based on labels and annotations
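
A small sketch of how pinning can be expressed, assuming a hypothetical `dedicated=low-latency` taint and node label (neither is from the talk). The toleration only allows the pod onto tainted nodes; the `nodeSelector` is what actually restricts it to them.

```yaml
# Assumed node preparation (hypothetical key and value):
#   kubectl taint nodes <node> dedicated=low-latency:NoSchedule
#   kubectl label nodes <node> dedicated=low-latency
apiVersion: v1
kind: Pod
metadata:
  name: querier                                # hypothetical name
spec:
  # Allow scheduling onto nodes carrying the taint.
  tolerations:
    - key: dedicated
      operator: Equal
      value: low-latency
      effect: NoSchedule
  # Restrict scheduling to nodes carrying the matching label.
  nodeSelector:
    dedicated: low-latency
  containers:
    - name: app
      image: registry.example.com/querier:1.0  # hypothetical image
```

Other workloads without the toleration stay off these nodes, which keeps noisy neighbours away from the latency-sensitive pods.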

### Static CPU manager

- Request a whole number of CPUs -> You get these cores guaranteed
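
A minimal sketch of the two pieces involved, with hypothetical names and values: the kubelet has to run with the static CPU manager policy, and the pod has to be in the Guaranteed QoS class (requests equal to limits) with an integer CPU count.

```yaml
# Kubelet side: enable the static policy. It needs some CPU reserved for
# system use; reserving core 0 here is only an illustrative choice.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: "0"
---
# Workload side: requests == limits and a whole number of CPUs,
# so the container is eligible for exclusive cores.
apiVersion: v1
kind: Pod
metadata:
  name: querier                                # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/querier:1.0  # hypothetical image
      resources:
        requests:
          cpu: "2"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 2Gi
```

A fractional request such as `1500m` would keep the pod in the shared CPU pool instead.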
@ -8,9 +8,6 @@ This is the first day of Cloud Native Rejekts and the first time of me attending

## Talk recommendations

> Ranked from should-watch to could-watch

- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](../05_broken-tech)
- What if my homelab is an African island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](../06_geo-distributed-clusters)
- Handling a large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](../04_multicluster-crd)
- Handling large-scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](../02_clusterapi)
@ -8,10 +8,4 @@ Yes that is a negative day.
Why? Because the numbering of the days is based on KubeCon instead of the trip.
Why? Ask the sleep-deprived version of me who started his trip to London at 2am...

## What happened today?

I spent the first seven(-ish) hours of the day travelling from Herzo to London and checked into my hotel.
After I managed to grab my second breakfast, Tobi texted me that the demo env was down, so I started to prepare an alternative demo.

The afternoon was dedicated to optimizing the demo and getting a screen recording of it (harder than it sounds).
The day ended with a nice Thai dinner (and iced Thai lemon tea 🤤)