Compare commits


65 Commits

Author SHA1 Message Date
b9060af72d docs(lessons): Added ratelimit blog(video
All checks were successful
Build latest image / build-container (push) Successful in 53s
2025-05-07 08:31:56 +02:00
3afb07e4c1 chore(day-1): Added missing tag
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-05-07 08:10:27 +02:00
4becb06ad3 fix: Wrong linebreak
All checks were successful
Build latest image / build-container (push) Successful in 57s
2025-05-07 07:09:57 +02:00
0e24bf4fd6 docs: Added youtube links
Some checks failed
Build latest image / build-container (push) Failing after 50s
2025-05-07 07:07:48 +02:00
f06c486182 fix: Pin hugo version
All checks were successful
Build latest image / build-container (push) Successful in 56s
2025-04-22 14:22:09 +02:00
f71971e793 docs: Slight rewording
Some checks failed
Build latest image / build-container (push) Failing after 48s
2025-04-22 13:57:52 +02:00
a7a3817a03 docs: Added datev at index
Some checks failed
Build latest image / build-container (push) Has been cancelled
2025-04-22 13:56:02 +02:00
47f7869257 docs(day2): Added own talk
All checks were successful
Build latest image / build-container (push) Successful in 51s
2025-04-08 10:22:40 +02:00
b2fd7a4c81 fix: Update diagram to correctly reflect Flux operations
All checks were successful
Build latest image / build-container (push) Successful in 51s
2025-04-07 18:57:12 +02:00
1213be7c30 docs: Added basic changelog 2025-04-07 18:56:18 +02:00
1f49a42edc fix(docs): Added missing tags
All checks were successful
Build latest image / build-container (push) Successful in 44s
2025-04-07 18:51:03 +02:00
c6f716ced1 fix(docs): Fixed relative links
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-07 18:50:21 +02:00
09ac5a9051 docs: Added images 2025-04-07 18:50:12 +02:00
5ed623d0ca docs: Added slide links for kubecon/cloudnativecon 2025-04-07 18:49:57 +02:00
f8ca21416b fix(day0): Typo in name
All checks were successful
Build latest image / build-container (push) Successful in 59s
2025-04-07 10:40:05 +02:00
dc4dd2d883 fix(day3): Typo
Some checks failed
Build latest image / build-container (push) Has been cancelled
2025-04-07 10:39:37 +02:00
957bc94344 docs(day3): etcd talk
Some checks failed
Build latest image / build-container (push) Failing after 36s
2025-04-04 15:08:02 +02:00
44a3653c84 docs(day3): feature flag talk
Some checks failed
Build latest image / build-container (push) Failing after 35s
2025-04-04 13:09:17 +02:00
6bf47e49c5 docs(day3): First talk of the day 🎉
Some checks failed
Build latest image / build-container (push) Failing after 34s
2025-04-04 12:25:46 +02:00
39d92acdb4 docs(day3): Added initial notes of the day 2025-04-04 12:02:37 +02:00
4d528bf5de docs(day2): Added single talk notes
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-04-03 18:49:22 +02:00
d2f3f5f95d docs(day2): Added daily notes
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-03 11:07:01 +02:00
6d0c95a8ac docs(day-1): Added notes for my talk
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-04-03 11:02:19 +02:00
3e4fbb616b docs(cnrj): Added video links
All checks were successful
Build latest image / build-container (push) Successful in 53s
2025-04-03 10:59:24 +02:00
d9605d602e docs(day1): Bloomberg call
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 18:23:49 +02:00
745e8f5896 style: Formatting
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 18:01:06 +02:00
78ca5973b8 docs: Updated day notes
Some checks failed
Build latest image / build-container (push) Failing after 34s
2025-04-02 17:43:43 +02:00
77f34ed1ab docs(day1): GPU Talk 2025-04-02 17:43:21 +02:00
a36f562cf4 docs(day1): Formatted notes 2025-04-02 17:15:55 +02:00
9ad9af0f9c docs(day1): Added operator q&a
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 16:00:12 +02:00
4f39c1102c docs(day1); Operator mistakes talk
All checks were successful
Build latest image / build-container (push) Successful in 45s
2025-04-02 15:55:03 +02:00
df93624814 docs(day1): Added notes to talk
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-02 13:27:38 +02:00
46b06c66fd docs: Added slides button to all pages
All checks were successful
Build latest image / build-container (push) Successful in 49s
2025-04-02 13:21:27 +02:00
b4d8aa29c3 feat(tempalte): Added slides button to template 2025-04-02 13:18:35 +02:00
4cec1917bf docs(day1): Added migration talk 2025-04-02 13:17:43 +02:00
bd7d9fe87d docs(day1): First talk
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-02 12:44:09 +02:00
f4858d81a8 docs(day0): Last talk
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-01 17:34:31 +02:00
bfcfe88cea docs(day0): Lego talk
All checks were successful
Build latest image / build-container (push) Successful in 45s
2025-04-01 17:16:55 +02:00
45a26383e0 docs(day0): Research talk 2025-04-01 16:52:43 +02:00
8dbdfd938f docs(day0): New talk
All checks were successful
Build latest image / build-container (push) Successful in 45s
2025-04-01 15:53:27 +02:00
8941108720 docs(day0): Dev envs
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-01 15:25:15 +02:00
f8512dc6ae docs(day0): Added abstractions
All checks were successful
Build latest image / build-container (push) Successful in 53s
2025-04-01 15:03:14 +02:00
c09bf8f637 docs(day0): Added talks 2025-04-01 15:03:03 +02:00
d90d5b8eab docs(day0): Added slides link
Some checks failed
Build latest image / build-container (push) Has been cancelled
2025-04-01 14:44:36 +02:00
8b78108a60 refactor(day-1): Button 2025-04-01 14:44:23 +02:00
d09e3ff3d1 docs(day0): Promotions talk
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-01 14:28:54 +02:00
8ddf87d2f4 docs(day0): Product thinking
All checks were successful
Build latest image / build-container (push) Successful in 43s
2025-04-01 12:39:08 +02:00
720d68803d docs(day0): past present future
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-01 12:06:58 +02:00
f0229abafd docs(day0): Added hireing talk 2025-04-01 11:24:14 +02:00
723051c498 docs(day0): More sponsored 2025-04-01 10:53:54 +02:00
7e6d0fc47f docs(day0): Sponsor keynotes
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-01 10:39:37 +02:00
fe8fa9693a docs(day0): First talk 2025-04-01 10:25:39 +02:00
8aab9217fe docs(day-1): Q&A
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-03-31 17:13:30 +02:00
936a4c8c3a docs(day-1): Added missing tags
All checks were successful
Build latest image / build-container (push) Successful in 45s
2025-03-31 17:09:18 +02:00
cc5325bf3f docs(day-1): Added multicluster pdb talk 2025-03-31 17:09:09 +02:00
30a976bb75 docs(day-1): Added edge talk
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-03-31 16:40:29 +02:00
88200c76df docs (day-1): DNS Talk
All checks were successful
Build latest image / build-container (push) Successful in 44s
2025-03-31 16:07:26 +02:00
e608712f31 docs(day-1): Service mesh talk
All checks were successful
Build latest image / build-container (push) Successful in 42s
2025-03-31 15:30:19 +02:00
ed77238254 docs(day-0): New talk
All checks were successful
Build latest image / build-container (push) Successful in 44s
2025-03-31 12:45:43 +02:00
80f62fd567 docs(day-1): First talk
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-03-31 10:58:04 +02:00
17b4407fea docs: new talk
All checks were successful
Build latest image / build-container (push) Successful in 42s
2025-03-30 18:16:00 +02:00
cb8d7f9d48 docs(day-2): Latest talk
All checks were successful
Build latest image / build-container (push) Successful in 43s
2025-03-30 16:36:26 +02:00
b3a8b29556 docs: Added how did we get there 2025-03-30 16:10:26 +02:00
52b967f78c fix(day-2): Typo
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-03-30 16:04:55 +02:00
c19d8a7f42 docs(day-2): New talk
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-03-30 16:00:29 +02:00
60 changed files with 2494 additions and 24 deletions


@@ -1,4 +1,4 @@
FROM registry.odit.services/hub/hugomods/hugo:exts AS build
FROM registry.odit.services/hub/hugomods/hugo:exts-0.145.0 AS build
WORKDIR /app
COPY . /app/


@@ -6,5 +6,6 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
TODO:


@@ -9,11 +9,38 @@ This current version is probably full of typos - will fix later. This is what ty
## How did I get there?
I attended KubeCon + CloudNativeCon Europe 2025 in London.
I attended Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2025 in London.
This year I was sent there by my employer [DATEV eG](https://datev.de) - thanks again to everyone who helped me with getting this trip approved (you know who you are).
Why? Because learning about all the new things in the world of cloud is really important, and war stories help to avoid mistakes that others have already made.
And [last year's experience](https://kubecon24.nicolai-ort.com) was really good, so I wanted to go again.
Plus I actually presented a talk at Cloud Native Rejekts 🥳.
## And how does this website get its content
```mermaid
graph LR
Nicolai<-->|Watches|Talk
Nicolai-->|"Takes notes (and typos) + commits"|Repo
Repo-->|Triggers|Actions
Actions-->|Builds image and pushes to|Registry
Flux-->|Detects new image|Registry
Flux-->|Rolls out new image|Kubernetes
```
## Changelog™
- 2025-03-28: Initial repo and deployment setup
- 2025-03-30: First day of Cloud Native Rejekts
- 2025-03-31: Second day of Cloud Native Rejekts
- 2025-04-01: First day of KubeCon/CloudNativeCon
- 2025-04-02: Second day of KubeCon/CloudNativeCon
- 2025-04-03: Added video links for Cloud Native Rejekts
- 2025-04-03: Third day of KubeCon/CloudNativeCon
- 2025-04-04: Fourth day of KubeCon/CloudNativeCon
- 2025-04-07: Added missing images and slide links for KubeCon/CloudNativeCon
## Style Guide
The basic structure is as follows: `day/event-or-session`.


@@ -0,0 +1,58 @@
---
title: What I wish I knew about container security
weight: 1
tags:
- rejekts
- security
---
{{% button href="https://www.youtube.com/watch?v=JAy6Ra0ulSw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Baseline
- Linux is like a hammer and containers look a lot like nails
- Containers aren't real: They are just processes with better isolation
- IPTables is complicated
### Hard parts
- The kernel is shared: we only pretend to separate processes through namespaces
- Filesystems: Containers bring a bunch of filesystems, and filesystems get shared between multiple containers
- Softlinks are hard to do right because they point to a path and not the data itself
### How did we get here?
1. Unix with a bunch of tools we still use
2. Linux (originally designed for the desktop)
3. Kernel gets iptables
4. The first concept of namespaces
5. More hypervisor stuff and official user namespaces
6. Containers (first lxc then docker)
## Sandboxing
- In browsers: They must protect the user from malicious content
- In containers: Pretty much the same - both run untrusted code that has to be isolated
## Namespaces
- Better isolation from other processes including resource constraints
- But: The shared kernel interacts with all processes (so kernel bugs can affect all namespaces)
![](../_imgs/namespaces.png)
## Improvements
- Secure Computing: Implement a secure state that we transition into before the process actually does stuff
- Paravirtualization: Instead of system calls to a shared kernel we make hyper-calls to the hypervisor
- Virtualization: The classic virtualization where everyone hosts their own kernel
## Stuff to look out for
> More or less a bit of advertisement
- Edera: Container native hypervisor without a shared kernel
- Styrolite: Rust-based container runtime sandbox
- eBPF and Tetragon for prevention and monitoring


@@ -0,0 +1,30 @@
---
title: "The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud"
weight: 2
tags:
- rejekts
- operator
---
{{% button href="https://www.youtube.com/watch?v=PciVvE02L2w" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Big Picture
- Kubernetes is just a bunch of controllers
- We can add custom controllers
TODO: Steal Pod Controller sample
## Real World Power of controllers
- In Kubernetes: CCM, Scheduler, CM
- Operator = CRD + Controller
TODO: Steal images from slides
## Example
> Crossplane as the example of the basic reconcile idea
TODO: Steal images from slides
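Until the slide images are added, the basic reconcile idea can be sketched roughly like this. This is only a minimal illustration in plain Python and not code from the talk or from Crossplane: `State`, `reconcile`, `apply` and `run_until_converged` are made-up names, and the "resource" is just a replica count.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical resource state: only a replica count."""
    replicas: int


def reconcile(desired: State, observed: State) -> list[str]:
    """Compare desired and observed state and return the actions still needed."""
    diff = desired.replicas - observed.replicas
    if diff > 0:
        return [f"create pod {observed.replicas + i}" for i in range(diff)]
    if diff < 0:
        return [f"delete pod {observed.replicas - 1 - i}" for i in range(-diff)]
    return []  # already converged, nothing to do


def apply(action: str, observed: State) -> None:
    """Stand-in for a call to the API server: mutate the observed state."""
    if action.startswith("create"):
        observed.replicas += 1
    else:
        observed.replicas -= 1


def run_until_converged(desired: State, observed: State, max_rounds: int = 10) -> int:
    """Run the loop, one action per round; return the number of rounds that changed something."""
    for round_no in range(max_rounds):
        actions = reconcile(desired, observed)
        if not actions:
            return round_no
        apply(actions[0], observed)
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    observed = State(replicas=1)
    rounds = run_until_converged(State(replicas=3), observed)
    print(f"converged after {rounds} rounds with {observed.replicas} replicas")
```

The point of the sketch is that the controller never stores a plan: every round it looks at the current state again and only does the next step, which is why the same loop also handles changes that happen in between.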

12
content/day-1/02_gslb.md Normal file

@@ -0,0 +1,12 @@
---
title: Evaluating Global Load Balancing Options for Kubernetes in Practice
weight: 2
tags:
- rejekts
---
{{% button href="https://youtu.be/RBMRU8rtxfI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://github.com/nicolaiort/rejekts2025-gslb" style="tip" icon="code" %}}Demo-Code and more{{% /button %}}
{{% button href="https://de.slideshare.net/slideshow/evaluating-global-load-balancing-options-for-kubernetes-in-practice-kubermatic-datev/277640385" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
My talk, notes will be released soon


@@ -0,0 +1,112 @@
---
title: The service mesh wars - a new hope for kubernetes
weight: 3
tags:
- rejekts
---
{{% button href="https://www.youtube.com/watch?v=DdQzGsiounY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## The clans (popular solutions)
- Kuma
- Linkerd
- Cilium
- Istio
- Ambient Mesh
## The new hope: Gateway API
- Will integrate itself into the networking solution (nginx, istio, kong)
- CRDs for Ingress, LB, Servicemesh
- CRDs like: Gateway, HTTPRoute, GRPCRoute, TCPRoute
## Expectations
- Baseline: Control Plane and Data Plane (Application + Proxy)
- What we get: Rules, Logs, ...
- Proxy-Variants:
  - Sidecar: Extra proxy container in every Pod, Service needs to be restarted for settings changes
  - Sidecarless: One proxy per node
- Features: Ingress, egress, Mutual TLS, Retry Logic, Traffic Splitting, Ratelimits, Observability
## Comparison
### Sidecar
TODO: Steal table from slides
| Mesh | Sidecar | Proxy |
| --- | --- | --- |
| Kuma | Yes | Envoy |
| Linkerd | Yes | Linkerd Proxy |
### Features
TODO: Steal Diagrams from slides
- Kuma: Gateway API supported
  - CRD per Mesh with Ratelimiter, Timeouts, ...
  - To add to mesh: Annotation
- Linkerd: Gateway API supported
  - Core Component: Server
  - To add to mesh: Annotate workload with proxy annotation
- Cilium: Gateway API mostly supported
  - Utilizes eBPF for speed
  - Can deploy Envoy
  - CRDs for NetworkPolicy
- Istio: Gateway API supported
  - CRDs with Services
  - To add: Annotate namespace or workload
- Ambientmesh: Gateway API supported
  - Same config as Istio
  - Special: Layer 7 rules require a waypoint
  - Missing: Several policy features
  - To add: Annotate namespace and/or workload
TODO: Steal table from slides
### Observability
- Kuma: Metrics by default with trace and log support (MeshTrace, MeshAccesslogs) via OpenTelemetry and its own UI
- Linkerd: Prometheus metrics, Viz extension for UI and Jaeger extension for traces (not OTel compliant)
- Cilium: No traces, only metrics and logs through Hubble (with UI)
- Istio/Ambient: Metrics, traces and logs with full OTel support on the data plane and an external UI (Kiali)
TODO: Steal table
### Performance
> Tests: https://github.com/isItObservable/servicemeshsecuritybenchmark
- KPIs: Resources and resource usage
- Constant load, no policies:
  - Kuma: 5.59 ms
  - Linkerd: 2.55 ms
  - Cilium: 0 ms
  - Istio: 6.43 ms
  - Ambientmesh: 3.59 ms
- Load test, no policies:
  - Kuma: 7 ms
  - Linkerd: 3.54 ms
  - Cilium: 0.57 ms
  - Istio: 8.8 ms
  - Ambientmesh: 3.54 ms
- Constant load, with policies:
  - Kuma: 6.08 ms
  - Linkerd: 2.55 ms
  - Cilium: 0 ms
  - Istio: 9.19 ms
  - Ambientmesh: 3.69 ms
- Load test, with policies: TODO
TODO: Steal overview slide
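To make the numbers above a bit easier to compare, here is a small sketch that ranks the meshes and computes how much the policies add under constant load. The values are copied from my notes (the unit of the "with policies" row is assumed to be milliseconds as well), and the helper names `ranked` and `policy_overhead` are made up for this example.

```python
# Values in milliseconds, copied from the notes above.
# Assumption: the "constant load, with policies" row is also in ms.
CONSTANT_LOAD_NO_POLICIES = {
    "Kuma": 5.59, "Linkerd": 2.55, "Cilium": 0.0, "Istio": 6.43, "Ambientmesh": 3.59,
}
CONSTANT_LOAD_WITH_POLICIES = {
    "Kuma": 6.08, "Linkerd": 2.55, "Cilium": 0.0, "Istio": 9.19, "Ambientmesh": 3.69,
}


def ranked(latencies: dict[str, float]) -> list[str]:
    """Mesh names sorted from lowest to highest value."""
    return sorted(latencies, key=lambda mesh: latencies[mesh])


def policy_overhead(without: dict[str, float], with_policies: dict[str, float]) -> dict[str, float]:
    """Difference per mesh between the run with and without policies."""
    return {mesh: round(with_policies[mesh] - without[mesh], 2) for mesh in without}


if __name__ == "__main__":
    print("ranking:", ranked(CONSTANT_LOAD_NO_POLICIES))
    overhead = policy_overhead(CONSTANT_LOAD_NO_POLICIES, CONSTANT_LOAD_WITH_POLICIES)
    for mesh, delta in overhead.items():
        print(f"{mesh}: +{delta} ms with policies")
```

This only rearranges the numbers from the talk; it says nothing about how they were measured, so the linked benchmark repo is still the place to look for details.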
## Recommendation
- If ambientmesh supports everything you need: It performs the best
- Kuma includes everything you need when starting your first mesh
- Linkerd: Complex configuration
- Treat Cilium as your CNI and not necessarily as your service mesh
TODO: Steal conclusion slide


@@ -0,0 +1,53 @@
---
title: Understanding and Debugging DNS in Kubernetes Clusters
weight: 4
tags:
- rejekts
---
{{% button href="https://www.youtube.com/watch?v=awXjABDknww" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://github.com/mqasimsarfraz/talks/tree/main/CloudNativeRejekts-2025" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
### DNS Components
```mermaid
graph LR
Application-->NodeLocalDNS-->CoreDNS-->Upstream
```
### Problems
- Many hidden systems
- Not easy to trace across clusters
## Tools
> Demo queries are located in the slides and were executed during the stream
### CoreDNS Log Plugin
- Core-Plugin (just needs to be activated)
- Logs all requests to stdout
### Hubble
- Cilium observability, needs the Cilium L7 proxy, runs as a DaemonSet
- Needs CiliumNetworkPolicies for AppPod and CoreDNS
- Metrics, UI and cli with jq (and protocol filter)
### Inspector Gadget
- Toolset for Kubernetes and Linux that can be customized
- Runs as a DaemonSet or debug pod - gadgets are distributed as containers (via Artifact Hub)
- DNS-Gadget: Trace via ebpf, post process with wasm
## Overview
- CoreDNS: Great for a first look, but only covers CoreDNS
- Hubble: Compact overview, but cilium needed with special configs
- Inspector Gadget: Rich DNS traces, limited tcp support

59
content/day-1/05_edge.md Normal file

@@ -0,0 +1,59 @@
---
title: "Kubernetes at the Far Edge: Harnessing IoT with Lightweight Clusters and Akri"
weight: 5
tags:
- rejekts
- edge
---
{{% button href="https://www.youtube.com/watch?v=jywpFlOH3z0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## The far edge
- Resource-constrained computing
- Limited connectivity
- More and smaller clusters
## Why kubernetes
- Automation, Scalability and resilience
- Workload Portability through containers
- Orchestration
- Declarative state
## Enter k0s
- Minimal footprint as static binary
- Simplified edge cluster management
## Managing disconnected edge nodes
- Needs: Remote manageability
- Idea: Centralized, remote Control Plane (that only does control plane)
- Challenge: Network disconnections (kubernetes usually moves workload)
## Akri
> https://docs.akri.sh/
- Discovery of IoT devices
- Exposes IoT devices as k8s resources
- Handles workload scheduling for leaf devices
![](../_imgs/akri-architecture.svg)
## Demo
Can be found in the video
## Q&A
- What about image distribution: Depends on networking conditions, k0s supports internal images delivered as tar.gz
- What can the broker do: Anything that a pod can interact with
- What about reboots: Well, Akri had some problems in the demo, kubelet seems to start the containers again
## Random Notes
- Akri kinda reminded me of the gpu-operator with extra resource capacity for attached devices


@@ -0,0 +1,59 @@
---
title: "Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb"
weight: 6
tags:
- rejekts
- multicluster
---
{{% button href="https://www.youtube.com/watch?v=w8rDxtrMGG8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Baseline Infra
- Multiple Clusters across cloud providers
- Cilium with Clustermesh
- Stretched CockroachDB and NATS
TODO: Steal overview from slides
## PDBs and limits
- PDB: Classic core component that requires a number of pods with successful readiness probes per deployment
- Eviction: Can be stopped by a PDB that has not reached the minimum available
- Interruptions: Voluntary (New image, updated specs, ...) vs involuntary (Eviction, deletion, node pressure, NoExecute, Node deletion)
## Stateful across multiple clusters
- Baseline: PDBs only know about one cluster
- Problem: If the master pod fails (or gets evicted) on 2/3 clusters
- Factors: Movement, Maintenance, Chaos-Experiments, Secret rotation
- Workaround: Just manually check all systems before doing anything
- Idea: Multi-Cluster PDB
- Solution: A new hook on the eviction API that interacts with a new cluster-aware CRD
## How it actually works
1. Drain API gets called
2. Check replicas across clusters
3. Answer based on current state
Actually: There is a lease-mechanism to prevent race conditions across clusters
TODO: Steal diagram from slides
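The three steps plus the lease can be sketched roughly as follows. This is my own simplified illustration, not how x-pdb is actually implemented: the class `MultiClusterBudget`, its method `try_evict` and the in-process lock standing in for the cross-cluster lease are all assumptions made for this example.

```python
import threading


class MultiClusterBudget:
    """Toy model of a disruption budget that counts ready pods across clusters."""

    def __init__(self, ready_per_cluster: dict[str, int], min_available: int) -> None:
        self.ready = dict(ready_per_cluster)
        self.min_available = min_available
        # Stand-in for the lease: only one eviction decision at a time.
        self._lease = threading.Lock()

    def try_evict(self, cluster: str) -> bool:
        """Step 1: eviction is requested. Step 2: check all clusters. Step 3: answer."""
        with self._lease:
            if self.ready.get(cluster, 0) == 0:
                return False  # nothing ready to evict in this cluster
            total_ready = sum(self.ready.values())
            if total_ready - 1 < self.min_available:
                return False  # evicting would violate the global budget
            self.ready[cluster] -= 1
            return True


if __name__ == "__main__":
    budget = MultiClusterBudget({"a": 1, "b": 1, "c": 1}, min_available=2)
    print(budget.try_evict("a"))  # allowed: two ready pods remain
    print(budget.try_evict("b"))  # denied: only one ready pod would remain
```

Without the lock, two clusters could both check the count at the same time and both allow an eviction, which is exactly the race condition the lease is there to prevent.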
## What works
- Voluntary: 100% supported
- Involuntary: Yes they hooked into most of the deletion api calls (eviction, pressure, kubectl delete, admissions, node deletion)
## Demo
Pretty interesting, watch the video to find out
## Q&A
- Do you need a flat network: No, just expose the TCP LB
- Did you think about using etcd to implement the leases instead of objects: They use managed hostplanes and don't want another etcd
- Have you tried to commit upstream: Nope, pretty much not an option thanks to the managed control-plane not being able to set appropriate flags


@@ -0,0 +1,406 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by Microsoft Visio, SVG Export akri-architecture.svg Page-7 -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xml-events"
width="10.7038in" height="6.24177in" viewBox="0 0 770.672 449.407" xml:space="preserve" color-interpolation-filters="sRGB"
class="st32">
<style type="text/css">
<![CDATA[
.st1 {fill:#8ac4ff;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st2 {fill:#000000;font-family:Calibri;font-size:1.5em}
.st3 {fill:#ebedf2;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st4 {fill:#000000;font-family:Calibri;font-size:1.33333em}
.st5 {fill:#524886;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st6 {fill:#feffff;font-family:Calibri;font-size:1.5em}
.st7 {font-size:1em}
.st8 {fill:#0aaba9;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st9 {fill:#524886;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st10 {stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
.st11 {fill:#ffffff;font-family:Calibri;font-size:1.49785em}
.st12 {fill:#444a6d;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
.st13 {stroke:#413a44;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.997017}
.st14 {stroke:#413a44;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.00451}
.st15 {fill:#0aaba9;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
.st16 {fill:#41455d;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
.st17 {fill:#feffff;font-family:Calibri;font-size:1.00001em}
.st18 {fill:none;stroke:none;stroke-width:0.25}
.st19 {fill:#ebedf2;stroke:#41455d;stroke-width:1.5}
.st20 {fill:#444a6d;font-family:Calibri;font-size:1.00001em}
.st21 {fill:#2b74ef;font-size:1em}
.st22 {fill:#ebedf2;font-size:1em}
.st23 {fill:#524886;stroke:#444a6d;stroke-width:1.5}
.st24 {fill:#41455d}
.st25 {stroke:#474b64;stroke-width:0.25}
.st26 {fill:#41455d;stroke:#41455d;stroke-width:0.25}
.st27 {fill:#0aaba9;stroke:#444a6d;stroke-width:1.5}
.st28 {fill:#444a6d;font-family:Calibri;font-size:1.66667em}
.st29 {fill:#444a6d;font-family:Calibri;font-size:1.33333em}
.st30 {fill:#2b74ef;stroke:#444a6d;stroke-width:1.5}
.st31 {fill:#ffffff;font-family:Calibri;font-size:1.5em}
.st32 {fill:none;fill-rule:evenodd;font-size:12px;overflow:visible;stroke-linecap:square;stroke-miterlimit:3}
]]>
</style>
<g>
<title>Page-7</title>
<g id="shape1000-1" transform="translate(18.75,-37.0005)">
<title>Sheet.1000</title>
<desc>Edge Cluster</desc>
<path d="M0 63.68 L0 449.41 L484.04 449.41 L484.04 63.68 L0 63.68 L0 63.68 Z" class="st1"/>
<text x="4" y="83.88" class="st2">Edge Cluster</text> </g>
<g id="shape1001-4" transform="translate(33.0966,-184.359)">
<title>Sheet.1001</title>
<desc>Control Plane</desc>
<path d="M0 248.81 L0 449.41 L456.2 449.41 L456.2 248.81 L0 248.81 L0 248.81 Z" class="st3"/>
<text x="4" y="267.22" class="st4">Control Plane</text> </g>
<g id="shape1002-7" transform="translate(56.3874,-300.225)">
<title>Sheet.1002</title>
<desc>Kubernetes Scheduler</desc>
<path d="M0 401.68 C0 396.41 4.28 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.68 L128.1 439.87
C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.28 449.41 0 445.14 0 439.87 L0 401.68 Z"
class="st5"/>
<text x="22.08" y="415.37" class="st6">Kubernetes <tspan x="27.76" dy="1.2em" class="st7">Scheduler</tspan></text> </g>
<g id="shape1003-11" transform="translate(207.778,-300.225)">
<title>Sheet.1003</title>
<desc>Akri Controller</desc>
<path d="M0 401.69 C0 396.41 4.29 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.69 L128.1 439.87
C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69 Z"
class="st8"/>
<text x="49.55" y="415.37" class="st6">Akri <tspan x="27.13" dy="1.2em" class="st7">Controller</tspan></text> </g>
<g id="shape1004-15" transform="translate(52.7937,-213.237)">
<title>Sheet.1004</title>
<path d="M0 401.69 C-0 396.41 4.29 392.13 9.57 392.13 L269.93 392.13 C275.21 392.13 279.49 396.41 279.49 401.69 L279.49
439.87 C279.49 445.14 275.21 449.41 269.93 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69
Z" class="st9"/>
</g>
<g id="shape1005-17" transform="translate(153.653,-228.513)">
<title>Sheet.1005</title>
<desc>API Server</desc>
<path d="M91.01 427.83 L0 427.83 L0 449.41 L91.01 449.41 L91.01 427.83" class="st10"/>
<text x="7.97" y="444.01" class="st11">API Server</text> </g>
<g id="shape1006-21" transform="translate(111.013,-270.515)">
<title>Sheet.1006</title>
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
430.53 L0 430.53 Z" class="st12"/>
</g>
<g id="shape1007-23" transform="translate(111.013,-270.515)">
<title>Sheet.1007</title>
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
430.53 L0 430.53" class="st13"/>
</g>
<g id="shape1008-26" transform="translate(262.403,-270.515)">
<title>Sheet.1008</title>
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
430.53 L0 430.53 Z" class="st12"/>
</g>
<g id="shape1009-28" transform="translate(262.403,-270.515)">
<title>Sheet.1009</title>
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
430.53 L0 430.53" class="st14"/>
</g>
<g id="shape1010-31" transform="translate(33.0966,-51.7091)">
<title>Sheet.1010</title>
<desc>Node</desc>
<path d="M0 344.8 L0 449.41 L456.2 449.41 L456.2 344.8 L0 344.8 L0 344.8 Z" class="st3"/>
<text x="4" y="363.2" class="st4">Node</text> </g>
<g id="shape1011-34" transform="translate(56.3874,-70.5221)">
<title>Sheet.1011</title>
<path d="M0 401.69 C-0 396.41 4.29 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.69 L128.1
439.87 C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69 Z"
class="st5"/>
</g>
<g id="shape1012-36" transform="translate(82.4207,-86.7911)">
<title>Sheet.1012</title>
<desc>Kubelet</desc>
<path d="M69.37 427.83 L0 427.83 L0 449.41 L69.37 449.41 L69.37 427.83" class="st10"/>
<text x="6.56" y="444.01" class="st11">Kubelet</text> </g>
<g id="shape1013-40" transform="translate(198,-70.5221)">
<title>Sheet.1013</title>
<desc>Akri Agent</desc>
<path d="M0 401.69 C0 396.41 2.23 392.13 4.98 392.13 L61.64 392.13 C64.38 392.13 66.61 396.41 66.61 401.69 L66.61 439.87
C66.61 445.14 64.38 449.41 61.64 449.41 L4.98 449.41 C2.23 449.41 0 445.14 0 439.87 L0 401.69 Z"
class="st15"/>
<text x="18.82" y="415.38" class="st11">Akri <tspan x="11.67" dy="1.2em" class="st7">Agent</tspan></text> </g>
<g id="shape1016-44" transform="translate(466.794,-94.2275)">
<title>Sheet.1016</title>
<desc>&#60;protocol&#62;</desc>
<path d="M0 429.16 L23.82 408.91 L23.82 419.03 L71.46 419.03 L71.46 408.91 L95.28 429.16 L71.46 449.41 L71.46 439.28
L23.82 439.28 L23.82 449.41 L0 429.16 Z" class="st16"/>
<text x="21" y="432.76" class="st17">&#60;protocol&#62;</text> </g>
<g id="shape1017-47" transform="translate(111.013,-128.878)">
<title>Sheet.1017</title>
<path d="M0 374.99 L9 366.01 L18.01 374.99 L13.51 374.99 L13.51 440.42 L18.01 440.42 L9 449.41 L0 440.42 L4.5 440.42
L4.5 374.99 L0 374.99 Z" class="st12"/>
</g>
<g id="shape1018-49" transform="translate(111.013,-128.878)">
<title>Sheet.1018</title>
<path d="M0 374.99 L9 366.01 L18.01 374.99 L13.51 374.99 L13.51 440.42 L18.01 440.42 L9 449.41 L0 440.42 L4.5 440.42
L4.5 374.99 L0 374.99" class="st14"/>
</g>
<g id="shape1019-52" transform="translate(220.436,-129.118)">
<title>Sheet.1019</title>
<path d="M0 375.05 L9.06 366.01 L18.13 375.05 L13.6 375.05 L13.6 440.36 L18.13 440.36 L9.06 449.41 L0 440.36 L4.53 440.36
L4.53 375.05 L0 375.05 Z" class="st16"/>
</g>
<g id="shape1-54" transform="translate(579.294,-344.358)">
<title>Sheet.1</title>
<rect x="0" y="382.668" width="63" height="66.7397" class="st18"/>
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
<svg viewBox="-0.55922 -0.55862 68.039 72" height="66.7397" preserveAspectRatio="none" width="63" x="0" y="382.668">
<clipPath id="mfid1">
<rect x="-0.55922" y="-0.55862" width="68.039" height="72" id="mfid2"/>
</clipPath>
<g clip-path="url(#mfid1)">
<mask id="mfid3">
<rect width="68" height="72" fill="white" stroke="none"/>
</mask>
<mask id="mfid4" fill="white" stroke="none">
<g>
<g mask="url(#mfid3)">
<use xlink:href="#mfid2"/>
</g>
</g>
</mask>
<defs>
<image id="mfid5" width="68" height="72" xlink:href=""/>
</defs>
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
<g mask="url(#mfid4)">
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
<clipPath id="mfid6">
<rect x="-0.5" y="-0.5" width="68" height="72"/>
</clipPath>
<use xlink:href="#mfid5" clip-path="url(#mfid6)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
</g>
</g>
</g>
</svg>
<rect x="0" y="382.668" width="63" height="66.7397" class="st18"/>
</g>
<g id="shape1020-57" transform="translate(518.023,-187.512)">
<title>Sheet.1020</title>
<desc>kind: Configuration metadata: ..name: akri-&#60;protocol&#62; spec: ....</desc>
<rect x="0" y="302.977" width="203.541" height="146.43" class="st19"/>
<text x="4" y="314.99" class="st20">kind: <tspan class="st21">Configuration </tspan><tspan x="4" dy="1.2em" class="st7">metadata: </tspan><tspan
x="4" dy="1.2em" class="st22">..</tspan>name: akri<tspan class="st21">-</tspan><tspan class="st21">&#60;protocol&#62; </tspan><tspan
x="4" dy="1.2em" class="st7">spec: </tspan><tspan x="4" dy="1.2em" class="st22">..</tspan>discoveryHandler: <tspan
x="4" dy="1.2em" class="st22">…..</tspan>name: &#60;protocol&#62; <tspan x="4" dy="1.2em" class="st22">..</tspan>brokerPodSpec: <tspan
x="4" dy="1.2em" class="st22">…..</tspan>containers: <tspan x="4" dy="1.2em" class="st22">…..</tspan>- name: <tspan
class="st21">custom</tspan><tspan class="st21">-</tspan><tspan class="st21">broker </tspan><tspan x="4"
dy="1.2em" class="st22">……..</tspan>image: &#34;<tspan class="st21">ghcr.io/</tspan><tspan class="st21">&#34;</tspan></text> </g>
<g id="shape1021-77" transform="translate(782.153,197.043) rotate(90)">
<title>Sheet.1021</title>
<path d="M0 426.65 L9.42 415.31 L18.85 426.65 L14.14 426.65 L14.14 438.07 L18.85 438.07 L9.42 449.41 L0 438.07 L4.71
438.07 L4.71 426.65 L0 426.65 Z" class="st16"/>
</g>
<g id="group1022-79" transform="translate(367.794,-193.864)">
<title>Can.1091</title>
<desc>etcd</desc>
<g id="shape1023-80">
<title>Sheet.1023</title>
<path d="M0 435.91 A53.0646 13.5 -180 0 0 106.13 435.91 L106.13 282.91 L0 282.91 L0 435.91 Z" class="st23"/>
</g>
<g id="shape1022-82">
<ellipse cx="53.0646" cy="282.907" rx="53.0646" ry="13.5" class="st23"/>
<text x="37.04" y="289.61" class="st6">etcd</text> </g>
</g>
<g id="group1024-85" transform="translate(517.374,600.4) rotate(180)">
<title>1-D single.1004</title>
<g id="shape1025-86">
<title>Sheet.1025</title>
<path d="M-0.75 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L-0.75 433.38 L-0.75 444.06
Z" class="st24"/>
<path d="M-0.75 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L-0.75 433.38"
class="st25"/>
</g>
<g id="shape1026-89">
<title>Sheet.1026</title>
<path d="M0 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L0 433.38" class="st25"/>
</g>
<g id="shape1027-92" transform="translate(-0.5,-5.59375)">
<title>Sheet.1027</title>
<rect x="0" y="439.22" width="0.5" height="10.1875" class="st26"/>
</g>
</g>
<g id="group1031-94" transform="translate(382.5,-207)">
<title>Sheet.1031</title>
<g id="shape1032-95" transform="translate(0.306904,-68.056)">
<title>Rectangle.1066</title>
<desc>Configuration CRD</desc>
<rect x="0" y="385.571" width="80.6931" height="63.8367" class="st27"/>
<text x="6.98" y="400.37" class="st17">Configuration <tspan x="30.2" dy="1.2em" class="st7">CRD</tspan></text> </g>
<g id="shape1033-99">
<title>Rectangle.1067</title>
<desc>Instance CRD</desc>
<rect x="0" y="385.571" width="80.6931" height="63.8367" class="st27"/>
<text x="19.78" y="400.37" class="st17">Instance <tspan x="30.2" dy="1.2em" class="st7">CRD</tspan></text> </g>
<g id="shape1034-103" transform="translate(3.02536,-70.6819)">
<title>Rectangle.1068</title>
<desc>&#60;protocol&#62; Configuration</desc>
<rect x="0" y="421.128" width="75.2562" height="28.2795" class="st27"/>
<text x="10.99" y="431.67" class="st17">&#60;protocol&#62; <tspan x="4.26" dy="1.2em" class="st7">Configuration</tspan></text> </g>
<g id="shape1035-107" transform="translate(2.71845,-3.4226)">
<title>Rectangle.1069</title>
<desc>&#60;protocol&#62; Instance</desc>
<rect x="0" y="421.128" width="75.2562" height="28.2795" class="st27"/>
<text x="10.99" y="431.67" class="st17">&#60;protocol&#62; <tspan x="17.06" dy="1.2em" class="st7">Instance</tspan></text> </g>
</g>
<g id="shape1036-111" transform="translate(582.879,-77.8757)">
<title>Sheet.1036</title>
<desc>Leaf Device</desc>
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
<text x="26.92" y="400.1" class="st28">Leaf <tspan x="16.8" dy="1.2em" class="st7">Device</tspan></text> </g>
<g id="shape1037-115" transform="translate(574.939,-67.186)">
<title>Sheet.1037</title>
<desc>Leaf Device</desc>
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
<text x="26.92" y="400.1" class="st28">Leaf <tspan x="16.8" dy="1.2em" class="st7">Device</tspan></text> </g>
<g id="shape1038-119" transform="translate(567,-58.186)">
<title>Sheet.1038</title>
<desc>Leaf Device</desc>
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
<text x="6.8" y="440.61" class="st29">Leaf Device</text> </g>
<g id="shape2-122" transform="translate(569.909,-102.2)">
<title>Sheet.2</title>
<rect x="0" y="406.801" width="43.9393" height="42.6065" class="st18"/>
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
<svg viewBox="-0.55922 -0.55862 129.06 125.01" height="42.6065" preserveAspectRatio="none" width="43.9393" x="0"
y="406.801">
<clipPath id="mfid7">
<rect x="-0.55922" y="-0.55862" width="129.06" height="125.01" id="mfid8"/>
</clipPath>
<g clip-path="url(#mfid7)">
<mask id="mfid9">
<rect width="129" height="125" fill="white" stroke="none"/>
</mask>
<mask id="mfid10" fill="white" stroke="none">
<g>
<g mask="url(#mfid9)">
<use xlink:href="#mfid8"/>
</g>
</g>
</mask>
<defs>
<image id="mfid11" width="129" height="125" xlink:href=""/>
</defs>
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
<g mask="url(#mfid10)">
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
<clipPath id="mfid12">
<rect x="-0.5" y="-0.5" width="129" height="125"/>
</clipPath>
<use xlink:href="#mfid11" clip-path="url(#mfid12)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
</g>
</g>
</g>
</svg>
<rect x="0" y="406.801" width="43.9393" height="42.6065" class="st18"/>
</g>
<g id="shape3-125" transform="translate(616.283,-104.121)">
<title>Sheet.3</title>
<rect x="0" y="415.276" width="34.0958" height="34.1311" class="st18"/>
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
<svg viewBox="-0.55922 -0.55862 120.05 120.04" height="34.1311" preserveAspectRatio="none" width="34.0958" x="0"
y="415.276">
<clipPath id="mfid13">
<rect x="-0.55922" y="-0.55862" width="120.05" height="120.04" id="mfid14"/>
</clipPath>
<g clip-path="url(#mfid13)">
<mask id="mfid15">
<rect width="120" height="120" fill="white" stroke="none"/>
</mask>
<mask id="mfid16" fill="white" stroke="none">
<g>
<g mask="url(#mfid15)">
<use xlink:href="#mfid14"/>
</g>
</g>
</mask>
<defs>
<image id="mfid17" width="120" height="120" xlink:href=""/>
</defs>
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
<g mask="url(#mfid16)">
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
<clipPath id="mfid18">
<rect x="-0.5" y="-0.5" width="120" height="120"/>
</clipPath>
<use xlink:href="#mfid17" clip-path="url(#mfid18)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
</g>
</g>
</g>
</svg>
<rect x="0" y="415.276" width="34.0958" height="34.1311" class="st18"/>
</g>
<g id="shape4-128" transform="translate(593.891,-79.7301)">
<title>Sheet.4</title>
<rect x="0" y="415.274" width="34.0958" height="34.1336" class="st18"/>
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
<svg viewBox="-0.55922 -0.55862 112.03 112.03" height="34.1336" preserveAspectRatio="none" width="34.0958" x="0"
y="415.274">
<clipPath id="mfid19">
<rect x="-0.55922" y="-0.55862" width="112.03" height="112.03" id="mfid20"/>
</clipPath>
<g clip-path="url(#mfid19)">
<mask id="mfid21">
<rect width="112" height="112" fill="white" stroke="none"/>
</mask>
<mask id="mfid22" fill="white" stroke="none">
<g>
<g mask="url(#mfid21)">
<use xlink:href="#mfid20"/>
</g>
</g>
</mask>
<defs>
<image id="mfid23" width="112" height="112" xlink:href=""/>
</defs>
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
<g mask="url(#mfid22)">
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
<clipPath id="mfid24">
<rect x="-0.5" y="-0.5" width="112" height="112"/>
</clipPath>
<use xlink:href="#mfid23" clip-path="url(#mfid24)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
</g>
</g>
</g>
</svg>
<rect x="0" y="415.274" width="34.0958" height="34.1336" class="st18"/>
</g>
<g id="group1039-131" transform="translate(366.448,-68.8351)">
<title>Sheet.1039</title>
<g id="shape1028-132" transform="translate(11.7657,-10.9193)">
<title>Wavy Box.1020</title>
<desc>Broker</desc>
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
436.74 83.04 436.74 Z" class="st30"/>
<text x="17.03" y="427.18" class="st31">Broker</text> </g>
<g id="shape1029-135" transform="translate(6.41885,-5.69989)">
<title>Wavy Box.1019</title>
<desc>Broker</desc>
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
436.74 83.04 436.74 Z" class="st30"/>
<text x="17.03" y="427.18" class="st31">Broker</text> </g>
<g id="shape1030-138">
<title>Wavy Box.1003</title>
<desc>custom-broker</desc>
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
436.74 83.04 436.74 Z" class="st30"/>
<text x="11.76" y="414.36" class="st31">custom-<tspan x="17.2" dy="1.2em" class="st7">broker</tspan></text> </g>
</g>
<g id="shape1040-142" transform="translate(288,-68.8351)">
<title>Sheet.1040</title>
<desc>&#60;protocol&#62; Discovery Handler</desc>
<path d="M0 401.69 C0 396.41 2.23 392.13 4.98 392.13 L61.64 392.13 C64.38 392.13 66.61 396.41 66.61 401.69 L66.61 439.87
C66.61 445.14 64.38 449.41 61.64 449.41 L4.98 449.41 C2.23 449.41 0 445.14 0 439.87 L0 401.69 Z"
class="st15"/>
<text x="6.67" y="409.97" class="st17">&#60;protocol&#62; <tspan x="9.68" dy="1.2em" class="st7">Discovery </tspan><tspan
x="13.93" dy="1.2em" class="st7">Handler</tspan></text> </g>
<g id="shape1041-147" transform="translate(715.807,342.509) rotate(90)">
<title>Sheet.1041</title>
<path d="M0 436.83 L9.42 430.56 L18.85 436.83 L14.14 436.83 L14.14 443.14 L18.85 443.14 L9.42 449.41 L0 443.14 L4.71
443.14 L4.71 436.83 L0 436.83 Z" class="st12"/>
</g>
</g>
</svg>

View File

@@ -4,8 +4,19 @@ title: Day -1
weight: 3
---
TODO:
The second and last day of Cloud Native Rejekts and (some might say most importantly) time for my talk.
This was another very interesting day and I can only recommend attending Cloud Native Rejekts (and I will always try to attend in the future if possible).
## Talk recommendations
* TODO:
- My Talk: [Evaluating Global Load Balancing Options for Kubernetes in Practice](./02_gslb)
- Service Mesh Intro + Comparison: [The service mesh wars - a new hope for kubernetes](./03_service-mesh)
- How to handle eviction and statefulness across clusters: [Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb](./06_scaling-pdbs)
- Intro to operators: [The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud](./02_controllers)
## Other stuff I learned or people I talked to
- Take a deeper look into CoreDNS plugins
- A bunch of nice people that heard my talk and had questions
- Someone from Ampere that would like to help me to convince the infra team to get arm nodes
- Look into NATS (at least a bit), everyone seems to like it but I never used it myself (only in some projects)

View File

@@ -7,5 +7,6 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Short opening keynote thanking volunteers and attendees.

View File

@@ -4,10 +4,12 @@ weight: 2
tags:
- rejekts
- cluster
- operatr
- operator
- multicluster
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=r0W6cCJAGro" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The talk started with a basic introduction to Cluster API and the operations at Giant Swarm.

View File

@@ -6,7 +6,8 @@ tags:
- keynote
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=m9NRk-6MSvY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
A short keynote from Microsoft about their contributions to open source and the tools they use:
- infra (Kubernetes, Istio, Hyperlight)

View File

@@ -3,9 +3,11 @@ title: CRD Data Architecture for Multi-Cluster Kubernetes
weight: 4
tags:
- rejekts
- multicluster
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=e1BmT0jc_Fs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background

View File

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=CAPtQnH4rPY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Recruitment & Staffing

View File

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=qNShvqSTKCU" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background: The state of cloud in mauritius

View File

@@ -6,7 +6,8 @@ tags:
- performance
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=EYipC5y-8rM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
There were more details in the talk than I copied into these notes.
Most of them were just too much to write down or application specific.

View File

@@ -0,0 +1,110 @@
---
title: Building air-gapped control planes for a global pharma leader using crossplane and argo
weight: 8
tags:
- rejekts
- crossplane
---
{{% button href="https://www.youtube.com/watch?v=D4bKe4rAasc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Joint effort of Novo Nordisk and Upbound.
## Background
- Ymir Platform: Foundational abstraction platform
- Goal: Faster time to market
- Usage in pharma: end-2-end compliance
- Airgap: Use GitOps and prevent human interaction with the control planes
## Setup
- Decision for crossplane was obvious
- Problem: Chicken and egg "we provision clusters via crossplane but crossplane needs a cluster"
- GitOps: Everything as code with automatic tests and argo
- Infra: Azure
### Public AKS
```mermaid
graph LR
subgraph MC
ProviderAzure
ProviderKubernetes
end
ProviderAzure-->|Calls API|AKS
AKS-->|Provisions|Kubernetescluster
ProviderKubernetes-->|Deploys service on|Kubernetescluster
```
### Bastion Bootstrap
- Options: Terraform/Opentofu
- Goal: Crossplane all the things
- Solution: Run Crossplane in a github action
1. Kind Cluster
2. Install Crossplane
3. Propagate Credentials
4. Create Cluster
- Tooling: Uptest - E2E Test automation Framework, can be used for bootstrapping since it creates kind cluster with crossplane
```mermaid
graph LR
subgraph GitHubRunner
Kubernetes
Crossplane
end
subgraph Azure
BastionVM
end
Crossplane-->|Create|BastionVM
```
### Next steps
- Problem: How to access bastion
- Solution: Auto-register bastion as github runner
- Create Bastion-Cluster via Uptest
```mermaid
graph LR
subgraph Azure
subgraph BastionVM
GitHubRunner
VMKubernetes[Kubernetes]
VMCrossplane[Crossplane]
end
subgraph BastionCluster
ClusterKubernetes[Kubernetes]
Argo
ClusterCrossplane[Crossplane]
end
end
VMCrossplane-->|Create|BastionCluster
```
TODO: Steal image from slides
## Challenges
- Argo sync waves:
- Problem: Argo does not support eventual consistency
- Example: Install a ProviderConfig before your Provider and sync fails without retry
- Order stuff very carefully
- Delivering updates to private clusters
- Difference between public and private: It's the same package
- Upgrades/Downgrades: Change the package (Crossplane) and cluster (CRD)
- Testing:
- Static: Multiple stages and each stage has its own bootstrap env that can be set to any branch
- Ephemeral: Uptest
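The ordering problem from the sync-wave challenge can be illustrated with a small sketch. It only models the wave number taken from the `argocd.argoproj.io/sync-wave` annotation (resources without the annotation default to wave 0) and ignores everything else Argo CD considers when ordering resources. The manifest names (`provider-azure`, `default`) are made up for illustration and not taken from the talk.

```python
# Sketch: order manifests by their Argo CD sync-wave annotation so that
# a Provider is applied before the ProviderConfig that depends on it.
SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def wave(manifest: dict) -> int:
    """Return the sync wave of a manifest (0 if the annotation is missing)."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(SYNC_WAVE, "0"))


def apply_order(manifests: list[dict]) -> list[str]:
    """Return 'Kind/name' identifiers sorted by ascending sync wave."""
    ordered = sorted(manifests, key=wave)  # stable sort keeps ties in input order
    return [f'{m["kind"]}/{m["metadata"]["name"]}' for m in ordered]


# Hypothetical manifests: the ProviderConfig is listed first but must sync last.
manifests = [
    {
        "kind": "ProviderConfig",
        "metadata": {"name": "default", "annotations": {SYNC_WAVE: "1"}},
    },
    {"kind": "Provider", "metadata": {"name": "provider-azure"}},
]

print(apply_order(manifests))
```

Putting the dependent resource into a later wave is one way to "order stuff very carefully" without relying on retries.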
TODO: Steal images from slides
## Wrap-up
- Cloud native air-gapped ✅
- GitOps ✅
- Crossplane, no terraform ✅
- Extensible, reusable, API-first ✅

View File

@@ -0,0 +1,84 @@
---
title: End to End Message Authenticity in Cloud Native Systems
weight: 9
tags:
- rejekts
- security
---
{{% button href="https://www.youtube.com/watch?v=rJacyDygVi0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Why does e2e authenticity matter?
- Classic Setup: Micro-Services with TLS and auth via Bearer
```mermaid
graph LR
User-->|TLS|Gateway
Gateway-->|mTLS|Server
Server-->|mTLS|Gateway
Gateway-->|TLS|User
```
- Intrusion: Hacked Gateway
- Can modify the request
- Could log auth tokens
- Could replay requests with different body or token
## Baseline OIDC
- Only IDP has private key for signing
- Anyone can fetch the public key and verify
- Usage: SSO, Trust Federation
- Problem: Symmetric Credential can be forwarded if leaked
## Fixes
### HTTP Message Signatures
- Idea:
- Client can sign the content and headers with a symmetric/asymmetric key
- Server can verify the signature
- Implementation: Basically just an additional Signature header and a header that tells us what is included in the signature
```
HTTPS POST /test
Authorization: Bearer <token>
Signature-Input: "authorization" @body
Signature: ahsz7d9zahbsdoih
```
- Problem: Key distribution
- Real-World: AWS Signature v4 shares access key and secret key out of band and signs the request with a key derived from the secret key (symmetric)
- Transitive Trust
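A minimal sketch of the signing idea, assuming a symmetric key that was shared out of band. This is not the wire format of the HTTP Message Signatures standard: the signature base layout and the `@body-sha256` label are simplifications invented for this example. It only shows that covering the `Authorization` header and the body lets the server detect a gateway that modifies either of them.

```python
import hashlib
import hmac


def signature_base(headers: dict[str, str], body: bytes, covered: list[str]) -> bytes:
    """Build the bytes to sign: the covered headers plus a digest of the body."""
    lines = [f"{name.lower()}: {headers[name]}" for name in covered]
    # Hypothetical label for the body digest, not part of any standard.
    lines.append(f"@body-sha256: {hashlib.sha256(body).hexdigest()}")
    return "\n".join(lines).encode()


def sign(key: bytes, headers: dict[str, str], body: bytes, covered: list[str]) -> str:
    """Client side: compute an HMAC-SHA256 signature over the signature base."""
    return hmac.new(key, signature_base(headers, body, covered), hashlib.sha256).hexdigest()


def verify(key: bytes, headers: dict[str, str], body: bytes,
           covered: list[str], signature: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(key, headers, body, covered), signature)


key = b"shared-out-of-band"  # assumption: both sides already hold this key
headers = {"Authorization": "Bearer <token>"}
signature = sign(key, headers, b'{"amount": 10}', ["Authorization"])

# A gateway that rewrites the body invalidates the signature.
print(verify(key, headers, b'{"amount": 10}', ["Authorization"], signature))
print(verify(key, headers, b'{"amount": 9999}', ["Authorization"], signature))
```

The key distribution problem stays unsolved here: the sketch simply assumes both sides already hold `key`.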
### OIDC Key binding
TODO: Steal image from slides
### Proof of Possession
> Basically adds a nonce that we have to sign and the IdP now knows that we really possess the key
TODO: Steal image from Slides
### OpenPubKey
> Assigns meaning to the nonce and can reconstruct the nonce for a reverse check
## Demo
The demo uses GitHub as a PKI (since all public keys get exposed via GitHub).
Pretty cool: They automated the demo via a Go CLI.
TODO: Link to demo code
TODO: Steal image from Slides
## Next steps
- SPIFFE is the de-facto standard for distributing identities to workloads
1. Workload asks "Who am I"
2. Agent attests the workload
3. Agent provides OIDC or X.509 to Workloads
* WIMSE RFC: Basically DPoP/OpenPubKey
1. Workload gets a private key
2. Issuer binds workload identity to the public key
3. Since auth trusts SPIFFE, it can trust the key

View File

@@ -0,0 +1,73 @@
---
title: "The auto-scaling part: VPA, HPA, KEDA, Nodes, How do they dance"
weight: 10
tags:
- rejekts
---
{{% button href="https://www.youtube.com/watch?v=1US_-3udMDo" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Hypothesis
- In 2024 27% of cloud spend was wasted
- 100ms delay => decrease in sales
## Pod resources
- Requests: Informs scheduler's decision
- Too low: Schedule on strained nodes
- Too high: Wasted resources
- Limits: Throttles (CPU) or kills (Memory) if reached
- QoS: Sorts the eviction priority during resource pressure
- Guaranteed (Requests=Limits)
- Burstable (Limits>Requests)
- Best effort (Nothing defined)
- Gotcha: CPU throttling can happen before the scaling triggers fire if requests and limits are very close
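The QoS rules above can be sketched as a small classifier. This is a simplification under stated assumptions: it looks at a single container only and compares quantities as plain strings (so `100m` and `0.1` would not be treated as equal). The function name `qos_class` is made up for this example.

```python
def qos_class(requests: dict[str, str], limits: dict[str, str]) -> str:
    """Classify a single-container pod into a QoS class (simplified)."""
    if not requests and not limits:
        return "BestEffort"  # nothing defined at all
    # A request that is not set falls back to the limit of the same resource.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        resource in limits and effective_requests[resource] == limits[resource]
        for resource in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


same = {"cpu": "100m", "memory": "256Mi"}
print(qos_class(same, same))                                    # requests == limits
print(qos_class(same, {}))                                      # requests only
print(qos_class(same, {"cpu": "500m", "memory": "512Mi"}))      # limits > requests
print(qos_class({}, {}))                                        # nothing defined
```

The class matters because it sorts the eviction order under resource pressure: best effort pods go first, guaranteed pods last.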
TODO: Steal table from Slides
| | Guaranteed | Burstable | Best effort |
| --- | --- | --- | --- |
| Requests | 100m, 256Mi | 100m, 256Mi | None |
| Limits | 100m, 256Mi | None or > requests | None |
## Scalers
- VPA: Moar power aka recommend requests
- HPA: Moar moar aka more replicas
- KEDA: Proxy over HPA
### VPA
Modes:
- Off: Dry-Run
- Initial: Applies recommendations to new Pods (can be used for finding out)
- Auto/Recreate: Evicts and restarts pods to update resources
Trigger: Usually Memory
Tip: `maxAllowed` in order to not exhaust stuff
### HPA
- Trigger: Usually cpu (percent of requests)
- Formula: $desiredReplicas = \lceil currentReplicas \cdot \frac{usage}{target} \rceil$
- Fun fact: Cannot scale to 0
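The replica calculation documented for the HPA can be sketched as follows. The function and parameter names are made up for this example. The tolerance of 0.1 mirrors the documented default, and clamping to `min_replicas=1` reflects the note that the HPA on its own does not scale to 0.

```python
import math


def desired_replicas(current_replicas: int, usage: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA scaling rule: ceil(current * usage / target), clamped."""
    ratio = usage / target
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target, do nothing
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 4 replicas at 90% CPU (relative to requests) with a 60% target.
print(desired_replicas(4, usage=90, target=60))
# No load at all still leaves one replica running.
print(desired_replicas(1, usage=0, target=60))
```

Because the usage is expressed as a percentage of the requests, wrongly sized requests directly skew the scaling decision.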
### KEDA
- Basically automates HPA with flexible metrics (from different sources)
- Can scale Jobs
- Can Scale to 0
## Anti patterns
TODO: Steal from slides
| Pattern | Bad | Better |
| --- | --- | --- |
| CPU limit = Requests | Throttling before scale | Set requests only |
## Demo
Auto scaling meme generator (see slides/video)

View File

@@ -10,7 +10,12 @@ This is the first day of Cloud Native Rejekts and the first time of me attending
> Ranked by should watch to could watch
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](../05_broken-tech)
- What if my homelab is an african island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](../06_geo-distributed-clusters)
- Handling large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](../04_multicluster-crd)
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](../02_clusterapi)
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](./05_broken-tech)
- What if my homelab is an african island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](./06_geo-distributed-clusters)
- Bootstrap and CI/CD with crossplane: [Building air-gapped control planes for a global pharma leader using crossplane and argo](./08_airgapped-cp)
- Handling large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](./04_multicluster-crd)
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](./02_clusterapi)
## Other stuff I learned or people I talked to
- Throughout the lunch break I talked to a nice guy who heard my government question during the [Tech is broken and AI won't fix it](./05_broken-tech)-Talk, we talked

View File

@@ -0,0 +1,27 @@
---
title: Project update
weight: 1
tags:
- platform
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/70/Platforms%20WG%20Update%20slides%20-%20Kubecon%20EU%202025.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
An update from the platform working group which will be renamed to the CNCF Platform Engineering Community.
Alongside the new name a bit of restructuring will take place because the working group outgrew the working group label.
## Initiatives
### Supported initiatives
- Platform Glossary and Whitepaper: What is a platform
- Platform Maturity Model & Assessment: A Platform is a living thing that evolves
- Platform as a Product: Currently in the research stage
- Platform Community Formation: The - above mentioned - restructuring
### Monitored Initiative
- Cloud Native Platform Engineering Associate (CNPA): Certification is being formed
- Cloud Native Platform Engineer (CNPE): Will follow after CNPA

View File

@@ -0,0 +1,30 @@
---
title: Stop building, start delivering workloads
weight: 2
tags:
- platform
- cloudnativecon
- sponsored
---
{{% button href="https://youtu.be/7tbs3J7mgE0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## States of platform
1. Platform is being built and getting delayed
2. Platform finished and not adopted
3. Re-Platforming and guessing if the new platform will meet the same end
4. Platform is low maintenance and devs are happy (nice story bro)
Failure should be fine but it's no longer an option in most cases
## What do we want?
> Wishlist
- Support for all workloads
- Consistent experiences across ui, api, cli and gitops
- Pathway from preview to prod
- Multi-cloud and onprem
- Abstract infra

View File

@@ -0,0 +1,32 @@
---
title: "Platform Engineering with a Product Management Mindset: 10x your DevEx"
weight: 3
tags:
- platform
- cloudnativecon
- sponsored
---
{{% button href="https://youtu.be/MFLXFNlmMMI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
This whole talk is pretty much a product manager's view on platform engineering.
## Where can it go wrong
- Assuming customer needs - build for hypothetical developers
- Output > Outcome
- Ignore stakeholder ecosystem
TODO: Steal slide
## PaaP (Platform as a product)
- Anticipate developer needs: Don't just fulfill requests
- Design for all personas and survey related teams
- Prioritize Features according to research themes
- Deliver incremental value with feedback loops
## Hierarchy of goals and baselines
TODO: Copy slide over

View File

@@ -0,0 +1,27 @@
---
title: "The platform Engineer gauntlet: Three defining challenges in the AI era"
weight: 4
tags:
- platform
- cloudnativecon
- sponsored
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Conviction
- Background: There is an absence of platform leadership
- Reason: Most "leaders" don't push services or features to developers with conviction
- Solution: Be proud and use your leadership role with courage
## Focus
- Focus on developers
- Don't only focus on the production ecosystem (observability, ci/cd) but also the path to this end
## Foundations
- Problem: Many companies are running behind their ai goals thanks to missing baseline automation
- Solution: Embrace the AI

View File

@@ -0,0 +1,13 @@
---
title: "Containerization beyond CPUs - A Kubernetes based serverless platform for ai native applications"
weight: 5
tags:
- platform
- cloudnativecon
- sponsored
---
{{% button href="https://youtu.be/XrMsJIL35Oc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Hypothesis: We are at the beginning of a 10 year cycle that is moving towards ai-native applications.

View File

@@ -0,0 +1,61 @@
---
title: So you want to hire for platform engineering
weight: 6
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/cl-MO7j7MHY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Hypothesis: The bar for good interviewing is somewhere near the earth's core and we need to improve this (because we need more engineers)
## Resilience engineering
> The overarching concepts that apply to platforms or just "how to make code work"
Idea: Four main goals that align with different roles under the mothership "resilience engineering"
- Rebound: SRE
- Robustness: Infra
- Graceful extensibility: Platform Engineering
- Sustained adaptability: DevEx (often pulled out into something else)
Bonus things to look out for
- Intellectual Humility: The ability to learn new things and to accept that you might know much but not everything
- Ecological awe: The awe experienced when looking at beautiful nature and feeling small or just looking at the CNCF landscape
## What do you need for the first team
- People who are able to hire new people and willing to step up to leadership in the long term
- Generalists
## The process and what to do
What should happen before we hire someone (either in one or multiple interviews).
1. Learn about each other
2. Solve a technical problem together
3. Solve a sociological problem together
4. How do you and your future coworkers/stakeholders get along
Make sure the end2end time (first interview to yes or no) is low (best is under two weeks)
All of your current engineers should be able to pass the interview without studying in advance (no stupid)
## Potential Failures and fallacies
- The fallacy of demographics in = demographics out
- Treating interviews like hazing
- Not tracking after-hire indicators
- Whiteboard interviews: They are stupid repetition and regurgitation and have zero relation to real-world work
- There are no real studies on how to assess and hire talent
### Flags
- Passion is usually interpreted as "puts up with abuse" and should not be mistaken for caring -> See "Ecological awe"
- Side projects probably indicate a lack of family/social time ("I make my wife raise the kids") -> Side projects are not a good indicator, maybe they are brilliant at their job but love their free time
- A Moneyball-like process (data-driven decision) completely counters how talent is perceived -> Expand the hiring pool to anybody and ignore the classical "indicators of talent"
- Discriminated demographics probably have a better grip on systems thinking (due to being forced to make choices)
- Systems thinking is more important than platform knowledge (If you can think in terms of organization and dependencies you can work on platforms)

View File

@@ -0,0 +1,62 @@
---
title: The past, the present and the future of platform engineering
weight: 7
tags:
- platform
- cloudnativecon
- viktor
---
{{% button href="https://youtu.be/uwDoHm-AxTM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The good old baseline is "I am a developer, I write code - now I have to do stuff to continue writing code".
Most developers will continue on to "now I have to write scripts" in order to just do their jobs instead of working on infra.
These scripts evolve to tools which evolve into an internal platform (if everyone starts using it).
Other base components can also feel like platforms (for example application servers).
## The early day evolution
- Hudson
- Docker: Not really building platforms, rather standardized application packaging
- Kubernetes (and Nomad + Swarm): A new concept of scheduling instead of just running the application in a container
=> We've been building platforms (or failing to build them) for years and years but now we kinda agree about what a platform is
## Present
We have the base idea of a platform
```mermaid
graph LR
ServiceConsumers-->|Consume through|HTTPAPI-->|Trigger work on|Controllers-->|So|Services
ServiceOwner-->|Manages|Services
```
- The first question: Do we use public controllers (e.g. the CNCF landscape projects) or build our own?
- Result: Mostly a mix starting with public, realizing needs and expanding
## Make it your own
- Goal: Make the platform domain specific for your developers
- Evolution: Tools like Dapr for developers or Crossplane for API building
- Build the API and controllers first - dashboard, GitOps, observability, ... second
- Remember that Kubernetes can manage anything - not just containers
TODO: Steal image
## Blueprints
Take all of the projects you need, combine them and hide the complexity.
The high-level architecture of internal platforms is the same as that of public ones (AWS, ...), just internal and built on Kubernetes.
TODO: Steal images for platform blueprints (3 slides)
## Future
- Platform Engineering certification by the CNCF is on the horizon
- Do we need to hide Kubernetes from developers? Maybe -> The CNCF is starting groups to get app devs closer to platform engineers
- More specialized multi-cluster tools have sprung up in the last year (scheduling, discovery, management)
- AI things are happening and we should utilize them, but not just by calling an LLM directly and calling it a day -> e.g. the Dapr LLM abstraction API
- Platforms are not built in isolation, we need to help each other


@@ -0,0 +1,75 @@
---
title: Product thinking for cloud native engineers
weight: 8
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/8_pB9RAfzrY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/48/Product%20Thinking%20for%20Cloud%20Native%20Engineers%20PlatformEngineeringDay-EU-25.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## How & Why
- IT was a cost center for a long time - now it's critical but still treated as a cost center
- Why is it important: Too much focus on the technical aspects instead of value delivery
- Importance: Show the value of your work (which means your work has to provide value)
- Operations and coordination work is not easily visible, but very important
## Principles
- Focus on user value: User problems > Solutions
- Outcome (Value) > Output (Tickets closed)
- Products (lifecycle and ownership) before projects (just setting stuff up)
### User value
- "Who is the user": Builders, Enablers, Regulatory, "Viewers"
- "What is the value": Make the organization more efficient while avoiding risks
## How to start?
![Product compass illustration](../_img/product-compass.png)
### Exploring the Problem Space
Goals:
- Identify top pains
- Build empathy and understanding
- Investigate key business aims
Techniques:
- Customer and stakeholder interviews: Talk to people, they will probably tell you about their pain
- Data/Process analysis: Where are our bottlenecks?
- Shadowing: Really see how the day to day works
- Ask "Why"
- Read business updates (current goals)
- Build dashboards that show progress and value
### Defining the problem space
Goals:
- Identify opportunities
- Prioritise
- Gather insights and data
Techniques:
- Value stream mapping
- RICE, Value vs Effort or other cost-benefit analysis
- Analyse your exploration process
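RICE scoring from the techniques above can be sketched in a few lines. This is only a minimal illustration, assuming the common definition (reach × impact × confidence ÷ effort); the `Opportunity` type, its units and the example numbers are made up and not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    """A candidate platform improvement (hypothetical example type)."""
    name: str
    reach: float       # users affected per quarter
    impact: float      # relative impact per user, e.g. 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0, how sure we are about reach and impact
    effort: float      # person-months, must be > 0

    @property
    def rice(self) -> float:
        # RICE = (Reach * Impact * Confidence) / Effort
        return self.reach * self.impact * self.confidence / self.effort


def prioritise(opportunities: list[Opportunity]) -> list[Opportunity]:
    """Return the opportunities sorted by RICE score, highest first."""
    return sorted(opportunities, key=lambda o: o.rice, reverse=True)


if __name__ == "__main__":
    backlog = [
        Opportunity("Faster CI cache", reach=200, impact=1, confidence=0.8, effort=2),
        Opportunity("Self-service namespaces", reach=80, impact=2, confidence=0.5, effort=4),
    ]
    for item in prioritise(backlog):
        print(f"{item.name}: {item.rice:.1f}")
```

The score is only as good as the inputs, so the numbers should come out of the interviews and data analysis from the exploration phase rather than gut feeling.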
## Did we reach our goal?
### Product metrics
- Someone will measure your work: either hope they do it right or, better, do it yourself to show how you provide value
- Product metrics should measure outcome not output (or performance metrics)
- Baseline: You need to know the desired outcome
### Frameworks
- DevEx: Triangle of flow state (build&test speed), feedback loops () and cognitive load (code complexity, docs clarity)
- DORA
- SPACE
- DX Core 4


@@ -0,0 +1,129 @@
---
title: A million ways to promote changes between environments
weight: 9
tags:
- argo
- cloudnativecon
- viktor
---
{{% button href="https://youtu.be/iCTgRC3AQQk" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
- Promotion: Move things from one env to another
- Options: Sequentially or both
- Challenge: Env differences
- Challenge: How do we link our promotion tasks?
### GitOps
- Declarative: YAML, JSON, XML (not Helm or KCL or anything else)
- Versioned and immutable: Git
- Pulled automatically: No write access from cluster
- Continuously reconciled: Maintain parity between desired and actual state
### Rules
- Part of SDLC
- Declarative
- Versioned and immutable
- Pulled automatically
- Continuously reconciled
## Workflows
### Manual
1. Deploy
2. Run tests
3. Push to next stage
4. Test again or roll back
### Manual with gitops
1. Update manifest
2. Push to git
3. Test
4. Next stage
Problem: Eventual consistency makes the process async instead of sync (important for tests)
### Generic workflows
1. Dev: Bump, push
2. QA: Wait for success of 1 (how?), do the same
3. Prod: Wait for success of 2 (how?)
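One possible answer to the "(how?)" above is to poll the environment until it reports the expected revision as healthy. The following is a rough sketch, not something shown in the talk: `get_status`, `bump` and the status shape (`revision`, `health`) are hypothetical stand-ins for whatever your GitOps tool exposes.

```python
import time
from typing import Callable


def wait_for_rollout(
    get_status: Callable[[], dict],
    expected_revision: str,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the environment runs `expected_revision` and reports healthy.

    `get_status` is a hypothetical callback returning e.g.
    {"revision": "abc123", "health": "Healthy"}.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        on_revision = status.get("revision") == expected_revision
        if on_revision and status.get("health") == "Healthy":
            return True
        if on_revision and status.get("health") == "Degraded":
            return False  # the new revision is live but broken
        if clock() >= deadline:
            return False  # never converged, so do not promote
        sleep(interval_s)


def promote(stages, bump, get_status_for, revision, **wait_kwargs) -> list:
    """Bump each stage in order; stop at the first one that does not converge."""
    done = []
    for stage in stages:
        bump(stage, revision)  # e.g. commit the new version to the stage's manifests
        if not wait_for_rollout(lambda: get_status_for(stage), revision, **wait_kwargs):
            break
        done.append(stage)
    return done
```

Polling keeps the pull principle intact (nothing is pushed into the cluster), but it only works around the async nature of GitOps and does not remove it.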
TODO: Steal code screenshots from slides
## Tools
### Extend your standard CI
Not async, risk of flapping, either blindly trust the state or break the pull-principle by running argo sync or kubectl apply
### AppSets Progressive Sync
- Built in to Application Sets (alpha)
- Targeting by label, promotes everything
- Not supported with autosync, because it basically manually triggers sync one after another
- Changes from git have to be manually triggered
### Image updater
- Subscribe to semver based image updates and write them to kubernetes and/or git
- You have to implement promotions via image naming schemes
TODO: Steal flowchart
### Kargo
- Freight: Artifact or manifest versions to promote
- Stage: ArgoCD Apps
TODO: Steal flowchart
### Telefonistka
- IaC Agnostic tooling
- Idea: Watch folder contents and copy contents to new folder
- Pretty much a bundled CI script
TODO: Draw your own chart
### Codefresh GitOps
> This is one of the speaker's tools
- Product: Applications with relationships
- Env: Any cluster and/or namespace
- Promotion: CRD for policy (when does it happen, what gets validated)
- Promotions can happen manually or automated via commit/PR
- Based on Argo Workflows
### GitOps Promoter (Intuit)
- Define Manifests once and hydrate them later
- Source Hydrator: Argo CD feature that handles the rendering and commits it to a new dedicated branch (one branch per stage)
- These branches are the ones used by Argo, e.g. `environments/dev` gets watched by the dev cluster
- Changes result in environment proposal branches: a PR gets opened, PR checks run, and when the PR requirements are met (tests), it gets merged into the real env branches
TODO: Steal Pattern
## Overview of the philosophies
Artifact Oriented: Imageupdater, Kargo
Define Manifests once: AppSets Progressive Sync, GitOps Promoter
Deff and workflow: CI, Codefresh
TODO: Steal from slides
## Best practices
- Can you recover from git at any point? No -> Do better
- Does git reflect what's deployed without looking?
- Does this enable SDLC?
- Interfaces in folders, not branches? -> Branches may get crowded


@@ -0,0 +1,89 @@
---
title: "Platform abstractions: Asset or liability"
weight: 10
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/M5X5NCzlzIA" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/52/atul-talk-platform-engineering-kubecon-london-2025_final.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Fair warning: Food analogies incoming
## Baseline
### What do abstractions achieve
- Structure through simplification
- Complexity made simple
- Hidden details, visible value
### Dilemma
1. Platform team creates abstraction
2. Abstraction works for 10 Teams
3. Other team requests extension
4. Question: How do we deal with this
### Possible Solutions
- Add Config Options: Increases complexity of abstraction
- Make One-off exceptions: Breaks standardization, introduces inconsistency
- Require conformity: Hinders innovation, creates enemies
- Allow bypassing: Creates shadow IT, risking security and resource control
=> Debt trap: The cost of maintaining a stable platform rises and rises
## The debt cycle
### The abstraction cycle
1. Simplify
2. Adopt
3. New Requirements
4. Add complexity
5. Repeat
![Abstraction cycle illustration](../_img/abstraction-cycle.png)
### Warning signs
- Rising customization requests
- Workarounds
- Shadow IT
### Impact
- Each new feature becomes harder to implement
- Teams lose trust in the platform capabilities
- Platform evolution slows down
- New tech is difficult to incorporate
## Abstraction elasticity
> The abstraction should stretch a bit to accommodate change without breaking
- Adaptability: Ease of handling new requirements
- Transparency: Understand what your user wants and why
- Extension patterns: Document ways to customize the platform behavior
- Migration Paths: Ease of moving away from the platform abstraction
### Elasticity
- Can teams access lower-level controls (when needed) while staying with the abstraction?
- Do users understand what happens underneath (when needed)?
- Are there documented extension/customization points?
## Patterns to break the debt trap
- Layered abstraction patterns: Start with low-level abstractions that get abstracted again on higher levels, so users can choose the right abstraction level for themselves without having to configure everything themselves
- Expert API: Additional API parameters that are not needed but can be set
- Policy-based guardrails: Change the guardrails based on the environment (e.g. deep access in dev, not in prod)
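To make the last two patterns a bit more tangible, here is a small sketch that combines optional expert parameters with environment-dependent guardrails. Everything in it (`DEFAULTS`, `ALLOWED_OVERRIDES`, `render_workload`, the parameter names) is hypothetical and only illustrates the idea, not an actual platform API.

```python
# Platform defaults: the high-level abstraction most teams never need to touch.
DEFAULTS = {"replicas": 2, "cpu": "500m", "memory": "256Mi"}

# Guardrails per environment: which expert parameters may be overridden where.
# (Hypothetical policy; dev allows deep access, prod stays locked down.)
ALLOWED_OVERRIDES = {
    "dev": {"replicas", "cpu", "memory", "host_network"},
    "prod": {"replicas"},
}


def render_workload(name: str, env: str, **expert_params) -> dict:
    """Merge optional expert parameters over the defaults, enforcing the policy."""
    forbidden = set(expert_params) - ALLOWED_OVERRIDES[env]
    if forbidden:
        raise PermissionError(f"{sorted(forbidden)} may not be set in {env}")
    return {"name": name, "env": env, **DEFAULTS, **expert_params}
```

Teams that are happy with the defaults call `render_workload("api", "prod")` and never see the extra knobs; teams with special needs get a documented extension point instead of a one-off exception.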
## The end goal
- Increase adoption
- Eliminate shadow IT
- Improved satisfaction
- Reduced overhead

43
content/day0/11_t-env.md Normal file

@@ -0,0 +1,43 @@
---
title: "The story of t-env: Scaling a platform to improve the velocity of hundreds of developers"
weight: 11
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/qXRHpIYxU_c" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/da/KubeCon%20Talk_%20Lemonade%27s%20t-env.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Okteto: Ephemeral environments for testing
## History
- Starting point: Local Dev -> Setup for new devices or devs is really slow (on average 10hrs a week)
- Next Idea: EC2 Instances with a fancy docker-compose and scripts -> No more local dev
- Problems: Still complex - just in the cloud, manual updates, always-on required (no working on the train)
- Risks: Developers will just create workarounds and shadow IT
## T-Env
- Baseline: Set up an environment on Kubernetes for each dev with CI/CD
- Okteto: A single command to enter dev mode `t dev start` with file sync from local
- Implementation: Wrapper around the Okteto CLI
- Why: Because devs seem to love the CLI
- Self-service observability for troubleshooting in your env
Open source tools used: Pulumi, Grafana, Okteto, K8s
### Did it work?
- The time to test is way faster
- The path was clear
- The environments should be ephemeral but devs don't like that -> They decided to allow for long lived envs
- Cloud cost is relatively high with long-living envs -> They implemented a sleep system based on dev timezone (or manual wake-up)
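A timezone-based sleep decision could look roughly like this. The talk did not show the implementation, so the working hours, the weekend rule and the `manual_wake_until` override are assumptions made for this sketch.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional


def should_sleep(
    now_utc: datetime,
    utc_offset_hours: float,
    work_start: time = time(8, 0),
    work_end: time = time(19, 0),
    manual_wake_until: Optional[datetime] = None,
) -> bool:
    """Decide whether a developer's environment can be scaled down right now.

    `now_utc` must be timezone-aware; `utc_offset_hours` is the dev's offset.
    """
    # A manual wake-up overrides the schedule until it expires.
    if manual_wake_until is not None and now_utc < manual_wake_until:
        return False
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    if local.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (work_start <= local.time() < work_end)
```

A fixed UTC offset keeps the sketch free of external timezone data; a real system would more likely store an IANA timezone per developer to get daylight saving time right.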
## The futuuuuure
- The company is not getting smaller -> More devs and more services
- AI agents will write some of the code in the future
- Idea: Only run modified code in env instead of everything


@@ -0,0 +1,50 @@
---
title: "Performance perseverance: Taming 1000 kubernetes clusters"
weight: 12
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/ZTT8M74RD1M" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/d5/kubecon_2025_v4.2.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## History
- They started with upstream kubernetes - the hard way
- Env grew to over 200 prod apps
- Pains: Single Cluster, single point of failure and complexity
- What worked: Dev adoption and autonomy, no vendor
## Challenges
> Based on stakeholder expectations
- One tenant per cluster -> Over 1000 Clusters
- Release management
- Small team (3 Engineers)
## Guiding principles
- Platform as a product
- Stability: trust
- Standardization -> Scalability and inter team collab
- Day 2 support
- Dogfooding
## Tenancy
- One cluster per product
- Own CLI, devs like cli
- Custom operator and crds
## Stack
- Keopsctl? Pretty much their own cluster operator
- A Simple Cluster CRD
## Migration
1. Build trust in platform
2. Support with docs, onboarding, Q&A
3. Co-create with devs while keeping an eye on day2 -> Feature-Flag based rollout

56
content/day0/13_paap.md Normal file

@@ -0,0 +1,56 @@
---
title: Platform as a Product
weight: 13
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/DoiaHfl9Y7Y" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The CNCF's research into product thinking for platforms.
## But why
- Get insights into the current product thinking practices of platform builders
- Topics: Needs/Pain points/Behaviour
- Target: Create personas based on insights
- Find out what people are doing, not how they are doing it
## How?
- Survey for quantity
- Interviews for quality
## Challenges
- Asking questions without suggesting answers
- Consensus on research goals
- Motivation and time investment (on interviewer and interviewee side) + Non-Responses
- Tooling: There is no standard tooling at the CNCF for this kind of research
- Small sample size -> No real research insights, just signals/hints
## Analysis
- Working with assumptions was hard in combination with the small sample size
- Survey: Survey Tool (Google Forms) combined with a whiteboard tool for clustering and analysis
- Interviews: They used AI for time efficiency, but the prompt escalated a bit, leading to no real time gain -> But you can scale the same prompt to infinite sample sizes
- Challenge: AI confidently churns out wrong answers -> Use source links for verification and scoping
TODO: Steal workflow from slides
## Outcome/Signals
- Platform Orgs use Prioritization Frameworks unconsciously: "We don't use product management and tools like that" -> Well you do, you just don't call it PM and are a bit unstructured
- Structured Activities: Interviews (talking to each other), Focus groups, quantitative data, ...
- Roadmap influence: Insight, prioritization, painpoints, backlogs
- Regular planning meetings
- Platform orgs struggle to define and actually implement measures of success: Measure activity over impact, success is often felt instead of proved
- Platform teams have varied control over their work: Depending on company size and business relationships
## Future
- Baseline: They have some signals
- Question: Are these patterns successful?
- Needed: More data and better organization

58
content/day0/14_lego.md Normal file

@@ -0,0 +1,58 @@
---
title: Building Platforms with empathy and yaml at the lego group
weight: 14
tags:
- platform
- cloudnativecon
---
{{% button href="https://youtu.be/8FmJWd7vRt4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Very nice intro analogy of kids playing with lego, about creativity, sharing and collaboration.
## The golden brick
- The brick could get picked up and sometimes picking it up is mandatory
- Development in close collaboration and trust with users
- Focus on good enough instead of perfect but everyone is unhappy
### Guidelines
- API first: Define a separation between users and services with abstractions
- Self services: Freedom of choice and combination
- Constraints that are soft and can be modified on feedback
### Offers
- Kubernetes as a service
- Runtime as a Service: Namespace as a service with Argo and without cluster access
- Problem: Users want kubeapi access
- Method: Talk with the users
- Solution: Zero Trust proxy that provides operational access to kubeapi via OIDC
- There are multiple APIs that can be combined -> You need constraints
### What's needed
- Conversation
- Trust
- Striking a balance
## The human aspect
- Treat people as colleagues instead of customers
- Build empathy to reach a balanced "good enough"
- Lead with transparency: Publish your metrics
- Visit their context
- Explore unknowns together
- Create a shared understanding of challenges
### Team culture
- Know who you are helping and who helps you
- Empower them to shine by getting to know their context
- Hear them out in small meetings or in person
## Platform maturity
TODO: Steal maturity chart


@@ -0,0 +1,29 @@
---
title: 10 Quick tips on how to internally market your platform
weight: 15
tags:
- platform
- cloudnativecon
- lightning
---
{{% button href="https://youtu.be/kiUV8En8Co4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/42/2025-PE-Day-10-Tips.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
- Even great tech does not sell itself - you need marketing
- We don't have a big marketing budget for our internal platform
- No adoption -> No Trust -> No new users -> No adoption
## Tips
- Define personas and a value proposition map
- Build a brand: Name, logo, story, swag
- Have a launch party or milestone parties
- Provide clear, accessible communication (with clear channels, docs, ...)
- Build a community that can help each other (and don't separate yourself from the community)
- Capture metrics for success for yourself and from a user's perspective
- Provide a 5-minute wow moment/demo where the user can feel like they achieved something
- Level up with gamification
- Leverage external events for internal visibility

Binary file not shown.

Size: 572 KiB

Binary file not shown.

Size: 270 KiB


@@ -4,8 +4,27 @@ title: Day 0
weight: 4
---
TODO:
Day 0 of KubeCon aka CloudNativeCon aka the day on which the co-located events happen.
This year I spent most of my time at the platform engineering day (with a short visit to argocon).
The emerging motto of platform engineering day was "platform as a product".
This was the third conference day (fourth travel day) and in the afternoon I started to feel the brain-overflow.
But powering through, I ended up attending two keynotes (no notes, they were pretty much a welcome and goodbye) and 14 talks.
And most importantly: This is the day my friends and coworkers joined (they are only in town for KubeCon, not for Rejekts).
Sometimes we ended up in the same talks, sometimes in different talks, which led to a rich set of talk notes.
## Talk recommendations
* TODO:
- How to design a good hiring process: [So you want to hire for platform engineering](./06_hire-engineers)
- Evolution of Platforms and Platform Engineering: [The past, the present and the future of platform engineering](./07_past-present-future)
- How to design a good product: [Product thinking for cloud native engineers](./08_product-thinking)
- Staging with gitops: [A million ways to promote changes between environments](./09_promotions)
- How to handle abstractions and new requirements: [Platform abstractions: Asset or liability](./10_abstractions)
- Very nice slides: [Building Platforms with empathy and yaml at the lego group](./14_lego)
## Other stuff I learned or people I talked to
- Talked to the Vultr people - they have a manifesto for AI with AMD and NVIDIA GPUs
- Talked to Meshcloud: They build developer platform tooling (currently mostly integrated with cloud providers)
- Want to look into Okteto for dev envs: <https://github.com/okteto/okteto>


@@ -0,0 +1,77 @@
---
title: Scaling GPU Clusters without melting down
weight: 1
tags:
- ml
- nvidia
- ai
- apiserver
- go
- kubecon
---
{{% button href="https://youtu.be/dUfp3j1j-mg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/50/Scaling%20GPU%20Clusters%20Without%20Melting%20Down%21%20%281%29.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
- We need more and more GPUs -> Control plane needs to keep track of more objects
- Goal: Scale Workers without scaling control plane
## Current Problems
### Secret list calls go up and control plane goes down
- Scenario: High number of list calls with larger secrets
- Problem: The apiserver OOMs b/c of its cache
- Fix: API Priority & Fairness (only allow two concurrent list calls, queue the rest)
- Result: Decreased number of oom crashes
### High memory usage until we restart the apiserver
- Scenario: API-Server frees up to 40% of its memory util when restarted
- Main suspect: Garbage collection
- Idea: Tune the Go garbage collector (env var `GOGC`) -> They lowered it from the default 100 to 50
- Result: Decrease in memory util and no more growing util over time
### Large skew in memory utilization
- Scenario: Skew in memory utilization across api-server pods
- Problem: If a pod with high util gets hit with a list, the api-server will OOM -> The LB redirects to the other 2 -> Those OOM
- Observation: The LB in front of the api server pods also shows some skew -> Explains the skew
- Root cause: The LB has long-living TCP connections to the servers and balances based on connections, not requests
- Idea: Switch up the LB configuration -> Not quite the right angle
- Fix: `--goaway-chance` param in apiserver - an HTTP/2 `GOAWAY` message gets sent at random -> Tearing down the connection gracefully, the client recreates the connection
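Why connection-based balancing produces skew, and why randomly closing connections helps, can be shown with a toy simulation. This is a simplified model and not how the apiserver or any real load balancer is implemented: the client rates, the per-tick chance and the random re-placement are assumptions for illustration.

```python
import random


def simulate(rates, backends=3, ticks=2000, goaway_chance=0.0, seed=0):
    """Return the number of requests each backend handled.

    rates: requests per tick for each client; every client holds one
    long-lived connection. On a GOAWAY the client reconnects and the load
    balancer places the new connection on a random backend (a simplification).
    """
    rng = random.Random(seed)
    # Initial placement: connections are spread evenly, one per backend in turn.
    conn = [i % backends for i in range(len(rates))]
    load = [0] * backends
    for _ in range(ticks):
        for client, rate in enumerate(rates):
            load[conn[client]] += rate
            # Rolled once per tick here to keep the sketch short.
            if rng.random() < goaway_chance:
                conn[client] = rng.randrange(backends)
    return load


def skew(load):
    """Ratio between the busiest and the least busy backend."""
    return max(load) / max(min(load), 1)


if __name__ == "__main__":
    rates = [100, 10, 1]  # one heavy lister, two light clients
    print("sticky connections:", simulate(rates))
    print("with GOAWAY:       ", simulate(rates, goaway_chance=0.02))
```

With sticky connections the number of connections per backend is perfectly balanced, yet one backend serves 100 times the requests of another; once connections get recycled, the heavy client wanders across all backends and the load evens out over time.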
### Architectural mistakes
- Large number of secrets per workload -> List, Encode/Decode overhead
- No caching -> Too many list calls
### Preview
- There are a bunch of sig api-machinery improvements planned
## The future
- The switch from NUMA GPU-Devices to DRA
- DRA is powerful enough to get rid of custom NUMA stuff
### The stack
- Currently:
  - CP: APIServer, Controller manager, Scheduler and Topology aware scheduler
  - Worker: Device Plugin, nfd topology updater
- Future:
  - CP: APIServer, Controller manager, Scheduler
  - Worker: Device Plugin
### Testing scaling
- Tool: KWOK (Kubernetes WithOut Kubelet) - used to simulate GPU workloads
- Env: K8S 1.32 with scaling from 0 to 4000 Workloads
- Metrics:
  - Scheduling latency: The topology-aware scheduler was affected by latency far more
  - Scheduler memory util: 30% of memory saved with DRA
  - API-Server memory: Another 20% of memory saved
- Result: They are confident that DRA will be stable and even save memory and CPU util


@@ -0,0 +1,81 @@
---
title: Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos
weight: 2
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/uQ_WN1kuDo0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/fd/day2000-migration-ClusterAPI-talos.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Background
- They use large, shared clusters
- The oldest cluster is 2099 days (5.8 years) old
- On-prem, hosted on vSphere with vanilla kubeadm
- Fun fact: They run chaosmonkey on all clusters -> Automatically prepares for updates
### Legacy provisioning
1. Terraform creates a Debian VM
2. Deploy base tools with Puppet
3. Register nodes in inventory YAML file
4. Run Ansible playbook -> Renders configs and runs kubeadm
5. Configure ArgoCD
### Target
- Use Cluster API to manage the workload clusters
- Basic CRDs: Cluster, MachineDeployment, Machine
- Talos: Immutable, minimal, ephemeral with declarative config via gRPC API
![CAPI Diagram](../_img/capi.png)
## Migration
1. Config matching between kubeadm and talos+capi
2. Import PKI/Certs
3. Create ClusterAPI CRDs
4. Add ClusterAPI Nodes
5. Remove kubeadm nodes
### 1. Config matching
1. Serviceaccount Issuer: Talos has its own default
2. etcd encryption key names are hardcoded in talos
3. Re-Encrypt all secrets (get secrets, replace secrets)
### 2. PKI
1. Talos includes some logic that can generate a secrets bundle from an existing API
2. Import: The etcd, k8s, serviceaccount and os (talos specific, used for the talos api auth) certificates
### 3. CRDs
- One namespace per workload cluster
- Cluster-CRD: Ref to CP and Infrastructure
- ControlPlane-CRD: Create cp MDs
- Infrastructure: References template for worker-MDs
![ClusterAPI CRDs](../_img/clusterapi-crd.png)
### 4. Add ClusterAPI Nodes
- Add new CP and Worker Nodes to the cluster that are managed by CAPI (slowly, stuff will break)
- Remove the old nodes one by one over weeks or months
- Potential Problems:
  - Mismatched serviceaccount issuer
  - Missing etcd encryption key
  - Wrong etcd encryption key
  - Loss of quorum: `--force-new-cluster` can force recovery on one node of the etcd cluster
## Demo
I recommend watching the demo.
Talos seems pretty cool.
## Bootstrapping
- Kind cluster in github action or on local device


@@ -0,0 +1,79 @@
---
title: "Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes"
weight: 3
tags:
- kubecon
- operator
---
{{% button href="https://youtu.be/tnSraS9JqZ8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/53/Don%27t%20write%20controllers%20like%20Charlie%20Don%27t%20does_%20avoiding%20common%20Kubernetes%20controller%20mistakes.pptx.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Common mistake
### Using a simple client that talks directly to the API server
- Problem: A
- Problem: Updates send the whole object -> No-op updates waste apiserver resources
- Fix: Use a cache client
- Problem: Caching validation
### Don't use custom caching
- Problem: Good Luck dealing with concurrency
- Hard: Controllers must maintain a per-kind cache
- Problem: Eventual consistency makes everything more complicated
- Fix: Use a framework
### Predicates only apply to the current call
- A predicate set in the `For` call only applies to that call, not to other watchers
- Also check whether you should reconcile your low-level object or whether reconciling the higher-level objects that reference it is better
## Tools
### KRT
> Still under development
- Operations on collections (kubernetes objects with state tracking)
- Fetch function that handles transformation
### StateDB
- In-memory database for go with watch channels
- You can setup a table that stores all objects of a kind (provided by the client)
- Triggers hooks when changes happen in the database that you can react to
### Controller-Runtime
> The kubebuilder one
- Includes a cached client
- Works on the reconciler pattern -> Makes triggers simple
## Tips
- Limit the number of api server updates
- Check for diffs yourself and don't send updates if there is nothing new
- Use patch with just the changed fields instead of update -> Especially for `.status`
- Use a framework that handles watching, coalescing and caching (krt, statedb, controller-runtime)
- Use predicates if you're using controller-runtime; they help you filter out no-op events by checking them against the cache and filters
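The "diff first, then patch only what changed" tip can be illustrated language-independently. The sketch below uses plain dicts; `send_patch` is a hypothetical stand-in for your client's patch call, and the diff is deliberately simplified (no field deletions, lists replaced as a whole).

```python
from typing import Callable


def changed_fields(current: dict, desired: dict) -> dict:
    """Return only the fields of `desired` that differ from `current`.

    Simplified: nested dicts are compared recursively, lists are replaced as a
    whole, and fields that should be deleted are not handled.
    """
    patch = {}
    for key, want in desired.items():
        have = current.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = changed_fields(have, want)
            if nested:
                patch[key] = nested
        elif have != want:
            patch[key] = want
    return patch


def sync_status(current: dict, desired_status: dict,
                send_patch: Callable[[dict], None]) -> bool:
    """Patch `.status` only when something changed; return True if a call was made."""
    patch = changed_fields(current.get("status", {}), desired_status)
    if not patch:
        return False  # no-op: spare the API server the request
    send_patch({"status": patch})
    return True
```

The important property is that an unchanged object results in zero API calls, and a changed one sends only the delta instead of the whole object.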
## Q&A
- Do you know where your reconciliations are coming from?
  - Counts: Yes, the frameworks provide metrics and you can implement your own
  - But controller-runtime abstracts the patch source, so you would have to compare before and after state yourself - but you should not do that
- What about state sharing across multiple threads?
  - Controller-runtime handles each reconcile as idempotent, so you can just multithread
  - But handling consistency can still be hard because you have to design all of your operations as idempotent by rebuilding the state each time
- What are your thoughts on controllers that do stuff in the real world (especially b/c it takes longer and there are no native observers)?
  - Do something like the krt project by keeping the state separately
- What if someone changes things at the cloud provider?
  - A question of philosophy -> Usually just treat the operator as the source of truth
- How do you test your operators?
  - Depends on your output (kubernetes objects make stuff simple)
  - For cilium: Simple b/c it's just creating kubernetes objects
  - With outside interaction: In-memory state representation or mocking
  - For complex controllers split the operator into: Ingestion, data model and transformation


@@ -0,0 +1,56 @@
---
title: The GPUs on the bus go round and round
weight: 4
tags:
- kubecon
- gpu
- nvidia
---
{{% button href="https://youtu.be/cLJRh4y4vXg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background
- They are the GeForce NOW folks
- Large fleet of clusters all over the world (60,000+ GPUs)
- They use KubeVirt to pass through GPUs (vfio driver) or vGPUs
- Devices fail from time to time
- Sometimes failures need restarts
## Failure discovery
- Goal: Maintain capacity
- Failure reasons: Overheating, insufficient power, driver issues, hardware faults, ...
- Problem: They only detected failure by detecting capacity decreasing or not being able to switch drivers
- Fix: First detect failure, then remediate
- GPU Problem detector as part of their internal device plugin
- Node Problem detector -> triggers remediation through maintenance
## Remediation approaches
- Reboot: Works every time, but has workload-related downsides -> Legit solution, but drain can take very long
- Discovery of remediation loops -> Too many reboots indicate something being not quite right
- Optimized drain: Prioritize draining of nodes with failed devices before other maintenance
- The current workflow is: Reboot (automated) -> Power cycle (automated) -> Rebuild Node (automated) -> Manual intervention / RMA
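The escalation workflow and the loop detection can be expressed as a small decision helper. The step names follow the notes above; the window and threshold for "too many reboots" are invented for this sketch, since the talk did not give concrete numbers.

```python
# Escalation order as listed in the notes; the last step is the manual one.
LADDER = ["reboot", "power_cycle", "rebuild_node", "manual_intervention"]


def next_action(failed_attempts: int) -> str:
    """Pick the next remediation step after `failed_attempts` unsuccessful ones."""
    return LADDER[min(failed_attempts, len(LADDER) - 1)]


def in_remediation_loop(reboot_times: list[float], now: float,
                        window_s: float = 7 * 24 * 3600,
                        max_reboots: int = 3) -> bool:
    """Flag a node that was rebooted more than `max_reboots` times within the window.

    Timestamps are seconds; window and threshold are hypothetical defaults.
    """
    recent = [t for t in reboot_times if now - t <= window_s]
    return len(recent) > max_reboots
```

A node flagged by `in_remediation_loop` is a candidate for skipping straight to a later rung of the ladder instead of being rebooted yet again.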
## Prevention
> Problems should not affect workload
- Healthchecks with alerts
- Firmware & Driver updates
- Thermal & power management
## Future Challenges
- What if a high-density node with 8 GPUs has one failure?
- What is an acceptable ratio of working to broken GPUs per node?
- If there is a problematic node that has to be rebooted every couple of days, should the scheduler avoid this node?
## Q&A
- Are there any plans to open-source the GPU problem detection? We could certainly do it, but it's not on the roadmap right now
- Are the failure rates representative and what is counted as failure?
  - Failure is not being able to run a workload on a node (could be hardware or driver failure)
  - The failure rate is 0.6% but the affected capacity is 1.2% (with 2 GPUs per node)
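My reading of these numbers, and it is only an assumption, is that a single failed GPU takes its whole node out of service, so the lost capacity scales with the number of GPUs per node. A quick back-of-the-envelope helper:

```python
def affected_capacity(gpu_failure_rate: float, gpus_per_node: int) -> float:
    """Share of GPUs that become unavailable when one failed GPU blocks its whole node.

    Assumes failures are rare enough that two failed GPUs almost never share a node.
    """
    return min(1.0, gpu_failure_rate * gpus_per_node)


if __name__ == "__main__":
    print(f"{affected_capacity(0.006, 2):.1%}")  # 2 GPUs per node, as in the talk
    print(f"{affected_capacity(0.006, 8):.1%}")  # hypothetical 8-GPU node
```

Under the same assumption, a high-density node with 8 GPUs would turn the 0.6% failure rate into roughly 4.8% of affected capacity, which is why the question of running nodes with partially broken GPUs matters.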


@@ -0,0 +1,64 @@
---
title: "Reliable k8s resource Submission & Bookkeeping"
weight: 5
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/NCkHrvqFMl8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/0d/Reliable%20K8S%20Resource%20Submission%20and%20Bookkeeping.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Service offerings
- Product: HA Container Platform for general utility with a focus on run-to-complete
- Use-Cases: ML Orchestration, CI/CD, Machine maintenance, Financial analysis, Data Processing pipeline
- Requirements: Observability, Scheduling Events, Approval process, Bookkeeping, Datacenter Resiliency
- Focus: Resiliency (HA with datacenter failover)
- What the user needs: Workflow (e.g. generate report, persist report, notify)
- What we need for the user: ConfigMaps + Secrets, Workflow templates for the steps
## Challenges
- Read after modify across multiple datacenters
- Many reads against kubeapi that could overload the apiserver
- No native approval flows and limited audit
## Submission flows from a user's perspective
### Submission of runnables
- User: Submits runnable to submitter with audit
- Submitter: Handles retry, verification, ...
- Submitter: Configures workload on workload clusters
![](../_img/runnables.png)
### Submission of deployables
- User: Deploys mutation to audit/source of truth
- Syncer: Syncs deployables to workload clusters
![](../_img/deployables.png)
## Reporting
- User wants: UI with latest status for all jobs
- Compliance wants: Transactions on given resource for auditing
- Implementation: Highly available inventory as single source of truth
```mermaid
graph
WorkflowAPI-->|reads|inventory
Consumer-->|updates|inventory
Producer-->|publishes events to|Consumer
```
### Potential Problems
- Problem: Delete event does not get propagated from syncer to producer, leading to zombie resources
- Fix: Periodic Cleanup
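
A minimal sketch of the periodic cleanup, assuming the zombies are stale inventory entries and the workload cluster can be listed for comparison. The `Inventory` class and its methods are hypothetical stand-ins, not the speakers' implementation.

```python
class Inventory:
    """In-memory stand-in for the highly available inventory (hypothetical)."""

    def __init__(self) -> None:
        self.resources: dict[str, str] = {}

    def apply_event(self, kind: str, name: str, status: str = "") -> None:
        # Called by the consumer for every event the producer publishes.
        if kind == "delete":
            self.resources.pop(name, None)
        else:
            self.resources[name] = status

    def cleanup(self, live_names: set[str]) -> list[str]:
        # Periodic cleanup: drop entries that no longer exist on the workload
        # cluster. This covers delete events that never made it through.
        zombies = sorted(set(self.resources) - live_names)
        for name in zombies:
            del self.resources[name]
        return zombies
```

If the delete event for `job-b` gets lost, the next `cleanup({"job-a"})` run returns `["job-b"]` and removes it from the inventory.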
### Overview
![Complete diagram](../_img/submission.png)

BIN
content/day1/_img/capi.png Normal file



@@ -4,8 +4,26 @@ title: Day 1
weight: 5
---
TODO:
Day 1 of the main KubeCon event started with a bunch of keynotes from the CNCF themselves (announcing the next locations for KubeCon - Amsterdam and Barcelona).
They also announced a new sovereign cloud edge initiative (CNCF/LF meets EU and some German ministry) called "NeoNephos" with members like SAP, StackIt or T-Systems.
This is also the day the sponsor showcase opened - so expect more talking to people, meetings and demos and fewer straight-up talks.
## Talk recommendations
* TODO:
- Not that much about GPUs, but good control plane scaling advice: [Scaling GPU Clusters without melting down](./01_scaling-gpu)
- Migrate a cluster to ClusterAPI without downtime: [Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos](./02_migrations)
- Some basic operator tips with good Q&A questions: [Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes](./03_operator-mistakes)
## Other stuff I learned or people I talked to
- The crossplane maintainers (Upbound)
- Anynines
- Cloudfoundry/Korifi
- FlatCar
- Cert-Manager
- Flux maintainers
- OVH
- Kubermatic
- Isovalent
- Spacelift: They employ some of the opentofu core maintainers


@@ -0,0 +1,38 @@
---
title: "Cloudy with a chance of kubernetes"
weight: 1
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/iCAFXF5ECto" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/bc/KubeCon%20EU%202025%20-%20Cloudy%20with%20a%20chance%20of%20Kubernetes_%20Going%20from%20one%20to%20three%20cloud%20providers%20-%20Laurent%20Bernaille%20%26%20Maxime%20Visonneau,%20Datadog.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Background
- Scale: 100s of clusters
- Cloud: Azure, AWS, GCP
- The baseline: Single AWS Region and applications on vms
- Goal: Operate on different locations
- History: They added more and more regions - 6 Providers in 6 Regions across 29 locations
- Problem: Different tooling across different cloud providers
- Idea: Kubernetes abstracts the specific cloud provider infra
## The way
- Idea: Use managed kubernetes
- Problem: In 2018 the managed offerings were in beta or very limited
- Challenge: Opinionated cloud specific stuff
### Iterations
1. Clusters based on vms created by terraform and other automation tools -> They realized that they need multiple clusters per region
2. Their own application delivery platform that deployed to the right clusters across regions for better DevEx
3. k8s on k8s (hosted cp) -> Current setup with a terraform managed parent cluster
4. Idea: Host the Parent-Cluster on managed kubernetes -> They need to abstract some things away
5. Solution: Use their good old application delivery platform
### Abstractions
- Use custom CRDs to abstract the same behaviour across providers


@@ -4,8 +4,21 @@ title: Day 2
weight: 6
---
TODO:
The second day of KubeCon was my main "meeting day" this year - aka there were a bunch of scheduled meetings with manufacturers, partners, potential partners or just to get to know someone/a project.
What does this mean for you? Another day with only a few sessions (I only managed to attend two and only one was worthy of note-taking) - the meeting notes are not available online.
## Talk recommendations
In the evening we attended the "German Community Stammtisch".
* TODO:
## Other stuff I learned or people I talked to
- Isovalent
- Kubermatic
- Portworx
- Fastly
- Syseleven
- Netbird
- VMware
- Stackit
- Harness
- Mia Platform
- and many, many more...


@@ -0,0 +1,53 @@
---
title: "Surviving Day2: Picking the right tool to secure your kubernetes habitat"
weight: 1
tags:
- kubecon
- security
---
{{% button href="https://youtu.be/FqUPqroF-Rw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/a1/Surviving%20Day2%20-%20Picking%20the%20Right%20Tool%20To%20Secure%20Your%20Kubernetes%20Habitat.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Premise: The CNCF landscape includes a huuuge number (80+) of security(-related) projects.
Analogy: Animal kingdom (includes similar-ish animals that might do some of the same stuff but not entirely the same)
## Build Phase
- How can I scan my container for vulnerabilities? -> Well, you probably mean your image
- The image itself is just a bunch of static layers and we kinda have to trust the layers you didn't build yourself
- The main tool used is still trivy with some easy steps
1. Extract layers
2. Build FS
3. Identify OS and Non-OS Packages
4. Compare with vuln-db
- The animal in our analogy: Raccoon
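
The last scan step (compare the identified packages with the vuln-db) boils down to a lookup plus a version comparison. This is a simplified sketch and not how trivy is implemented: the data layout, the function names and the CVE IDs are made up, and real version schemes are messier than dotted integers.

```python
def parse_version(version: str) -> tuple[int, ...]:
    # Simplification: only handles purely numeric, dot-separated versions.
    return tuple(int(part) for part in version.split("."))


def find_vulnerabilities(packages: dict[str, str],
                         vuln_db: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Report every package that is older than the version fixing a known issue."""
    findings = []
    for name, installed in packages.items():
        for cve_id, fixed_in in vuln_db.get(name, []):
            if parse_version(installed) < parse_version(fixed_in):
                findings.append(f"{cve_id}: {name} {installed} is older than fix {fixed_in}")
    return sorted(findings)


# Made-up example data (the CVE IDs are placeholders).
packages = {"openssl": "3.0.1", "zlib": "1.3.1"}
vuln_db = {
    "openssl": [("CVE-0000-0001", "3.0.7")],
    "zlib": [("CVE-0000-0002", "1.2.12")],
}
print(find_vulnerabilities(packages, vuln_db))
```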
## Deploy Phase
- Kubernetes Native: Admission Controller
- Tool used: Kyverno (integrates as an admission controller with yaml/crd based configuration)
1. Modify (e.g. add default resource limits)
2. Validate (check policies)
- The animal is actually a human: The forest guard
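
The modify-then-validate order of an admission controller can be illustrated with plain dictionaries. This is a conceptual sketch only: Kyverno policies are written as YAML/CRDs, and the default limits and the "no latest tag" policy below are assumptions chosen for the example.

```python
import copy

DEFAULT_LIMITS = {"cpu": "500m", "memory": "256Mi"}  # hypothetical defaults


def mutate(pod: dict) -> dict:
    """Step 1: add default resource limits to containers that have none."""
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        resources.setdefault("limits", dict(DEFAULT_LIMITS))
    return patched


def validate(pod: dict) -> list[str]:
    """Step 2: return policy violations; an empty list means 'admit'."""
    violations = []
    for container in pod["spec"]["containers"]:
        if container["image"].endswith(":latest"):
            violations.append(f"{container['name']}: image must not use the latest tag")
    return violations


def admit(pod: dict) -> tuple[dict, list[str]]:
    # Mutation runs first so validation sees the final object.
    patched = mutate(pod)
    return patched, validate(patched)
```

Running mutation before validation matters: a policy that requires resource limits would otherwise reject pods that the defaults would have fixed.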
## Start Phase
- Before the pod itself is running, CSI, CNI and secret-related processes (the ones we want to look into) happen
- Problems: Secrets have no rotation or versioning mechanism, there is no default integration for external kms
- Project: External Secrets -> Get secrets from external kms, automatically sync (e.g. new versions)
- The chosen animal: Capricorn
## Run Phase
- Goal: Runtime scanning without including specialized instrumentation in each application
- Tool: Falco utilizing eBPF to check system calls against rules
- Idea: Detect dangerous behaviour (e.g. check for someone trying to exploit a fresh CVE)
- The analogy: Falcon
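
Conceptually, checking system calls against rules looks like the sketch below. Falco has its own YAML rule language and gets its events via eBPF. The event dictionaries, the `Rule` class and both rules here are invented purely to show the matching idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]


# Two made-up rules over made-up event fields.
RULES = [
    Rule("shell spawned in container",
         lambda e: e["syscall"] == "execve"
         and bool(e.get("container"))
         and e["path"] in {"/bin/sh", "/bin/bash"}),
    Rule("write below /etc",
         lambda e: e["syscall"] == "openat"
         and e.get("write", False)
         and e["path"].startswith("/etc/")),
]


def check(event: dict) -> list[str]:
    """Return the names of all rules that match a single syscall event."""
    return [rule.name for rule in RULES if rule.condition(event)]
```

The applications themselves stay untouched, which is the point of the goal above: the rules only look at what the kernel sees.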
## TL;DR
1. Scan images (trivy)
2. Enforce best practices (kyverno)
3. Use an external kms (external secrets)
4. Scan at runtime (falco)


@@ -0,0 +1,30 @@
---
title: "Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google"
weight: 2
tags:
- kubecon
- dev
---
{{% button href="https://youtu.be/mewXGSwDCE4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/f6/Type-safe%20Feature%20Flagging%20in%20OpenFeature.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Featureflags?
- Idea: Change the behaviour of an application without rebuilding it
- Goal: Control rollout, reduce risk, experiment (a/b)
- At google: A huge number of feature flags (150k+) but that's because people forget to turn them off
## Where does the flag come from
- Lifecycle of a flag: Create, Manage, Deprecate, Delete -> But will it be created first in code or in the service
- Classic implementation: Just an if/else that uses a function to get the flag
- Problem: What if the flag names mismatch between the code and the flag service -> Multiple sources of truth
- Solution: Require use of auto-generated flag bindings (codegen from the management system) to mitigate typos, etc.
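
To make the problem concrete: with string-based lookups a typo silently returns the default, while a binding per flag turns the typo into an error. The flag names, `FLAG_SERVICE` and `GeneratedFlags` are hypothetical and only mimic what generated bindings could look like.

```python
# Stand-in for the flag management service (hypothetical data).
FLAG_SERVICE = {"new-checkout": True, "dark-mode": False}


def get_flag(name: str, default: bool = False) -> bool:
    # Classic lookup: a misspelled name silently falls back to the default.
    return FLAG_SERVICE.get(name, default)


class GeneratedFlags:
    """One accessor per known flag, as codegen from the flag service might emit."""

    @staticmethod
    def new_checkout() -> bool:
        return FLAG_SERVICE["new-checkout"]

    @staticmethod
    def dark_mode() -> bool:
        return FLAG_SERVICE["dark-mode"]


print(get_flag("new-chekout"))        # typo goes unnoticed, feature stays off
print(GeneratedFlags.new_checkout())  # a typo here would fail loudly instead
```

In a statically typed language the misspelled accessor would not even compile; in Python it raises an `AttributeError` and is caught by a type checker.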
## OpenFeature
- Goal: Vendor agnostic, standardized, open source
- Basic setup: Register provider (once per app), create a client, use client to get flags
- CLI: Integrate into management system, keep a local manifest of all flags and generate code (generates the client)
- Now: Just call the client's method instead of hard-coding feature flag names
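
The manifest-to-code idea can be sketched in a few lines. This does not reproduce the OpenFeature CLI, its manifest format or its output. The manifest layout, `generate_client` and the `resolve` callback are assumptions made for this illustration.

```python
# Hypothetical local manifest of all known flags.
MANIFEST = {
    "new-checkout": {"type": "bool", "default": False},
    "max-items": {"type": "int", "default": 10},
}


def generate_client(manifest: dict) -> str:
    """Emit Python source with one typed method per flag in the manifest."""
    lines = [
        "class GeneratedClient:",
        "    def __init__(self, resolve):",
        "        self._resolve = resolve",
        "",
    ]
    for key, spec in sorted(manifest.items()):
        method = key.replace("-", "_")
        lines += [
            f"    def {method}(self) -> {spec['type']}:",
            f"        return self._resolve({key!r}, {spec['default']!r})",
            "",
        ]
    return "\n".join(lines)


# Load the generated source and use it with a fake flag backend.
namespace: dict = {}
exec(generate_client(MANIFEST), namespace)
overrides = {"max-items": 25}
client = namespace["GeneratedClient"](lambda key, default: overrides.get(key, default))
print(client.max_items(), client.new_checkout())
```

Flag names now live in exactly one place (the manifest), and application code only ever calls generated methods.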


@@ -0,0 +1,43 @@
---
title: "Don't let your kubernetes cluster go wild: Ensuring etcd reliability"
weight: 3
tags:
- kubecon
- etcd
---
{{% button href="https://youtu.be/J93U9n_qxSI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Fair warning: This talk was very technical and pretty interesting - but don't even try to understand it if you're tired (or if it's the third to last session on the last day of a long conference).
## Baseline
- Standard example: Write and read KV-Data, `put(A,2) -> Get (A)`
- Problem: Concurrency
TODO: Steal image from intuition of correctness
## Correctness
- Correctness: Kinda funky when it comes to time
- Fix: Define serialization that executes parallel requests one after another to bring them in an order
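
My own toy illustration of the idea (not code from the talk or from etcd): a history of overlapping requests on a single key is correct if some one-after-another order exists that respects real time and explains every read. The `Op` type and the brute-force search over all orderings are assumptions for this sketch and only work for tiny histories.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Op:
    start: float
    end: float
    kind: str              # "put" or "get"
    value: Optional[int]   # value written by a put, or returned by a get


def respects_real_time(order: tuple[Op, ...]) -> bool:
    # An operation that finished before another one started must come first.
    return not any(
        later.end < earlier.start
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )


def replays_correctly(order: tuple[Op, ...]) -> bool:
    # Execute the operations one after another and compare every read.
    current = None
    for op in order:
        if op.kind == "put":
            current = op.value
        elif op.value != current:
            return False
    return True


def has_valid_order(history: list[Op]) -> bool:
    return any(
        respects_real_time(order) and replays_correctly(order)
        for order in permutations(history)
    )
```

A `get` that overlaps with `put(A,2)` may return the old or the new value, but a `get` that starts after `put(A,2)` has finished must return 2.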
## Failures
- What happens if connections between etcd nodes go down -> Serving stale data
- What happens if data corrupts -> If enough members are online, it can repair itself
- And many more that can happen at random times -> Hard to test
TODO: Steal "in a concurrent world"
## Robustness framework
- Automates tests for failures
- Includes reliable reproductions of past (seemingly random) errors
- Currently a mixture of existing go debugging tools
## Future
- Reproduce more bugs consistently
- Run additional consistency checks


@@ -4,8 +4,15 @@ title: Day 3
weight: 7
---
TODO:
The last day of KubeCon - aka the day everyone leaves early.
But not me and I had no meetings scheduled for this day -> More talks for me and notes for you.
This being my 7th day of the trip and 6th day of non-stop conferences took a bit of a toll on my note taking skills (expect more spelling mistakes).
## Talk recommendations
* TODO:
- Intro to feature flags and related tips: [Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google](./02_open-feature)
## Other stuff I learned or people I talked to
- TODO:


@@ -4,4 +4,6 @@ title: Lessons Learned
weight: 8
---
Not related to any talk directly, but I can recommend this [Blog Post](https://smudge.ai/blog/ratelimit-algorithms) and [Video](https://www.youtube.com/watch?v=8QyygfIloMc&) about rate limiting.
TODO: