Compare commits
65 Commits
6931da118c
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| b9060af72d | |||
| 3afb07e4c1 | |||
| 4becb06ad3 | |||
| 0e24bf4fd6 | |||
| f06c486182 | |||
| f71971e793 | |||
| a7a3817a03 | |||
| 47f7869257 | |||
| b2fd7a4c81 | |||
| 1213be7c30 | |||
| 1f49a42edc | |||
| c6f716ced1 | |||
| 09ac5a9051 | |||
| 5ed623d0ca | |||
| f8ca21416b | |||
| dc4dd2d883 | |||
| 957bc94344 | |||
| 44a3653c84 | |||
| 6bf47e49c5 | |||
| 39d92acdb4 | |||
| 4d528bf5de | |||
| d2f3f5f95d | |||
| 6d0c95a8ac | |||
| 3e4fbb616b | |||
| d9605d602e | |||
| 745e8f5896 | |||
| 78ca5973b8 | |||
| 77f34ed1ab | |||
| a36f562cf4 | |||
| 9ad9af0f9c | |||
| 4f39c1102c | |||
| df93624814 | |||
| 46b06c66fd | |||
| b4d8aa29c3 | |||
| 4cec1917bf | |||
| bd7d9fe87d | |||
| f4858d81a8 | |||
| bfcfe88cea | |||
| 45a26383e0 | |||
| 8dbdfd938f | |||
| 8941108720 | |||
| f8512dc6ae | |||
| c09bf8f637 | |||
| d90d5b8eab | |||
| 8b78108a60 | |||
| d09e3ff3d1 | |||
| 8ddf87d2f4 | |||
| 720d68803d | |||
| f0229abafd | |||
| 723051c498 | |||
| 7e6d0fc47f | |||
| fe8fa9693a | |||
| 8aab9217fe | |||
| 936a4c8c3a | |||
| cc5325bf3f | |||
| 30a976bb75 | |||
| 88200c76df | |||
| e608712f31 | |||
| ed77238254 | |||
| 80f62fd567 | |||
| 17b4407fea | |||
| cb8d7f9d48 | |||
| b3a8b29556 | |||
| 52b967f78c | |||
| c19d8a7f42 |
@@ -1,4 +1,4 @@
|
||||
FROM registry.odit.services/hub/hugomods/hugo:exts AS build
|
||||
FROM registry.odit.services/hub/hugomods/hugo:exts-0.145.0 AS build
|
||||
WORKDIR /app
|
||||
|
||||
COPY . /app/
|
||||
|
||||
@@ -6,5 +6,6 @@ tags:
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
TODO:
|
||||
@@ -9,11 +9,38 @@ This current version is probably full of typos - will fix later. This is what ty
|
||||
|
||||
## How did I get there?
|
||||
|
||||
I attended KubeCon + CloudNativeCon Europe 2025 in London.
|
||||
I attended Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2025 in London.
|
||||
This year I was sent there by my employer [DATEV eG](https://datev.de) - thanks again to everyone who helped me with getting this trip approved (you know who you are).
|
||||
|
||||
Why? Because learning about everything new in the world of cloud is really important, and war stories help to avoid mistakes that others have already made.
|
||||
And [last year's experience](https://kubecon24.nicolai-ort.com) was really good, so I wanted to go again.
|
||||
|
||||
Plus I actually presented a talk at Cloud Native Rejekts 🥳.
|
||||
|
||||
## And how does this website get its content?
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
Nicolai<-->|Watches|Talk
|
||||
Nicolai-->|"Takes notes (and typos) + commits"|Repo
|
||||
Repo-->|Triggers|Actions
|
||||
Actions-->|Builds image and pushes to|Registry
|
||||
Flux-->|Detects new image|Registry
|
||||
Flux-->|Rolls out new image|Kubernetes
|
||||
```
|
||||
|
||||
## Changelog™️
|
||||
|
||||
- 2025-03-28: Initial repo and deployment setup
|
||||
- 2025-03-30: First day of Cloud Native Rejekts
|
||||
- 2025-03-31: Second day of Cloud Native Rejekts
|
||||
- 2025-04-01: First day of KubeCon/CloudNativeCon
|
||||
- 2025-04-02: Second day of KubeCon/CloudNativeCon
|
||||
- 2025-04-03: Added video links for Cloud Native Rejekts
|
||||
- 2025-04-03: Third day of KubeCon/CloudNativeCon
|
||||
- 2025-04-04: Fourth day of KubeCon/CloudNativeCon
|
||||
- 2025-04-07: Added missing images and slide links for KubeCon/CloudNativeCon
|
||||
|
||||
## Style Guide
|
||||
|
||||
The basic structure is as follows: `day/event-or-session`.
|
||||
58
content/day-1/01_container-security.md
Normal file
@@ -0,0 +1,58 @@
|
||||
---
|
||||
title: What I wish I knew about container security
|
||||
weight: 1
|
||||
tags:
|
||||
- rejekts
|
||||
- security
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=JAy6Ra0ulSw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Baseline
|
||||
|
||||
- Linux is like a hammer and containers look a lot like nails
|
||||
- Containers aren't real: They are just processes with better isolation
- IPTables is complicated
|
||||
|
||||
### Hard parts
|
||||
|
||||
- The kernel is shared; we only pretend to separate processes through namespaces
- Filesystems: Containers bring a bunch of filesystems, and filesystems get shared between multiple containers
|
||||
- Softlinks are hard to do right because they point to a path and not the data itself
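The symlink point is easy to see in a small experiment. This is only a sketch, assuming a POSIX system where symlinks can be created without extra privileges; the file names `config.txt` and `current` are made up for illustration.

```python
import os
import tempfile


def symlink_follows_path() -> tuple[str, str]:
    """Show that a symlink stores a path, not the data behind it."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "config.txt")
        link = os.path.join(root, "current")

        with open(target, "w") as f:
            f.write("trusted")
        os.symlink(target, link)  # the link only records the path string

        with open(link) as f:
            before = f.read()

        # Replace whatever lives at the target path; the link itself is untouched.
        os.remove(target)
        with open(target, "w") as f:
            f.write("swapped")

        with open(link) as f:
            after = f.read()
        return before, after


if __name__ == "__main__":
    print(symlink_follows_path())
```

Whoever controls the target path controls what a reader of the link sees, which is why symlinks inside shared container filesystems need extra care.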
|
||||
|
||||
### How did we get here?
|
||||
|
||||
1. Unix with a bunch of tools we still use
2. Linux (originally designed for the desktop)
3. Kernel gets iptables
4. The first concept of namespaces
|
||||
5. More hypervisor stuff and official user namespaces
|
||||
6. Containers (first lxc then docker)
|
||||
|
||||
## Sandboxing
|
||||
|
||||
- In browsers: They must protect the user from malicious content
|
||||
- In containers: Pretty much the same - both run untrusted code that has to be isolated
|
||||
|
||||
## Namespaces
|
||||
|
||||
- Better isolation from other processes including resource constraints
|
||||
- But: The shared kernel interacts with all processes (so kernel bugs can affect all namespaces)
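Namespace membership is just an attribute of a process, and on Linux it can be read from `/proc`. The sketch below is not from the talk; it assumes a Linux host with `/proc` mounted and returns an empty result elsewhere.

```python
import os


def list_namespaces(pid: str = "self") -> dict[str, str]:
    """Return the namespace links of a process, e.g. {'pid': 'pid:[4026531836]'}.

    Returns an empty dict where /proc/<pid>/ns is not available (non-Linux).
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may not be readable without extra privileges.
            continue
    return result


if __name__ == "__main__":
    for kind, ident in list_namespaces().items():
        print(f"{kind:20} {ident}")
```

Two processes that show the same identifier for a namespace type share that namespace; a containerized process shows different identifiers than the host shell, while both still talk to the same kernel.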
|
||||
|
||||

|
||||
|
||||
|
||||
## Improvements
|
||||
|
||||
- Secure Computing (seccomp): Implement a secure state that we transition into before the process actually does stuff
|
||||
- Paravirtualization: Instead of system calls to a shared kernel we make hyper-calls to the hypervisor
|
||||
- Virtualization: The classic virtualization where everyone hosts their own kernel
|
||||
|
||||
## Stuff to look out for
|
||||
|
||||
> More or less a bit of advertisement
|
||||
|
||||
- Edera: Container native hypervisor without a shared kernel
|
||||
- Styrolite: Rust-based container runtime sandbox
|
||||
- eBPF and Tetragon for prevention and monitoring
|
||||
30
content/day-1/02_controllers.md
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
title: "The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud"
|
||||
weight: 2
|
||||
tags:
|
||||
- rejekts
|
||||
- operator
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=PciVvE02L2w" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Big Picture
|
||||
|
||||
- Kubernetes is just a bunch of controllers
|
||||
- We can add custom controllers
|
||||
|
||||
TODO: Steal Pod Controller sample
|
||||
|
||||
## Real World Power of controllers
|
||||
|
||||
- In Kubernetes: Cloud Controller Manager (CCM), Scheduler, Controller Manager (CM)
|
||||
- Operator = CRD + Controller
|
||||
|
||||
TODO: Steal images from slides
|
||||
|
||||
## Example
|
||||
|
||||
> Crossplane as the example of the basic reconcile idea
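The basic reconcile idea fits into a few lines. The following is a toy sketch with an in-memory "cluster", not Crossplane's or Kubernetes' actual API; `Cluster` and `reconcile` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the observed state."""
    running: dict[str, int] = field(default_factory=dict)  # name -> replicas


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Bring the observed state in line with the desired state.

    Returns the actions taken; an empty list means the states already match.
    """
    actions = []
    for name, replicas in desired.items():
        current = cluster.running.get(name)
        if current is None:
            cluster.running[name] = replicas
            actions.append(f"create {name} x{replicas}")
        elif current != replicas:
            cluster.running[name] = replicas
            actions.append(f"scale {name} {current}->{replicas}")
    # Anything running that is no longer desired gets removed.
    for name in list(cluster.running):
        if name not in desired:
            del cluster.running[name]
            actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    cluster = Cluster(running={"web": 1, "old": 2})
    print(reconcile({"web": 3, "db": 1}, cluster))
    print(reconcile({"web": 3, "db": 1}, cluster))  # second pass is a no-op
```

A real controller runs this comparison every time a watched object changes (and periodically), which is what makes the loop self-healing.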
|
||||
|
||||
TODO: Steal images from slides
|
||||
12
content/day-1/02_gslb.md
Normal file
@@ -0,0 +1,12 @@
|
||||
---
|
||||
title: Evaluating Global Load Balancing Options for Kubernetes in Practice
|
||||
weight: 2
|
||||
tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/RBMRU8rtxfI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://github.com/nicolaiort/rejekts2025-gslb" style="tip" icon="code" %}}Demo-Code and more{{% /button %}}
|
||||
{{% button href="https://de.slideshare.net/slideshow/evaluating-global-load-balancing-options-for-kubernetes-in-practice-kubermatic-datev/277640385" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
My talk, notes will be released soon
|
||||
112
content/day-1/03_service-mesh.md
Normal file
@@ -0,0 +1,112 @@
|
||||
---
|
||||
title: The service mesh wars - a new hope for kubernetes
|
||||
weight: 3
|
||||
tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=DdQzGsiounY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## The clans (popular solutions)
|
||||
|
||||
- Kuma
|
||||
- Linkerd
|
||||
- Cilium
|
||||
- Istio
|
||||
- Ambient Mesh
|
||||
|
||||
## The new hope: Gateway API
|
||||
|
||||
- Will integrate itself into the networking solution (nginx, istio, kong)
|
||||
- CRDs for Ingress, LB, Servicemesh
|
||||
- CRDs like: Gateway, HTTPRoute, GRPCRoute, TCPRoute
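To get a feeling for what such a CRD looks like, here is a sketch that builds an `HTTPRoute` manifest with weighted backends (traffic splitting). The resource names, the port `8080` and the helper `http_route` are assumptions for illustration and do not come from the talk.

```python
import json


def http_route(name: str, gateway: str, backends: dict[str, int]) -> dict:
    """Build a Gateway API HTTPRoute that splits traffic by weight."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # The Gateway (or other parent) this route attaches to.
            "parentRefs": [{"name": gateway}],
            "rules": [
                {
                    "backendRefs": [
                        # Port 8080 is an assumed example value.
                        {"name": svc, "port": 8080, "weight": weight}
                        for svc, weight in backends.items()
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    route = http_route("demo", "demo-gateway", {"app-v1": 90, "app-v2": 10})
    print(json.dumps(route, indent=2))
```

The printed JSON can be applied like any other manifest once the Gateway API CRDs and an implementation are installed in the cluster.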
|
||||
|
||||
## Expectations
|
||||
|
||||
- Baseline: Control Plane and Data Plane (Application + Proxy)
- What we get: Rules, Logs, ...
- Proxy-Variants:
  - Sidecar: Extra container in every pod, service needs to be restarted for settings changes
  - Sidecarless: One proxy per node
- Features: Ingress, Egress, Mutual TLS, Retry Logic, Traffic Splitting, Rate Limits, Observability
|
||||
|
||||
## Comparison
|
||||
|
||||
### Sidecar
|
||||
|
||||
TODO: Steal table from slides
|
||||
|
||||
| Kuma | Yes | Envoy
|
||||
|Linkerd | Yes | Linkerd Proxy
|
||||
|
||||
### Features
|
||||
|
||||
TODO: Steal Diagrams from slides
|
||||
|
||||
- Kuma: Gateway API supported
  - CRD per Mesh with Ratelimiter, Timeouts, ...
  - To add to mesh: Annotation
- Linkerd: Gateway API supported
  - Core Component: Server
  - To add to mesh: Annotate workload with proxy annotation
- Cilium: Gateway API mostly supported
  - Utilizes eBPF for speed
  - Can deploy Envoy
  - CRDs for NetworkPolicy
- Istio: Gateway API supported
  - CRDs with Services
  - To add: Annotate namespace or workload
- Ambientmesh: Gateway API supported
  - Same config as Istio
  - Special: Layer 7 rules require a waypoint
  - Missing: Several policy features
  - To add: Annotate namespace and/or workload
|
||||
|
||||
TODO: Steal table from slides
|
||||
|
||||
### Observability
|
||||
|
||||
- Kuma: Metrics by default with trace and log support (MeshTrace, MeshAccesslogs) via OpenTelemetry and its own UI
- Linkerd: Prometheus metrics, Viz extension for UI and Jaeger extension for traces (not OTel compliant)
- Cilium: No traces, only metrics and logs through Hubble (with UI)
- Istio/Ambient: Metrics, traces and logs with full OTel support on the data plane and an external UI (Kiali)
|
||||
|
||||
TODO: Steal table
|
||||
|
||||
### Performance
|
||||
|
||||
> Tests: https://github.com/isItObservable/servicemeshsecuritybenchmark
|
||||
|
||||
- KPIs: Resources and resource usage
- Constant load, no policies:
  - Kuma: 5.59ms
  - Linkerd: 2.55ms
  - Cilium: 0ms
  - Istio: 6.43ms
  - Ambientmesh: 3.59ms
- Load test, no policies:
  - Kuma: 7ms
  - Linkerd: 3.54ms
  - Cilium: 0.57ms
  - Istio: 8.8ms
  - Ambientmesh: 3.54ms
- Constant load, with policies:
  - Kuma: 6.08
  - Linkerd: 2.55
  - Cilium: 0
  - Istio: 9.19
  - Ambientmesh: 3.69
- Load test: TODO
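A quick bit of arithmetic on the constant-load numbers shows how much latency enabling policies adds per mesh. This is just a sketch over the figures noted above; it assumes that the values with policies are milliseconds as well, since the notes omit the unit there.

```python
# Constant-load latencies from the notes above. The second set is
# assumed to be milliseconds as well (the notes omit the unit).
NO_POLICIES = {"Kuma": 5.59, "Linkerd": 2.55, "Cilium": 0.0, "Istio": 6.43, "Ambientmesh": 3.59}
WITH_POLICIES = {"Kuma": 6.08, "Linkerd": 2.55, "Cilium": 0.0, "Istio": 9.19, "Ambientmesh": 3.69}


def policy_overhead() -> dict[str, float]:
    """Extra latency per mesh once policies are enabled (constant load)."""
    return {
        mesh: round(WITH_POLICIES[mesh] - NO_POLICIES[mesh], 2)
        for mesh in NO_POLICIES
    }


if __name__ == "__main__":
    for mesh, delta in sorted(policy_overhead().items(), key=lambda kv: kv[1]):
        print(f"{mesh:12} +{delta:.2f}")
```

Under that assumption Istio pays the most for policies (+2.76), Kuma and Ambientmesh a little (+0.49 and +0.10), while Linkerd and Cilium show no change.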
|
||||
|
||||
TODO: Steal overview slide
|
||||
|
||||
## Recommendation
|
||||
|
||||
- If ambientmesh supports everything you need: It performs the best
|
||||
- Kuma includes everything you need when starting your first mesh
|
||||
- Linkerd: Complex configuration
|
||||
- Treat Cilium as your CNI and not necessarily as your service mesh
|
||||
|
||||
TODO: Steal conclusion slide
|
||||
53
content/day-1/04_dns-debugging.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
title: Understanding and Debugging DNS in Kubernetes Clusters
|
||||
weight: 4
|
||||
tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=awXjABDknww" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://github.com/mqasimsarfraz/talks/tree/main/CloudNativeRejekts-2025" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
|
||||
|
||||
## Baseline
|
||||
|
||||
### DNS Components
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
Application-->NodeLocalDNS-->CoreDNS-->Upstream
|
||||
```
|
||||
|
||||
### Problems
|
||||
|
||||
- Many hidden systems
|
||||
- Not easy to trace across clusters
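From inside an application, the whole chain can only be measured as one black box. The sketch below (not from the talk) times a lookup through the system resolver; the `resolver` parameter is a hypothetical hook so the function can be exercised without a network.

```python
import socket
import time
from typing import Callable


def timed_lookup(name: str, resolver: Callable = socket.getaddrinfo) -> tuple[float, list[str]]:
    """Resolve a name and return (seconds taken, sorted unique addresses)."""
    start = time.perf_counter()
    results = resolver(name, None)
    elapsed = time.perf_counter() - start
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed, addresses


if __name__ == "__main__":
    seconds, addrs = timed_lookup("localhost")
    print(f"{seconds * 1000:.2f} ms -> {addrs}")
```

This tells you that a lookup was slow or failed, but not which hop in the chain was responsible, which is the gap the tools in the next section try to close.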
|
||||
|
||||
## Tools
|
||||
|
||||
> Demo queries are located in the slides and were executed during the stream
|
||||
|
||||
### CoreDNS Log Plugin
|
||||
|
||||
- Core-Plugin (just needs to be activated)
|
||||
- Logs all requests to stdout
|
||||
|
||||
### Hubble
|
||||
|
||||
- Cilium observability: needs the Cilium L7 proxy, runs as a DaemonSet
|
||||
- Needs CiliumNetworkPolicies for AppPod and CoreDNS
|
||||
- Metrics, UI and cli with jq (and protocol filter)
|
||||
|
||||
### Inspector Gadget
|
||||
|
||||
- Toolset for Kubernetes and Linux that can be customized
|
||||
- Runs as DaemonSet or debug pod - gadgets are distributed as containers (via Artifact Hub)
- DNS-Gadget: Trace via eBPF, post-process with Wasm
|
||||
|
||||
|
||||
## Overview
|
||||
|
||||
- CoreDNS: Great as a starting point, but only covers CoreDNS
- Hubble: Compact overview, but needs Cilium with special configs
- Inspector Gadget: Rich DNS traces, limited TCP support
|
||||
59
content/day-1/05_edge.md
Normal file
@@ -0,0 +1,59 @@
|
||||
---
|
||||
title: "Kubernetes at the Far Edge: Harnessing IoT with Lightweight Clusters and Akri"
|
||||
weight: 5
|
||||
tags:
|
||||
- rejekts
|
||||
- edge
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=jywpFlOH3z0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## The far edge
|
||||
|
||||
- Resource-constrained computing
|
||||
- Limited connectivity
|
||||
- More and smaller clusters
|
||||
|
||||
## Why kubernetes
|
||||
|
||||
- Automation, Scalability and resilience
|
||||
- Workload Portability through containers
|
||||
- Orchestration
|
||||
- Declarative state
|
||||
|
||||
## Enter k0s
|
||||
|
||||
- Minimal footprint as static binary
|
||||
- Simplified edge cluster management
|
||||
|
||||
## Managing disconnected edge nodes
|
||||
|
||||
- Needs: Remote manageability
|
||||
- Idea: Centralized, remote Control Plane (that only does control plane)
|
||||
- Challenge: Network disconnections (kubernetes usually moves workload)
|
||||
|
||||
## Akri
|
||||
|
||||
> https://docs.akri.sh/
|
||||
|
||||
- Discovery of IoT devices
- Exposes IoT devices as k8s resources
- Handles workload scheduling for leaf devices
|
||||
|
||||

|
||||
|
||||
|
||||
## Demo
|
||||
|
||||
Can be found in the video
|
||||
|
||||
## Q&A
|
||||
|
||||
- What about image distribution: Depends on networking conditions, k0s supports interna. images delivered as tar.gz
|
||||
- What can the broker do: Anything that a pod can interact with
|
||||
- What about reboots: Well, Akri had some problems in the demo; kubelet seems to start the containers again
|
||||
|
||||
## Random Notes
|
||||
|
||||
- Akri kinda reminded me of the gpu-operator with extra resource capacity for attached devices
|
||||
59
content/day-1/06_scaling-pdbs.md
Normal file
@@ -0,0 +1,59 @@
|
||||
---
|
||||
title: "Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb"
|
||||
weight: 6
|
||||
tags:
|
||||
- rejekts
|
||||
- multicluster
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=w8rDxtrMGG8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Baseline Infra
|
||||
|
||||
- Multiple Clusters across cloud providers
|
||||
- Cilium with Clustermesh
|
||||
- Stretched CockroachDB and NATS
|
||||
|
||||
TODO: Steal overview from slides
|
||||
|
||||
## PDBs and limits
|
||||
|
||||
- PDB: Classic core component that requires a minimum number of pods with successful readiness probes per deployment
- Eviction: Can be stopped by a PDB that would otherwise fall below its minimum available
- Interruptions: Voluntary (New image, updated specs, ...) vs involuntary (Eviction, deletion, node pressure, NoExecute, Node deletion)
|
||||
|
||||
## Stateful across multiple clusters
|
||||
|
||||
- Baseline: PDBs only know about one cluster
|
||||
- Problem: If the master pod fails (or gets evicted) on 2/3 clusters
- Factors: Movement, Maintenance, Chaos-Experiments, Secret rotation
- Workaround: Just manually check all systems before doing anything
- Idea: Multi-Cluster PDB
- Solution: A new hook on the eviction API that interacts with a new Cluster-Aware CRD
|
||||
|
||||
## How it actually works
|
||||
|
||||
1. Drain API gets called
2. Check replicas across clusters
3. Answer based on current state
|
||||
|
||||
Actually: There is a lease-mechanism to prevent race conditions across clusters
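The decision itself can be sketched in a few lines. This is not x-pdb's actual implementation: the lease mechanism is left out entirely, `ClusterState` and `eviction_allowed` are hypothetical names, and the pod to be evicted is assumed to be one of the ready pods.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    name: str
    ready_pods: int  # pods passing their readiness probe in this cluster


def eviction_allowed(clusters: list[ClusterState], min_available: int) -> bool:
    """Allow evicting one ready pod only if enough ready pods remain overall."""
    total_ready = sum(c.ready_pods for c in clusters)
    return total_ready - 1 >= min_available


if __name__ == "__main__":
    healthy = [ClusterState("a", 1), ClusterState("b", 1), ClusterState("c", 1)]
    degraded = [ClusterState("a", 1), ClusterState("b", 1), ClusterState("c", 0)]
    print(eviction_allowed(healthy, min_available=2))   # enough replicas elsewhere
    print(eviction_allowed(degraded, min_available=2))  # would drop below the minimum
```

A plain PDB only ever sees one entry of that list, which is exactly the blind spot described above.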
|
||||
|
||||
TODO: Steal diagram from slides
|
||||
|
||||
## What works
|
||||
|
||||
- Voluntary: 100% supported
|
||||
- Involuntary: Yes, they hooked into most of the deletion API calls (eviction, pressure, kubectl delete, admissions, node deletion)
|
||||
|
||||
## Demo
|
||||
|
||||
Pretty interesting, watch the video to find out
|
||||
|
||||
|
||||
## Q&A
|
||||
|
||||
- Do you need a flat network: No, just expose the TCP LB
- Did you think about using etcd to implement the leases instead of objects: They use managed control planes and don't want another etcd
- Have you tried to commit upstream: Nope, pretty much not an option thanks to the managed control plane not being able to set appropriate flags
|
||||
406
content/day-1/_imgs/akri-architecture.svg
Normal file
@@ -0,0 +1,406 @@
|
||||
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||
<!-- Generated by Microsoft Visio, SVG Export akri-architecture.svg Page-7 -->
|
||||
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xml-events"
|
||||
width="10.7038in" height="6.24177in" viewBox="0 0 770.672 449.407" xml:space="preserve" color-interpolation-filters="sRGB"
|
||||
class="st32">
|
||||
<style type="text/css">
|
||||
<![CDATA[
|
||||
.st1 {fill:#8ac4ff;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st2 {fill:#000000;font-family:Calibri;font-size:1.5em}
|
||||
.st3 {fill:#ebedf2;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st4 {fill:#000000;font-family:Calibri;font-size:1.33333em}
|
||||
.st5 {fill:#524886;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st6 {fill:#feffff;font-family:Calibri;font-size:1.5em}
|
||||
.st7 {font-size:1em}
|
||||
.st8 {fill:#0aaba9;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st9 {fill:#524886;stroke:#444a6d;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st10 {stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
|
||||
.st11 {fill:#ffffff;font-family:Calibri;font-size:1.49785em}
|
||||
.st12 {fill:#444a6d;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
|
||||
.st13 {stroke:#413a44;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.997017}
|
||||
.st14 {stroke:#413a44;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.00451}
|
||||
.st15 {fill:#0aaba9;stroke:#474b64;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5}
|
||||
.st16 {fill:#41455d;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
|
||||
.st17 {fill:#feffff;font-family:Calibri;font-size:1.00001em}
|
||||
.st18 {fill:none;stroke:none;stroke-width:0.25}
|
||||
.st19 {fill:#ebedf2;stroke:#41455d;stroke-width:1.5}
|
||||
.st20 {fill:#444a6d;font-family:Calibri;font-size:1.00001em}
|
||||
.st21 {fill:#2b74ef;font-size:1em}
|
||||
.st22 {fill:#ebedf2;font-size:1em}
|
||||
.st23 {fill:#524886;stroke:#444a6d;stroke-width:1.5}
|
||||
.st24 {fill:#41455d}
|
||||
.st25 {stroke:#474b64;stroke-width:0.25}
|
||||
.st26 {fill:#41455d;stroke:#41455d;stroke-width:0.25}
|
||||
.st27 {fill:#0aaba9;stroke:#444a6d;stroke-width:1.5}
|
||||
.st28 {fill:#444a6d;font-family:Calibri;font-size:1.66667em}
|
||||
.st29 {fill:#444a6d;font-family:Calibri;font-size:1.33333em}
|
||||
.st30 {fill:#2b74ef;stroke:#444a6d;stroke-width:1.5}
|
||||
.st31 {fill:#ffffff;font-family:Calibri;font-size:1.5em}
|
||||
.st32 {fill:none;fill-rule:evenodd;font-size:12px;overflow:visible;stroke-linecap:square;stroke-miterlimit:3}
|
||||
]]>
|
||||
</style>
|
||||
|
||||
<g>
|
||||
<title>Page-7</title>
|
||||
<g id="shape1000-1" transform="translate(18.75,-37.0005)">
|
||||
<title>Sheet.1000</title>
|
||||
<desc>Edge Cluster</desc>
|
||||
<path d="M0 63.68 L0 449.41 L484.04 449.41 L484.04 63.68 L0 63.68 L0 63.68 Z" class="st1"/>
|
||||
<text x="4" y="83.88" class="st2">Edge Cluster</text> </g>
|
||||
<g id="shape1001-4" transform="translate(33.0966,-184.359)">
|
||||
<title>Sheet.1001</title>
|
||||
<desc>Control Plane</desc>
|
||||
<path d="M0 248.81 L0 449.41 L456.2 449.41 L456.2 248.81 L0 248.81 L0 248.81 Z" class="st3"/>
|
||||
<text x="4" y="267.22" class="st4">Control Plane</text> </g>
|
||||
<g id="shape1002-7" transform="translate(56.3874,-300.225)">
|
||||
<title>Sheet.1002</title>
|
||||
<desc>Kubernetes Scheduler</desc>
|
||||
<path d="M0 401.68 C0 396.41 4.28 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.68 L128.1 439.87
|
||||
C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.28 449.41 0 445.14 0 439.87 L0 401.68 Z"
|
||||
class="st5"/>
|
||||
<text x="22.08" y="415.37" class="st6">Kubernetes <tspan x="27.76" dy="1.2em" class="st7">Scheduler</tspan></text> </g>
|
||||
<g id="shape1003-11" transform="translate(207.778,-300.225)">
|
||||
<title>Sheet.1003</title>
|
||||
<desc>Akri Controller</desc>
|
||||
<path d="M0 401.69 C0 396.41 4.29 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.69 L128.1 439.87
|
||||
C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69 Z"
|
||||
class="st8"/>
|
||||
<text x="49.55" y="415.37" class="st6">Akri <tspan x="27.13" dy="1.2em" class="st7">Controller</tspan></text> </g>
|
||||
<g id="shape1004-15" transform="translate(52.7937,-213.237)">
|
||||
<title>Sheet.1004</title>
|
||||
<path d="M0 401.69 C-0 396.41 4.29 392.13 9.57 392.13 L269.93 392.13 C275.21 392.13 279.49 396.41 279.49 401.69 L279.49
|
||||
439.87 C279.49 445.14 275.21 449.41 269.93 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69
|
||||
Z" class="st9"/>
|
||||
</g>
|
||||
<g id="shape1005-17" transform="translate(153.653,-228.513)">
|
||||
<title>Sheet.1005</title>
|
||||
<desc>API Server</desc>
|
||||
<path d="M91.01 427.83 L0 427.83 L0 449.41 L91.01 449.41 L91.01 427.83" class="st10"/>
|
||||
<text x="7.97" y="444.01" class="st11">API Server</text> </g>
|
||||
<g id="shape1006-21" transform="translate(111.013,-270.515)">
|
||||
<title>Sheet.1006</title>
|
||||
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
|
||||
430.53 L0 430.53 Z" class="st12"/>
|
||||
</g>
|
||||
<g id="shape1007-23" transform="translate(111.013,-270.515)">
|
||||
<title>Sheet.1007</title>
|
||||
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
|
||||
430.53 L0 430.53" class="st13"/>
|
||||
</g>
|
||||
<g id="shape1008-26" transform="translate(262.403,-270.515)">
|
||||
<title>Sheet.1008</title>
|
||||
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
|
||||
430.53 L0 430.53 Z" class="st12"/>
|
||||
</g>
|
||||
<g id="shape1009-28" transform="translate(262.403,-270.515)">
|
||||
<title>Sheet.1009</title>
|
||||
<path d="M0 430.53 L9.42 421.13 L18.85 430.53 L14.14 430.53 L14.14 440 L18.85 440 L9.42 449.41 L0 440 L4.71 440 L4.71
|
||||
430.53 L0 430.53" class="st14"/>
|
||||
</g>
|
||||
<g id="shape1010-31" transform="translate(33.0966,-51.7091)">
|
||||
<title>Sheet.1010</title>
|
||||
<desc>Node</desc>
|
||||
<path d="M0 344.8 L0 449.41 L456.2 449.41 L456.2 344.8 L0 344.8 L0 344.8 Z" class="st3"/>
|
||||
<text x="4" y="363.2" class="st4">Node</text> </g>
|
||||
<g id="shape1011-34" transform="translate(56.3874,-70.5221)">
|
||||
<title>Sheet.1011</title>
|
||||
<path d="M0 401.69 C-0 396.41 4.29 392.13 9.57 392.13 L118.54 392.13 C123.82 392.13 128.1 396.41 128.1 401.69 L128.1
|
||||
439.87 C128.1 445.14 123.82 449.41 118.54 449.41 L9.57 449.41 C4.29 449.41 0 445.14 0 439.87 L0 401.69 Z"
|
||||
class="st5"/>
|
||||
</g>
|
||||
<g id="shape1012-36" transform="translate(82.4207,-86.7911)">
|
||||
<title>Sheet.1012</title>
|
||||
<desc>Kubelet</desc>
|
||||
<path d="M69.37 427.83 L0 427.83 L0 449.41 L69.37 449.41 L69.37 427.83" class="st10"/>
|
||||
<text x="6.56" y="444.01" class="st11">Kubelet</text> </g>
|
||||
<g id="shape1013-40" transform="translate(198,-70.5221)">
|
||||
<title>Sheet.1013</title>
|
||||
<desc>Akri Agent</desc>
|
||||
<path d="M0 401.69 C0 396.41 2.23 392.13 4.98 392.13 L61.64 392.13 C64.38 392.13 66.61 396.41 66.61 401.69 L66.61 439.87
|
||||
C66.61 445.14 64.38 449.41 61.64 449.41 L4.98 449.41 C2.23 449.41 0 445.14 0 439.87 L0 401.69 Z"
|
||||
class="st15"/>
|
||||
<text x="18.82" y="415.38" class="st11">Akri <tspan x="11.67" dy="1.2em" class="st7">Agent</tspan></text> </g>
|
||||
<g id="shape1016-44" transform="translate(466.794,-94.2275)">
|
||||
<title>Sheet.1016</title>
|
||||
			<desc>&lt;protocol&gt;</desc>
|
||||
<path d="M0 429.16 L23.82 408.91 L23.82 419.03 L71.46 419.03 L71.46 408.91 L95.28 429.16 L71.46 449.41 L71.46 439.28
|
||||
L23.82 439.28 L23.82 449.41 L0 429.16 Z" class="st16"/>
|
||||
			<text x="21" y="432.76" class="st17">&lt;protocol&gt;</text>		</g>
|
||||
<g id="shape1017-47" transform="translate(111.013,-128.878)">
|
||||
<title>Sheet.1017</title>
|
||||
<path d="M0 374.99 L9 366.01 L18.01 374.99 L13.51 374.99 L13.51 440.42 L18.01 440.42 L9 449.41 L0 440.42 L4.5 440.42
|
||||
L4.5 374.99 L0 374.99 Z" class="st12"/>
|
||||
</g>
|
||||
<g id="shape1018-49" transform="translate(111.013,-128.878)">
|
||||
<title>Sheet.1018</title>
|
||||
<path d="M0 374.99 L9 366.01 L18.01 374.99 L13.51 374.99 L13.51 440.42 L18.01 440.42 L9 449.41 L0 440.42 L4.5 440.42
|
||||
L4.5 374.99 L0 374.99" class="st14"/>
|
||||
</g>
|
||||
<g id="shape1019-52" transform="translate(220.436,-129.118)">
|
||||
<title>Sheet.1019</title>
|
||||
<path d="M0 375.05 L9.06 366.01 L18.13 375.05 L13.6 375.05 L13.6 440.36 L18.13 440.36 L9.06 449.41 L0 440.36 L4.53 440.36
|
||||
L4.53 375.05 L0 375.05 Z" class="st16"/>
|
||||
</g>
|
||||
<g id="shape1-54" transform="translate(579.294,-344.358)">
|
||||
<title>Sheet.1</title>
|
||||
<rect x="0" y="382.668" width="63" height="66.7397" class="st18"/>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
|
||||
<svg viewBox="-0.55922 -0.55862 68.039 72" height="66.7397" preserveAspectRatio="none" width="63" x="0" y="382.668">
|
||||
<clipPath id="mfid1">
|
||||
<rect x="-0.55922" y="-0.55862" width="68.039" height="72" id="mfid2"/>
|
||||
</clipPath>
|
||||
<g clip-path="url(#mfid1)">
|
||||
<mask id="mfid3">
|
||||
<rect width="68" height="72" fill="white" stroke="none"/>
|
||||
</mask>
|
||||
<mask id="mfid4" fill="white" stroke="none">
|
||||
<g>
|
||||
<g mask="url(#mfid3)">
|
||||
<use xlink:href="#mfid2"/>
|
||||
</g>
|
||||
</g>
|
||||
</mask>
|
||||
<defs>
|
||||
<image id="mfid5" width="68" height="72" xlink:href=""/>
|
||||
</defs>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
|
||||
<g mask="url(#mfid4)">
|
||||
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
|
||||
<clipPath id="mfid6">
|
||||
<rect x="-0.5" y="-0.5" width="68" height="72"/>
|
||||
</clipPath>
|
||||
<use xlink:href="#mfid5" clip-path="url(#mfid6)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
|
||||
</g>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
<rect x="0" y="382.668" width="63" height="66.7397" class="st18"/>
|
||||
</g>
|
||||
<g id="shape1020-57" transform="translate(518.023,-187.512)">
|
||||
<title>Sheet.1020</title>
|
||||
			<desc>kind: Configuration metadata: ..name: akri-&lt;protocol&gt; spec: ....</desc>
|
||||
<rect x="0" y="302.977" width="203.541" height="146.43" class="st19"/>
|
||||
<text x="4" y="314.99" class="st20">kind: <tspan class="st21">Configuration </tspan><tspan x="4" dy="1.2em" class="st7">metadata: </tspan><tspan
|
||||
						x="4" dy="1.2em" class="st22">..</tspan>name: akri<tspan class="st21">-</tspan><tspan class="st21">&lt;protocol&gt; </tspan><tspan
|
||||
x="4" dy="1.2em" class="st7">spec: </tspan><tspan x="4" dy="1.2em" class="st22">..</tspan>discoveryHandler: <tspan
|
||||
						x="4" dy="1.2em" class="st22">…..</tspan>name: &lt;protocol&gt; <tspan x="4" dy="1.2em" class="st22">..</tspan>brokerPodSpec: <tspan
|
||||
x="4" dy="1.2em" class="st22">…..</tspan>containers: <tspan x="4" dy="1.2em" class="st22">…..</tspan>- name: <tspan
|
||||
class="st21">custom</tspan><tspan class="st21">-</tspan><tspan class="st21">broker </tspan><tspan x="4"
|
||||
dy="1.2em" class="st22">……..</tspan>image: "<tspan class="st21">ghcr.io/</tspan><tspan class="st21">…"</tspan></text> </g>
|
||||
<g id="shape1021-77" transform="translate(782.153,197.043) rotate(90)">
|
||||
<title>Sheet.1021</title>
|
||||
<path d="M0 426.65 L9.42 415.31 L18.85 426.65 L14.14 426.65 L14.14 438.07 L18.85 438.07 L9.42 449.41 L0 438.07 L4.71
|
||||
438.07 L4.71 426.65 L0 426.65 Z" class="st16"/>
|
||||
</g>
|
||||
<g id="group1022-79" transform="translate(367.794,-193.864)">
|
||||
<title>Can.1091</title>
|
||||
<desc>etcd</desc>
|
||||
<g id="shape1023-80">
|
||||
<title>Sheet.1023</title>
|
||||
<path d="M0 435.91 A53.0646 13.5 -180 0 0 106.13 435.91 L106.13 282.91 L0 282.91 L0 435.91 Z" class="st23"/>
|
||||
</g>
|
||||
<g id="shape1022-82">
|
||||
<ellipse cx="53.0646" cy="282.907" rx="53.0646" ry="13.5" class="st23"/>
|
||||
<text x="37.04" y="289.61" class="st6">etcd</text> </g>
|
||||
</g>
|
||||
<g id="group1024-85" transform="translate(517.374,600.4) rotate(180)">
|
||||
<title>1-D single.1004</title>
|
||||
<g id="shape1025-86">
|
||||
<title>Sheet.1025</title>
|
||||
<path d="M-0.75 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L-0.75 433.38 L-0.75 444.06
|
||||
Z" class="st24"/>
|
||||
<path d="M-0.75 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L-0.75 433.38"
|
||||
class="st25"/>
|
||||
</g>
|
||||
<g id="shape1026-89">
|
||||
<title>Sheet.1026</title>
|
||||
<path d="M0 444.06 L48.8 444.06 L48.8 449.41 L54.14 438.72 L48.8 428.03 L48.8 433.38 L0 433.38" class="st25"/>
|
||||
</g>
|
||||
<g id="shape1027-92" transform="translate(-0.5,-5.59375)">
|
||||
<title>Sheet.1027</title>
|
||||
<rect x="0" y="439.22" width="0.5" height="10.1875" class="st26"/>
|
||||
</g>
|
||||
</g>
|
||||
<g id="group1031-94" transform="translate(382.5,-207)">
|
||||
<title>Sheet.1031</title>
|
||||
<g id="shape1032-95" transform="translate(0.306904,-68.056)">
|
||||
<title>Rectangle.1066</title>
|
||||
<desc>Configuration CRD</desc>
|
||||
<rect x="0" y="385.571" width="80.6931" height="63.8367" class="st27"/>
|
||||
<text x="6.98" y="400.37" class="st17">Configuration <tspan x="30.2" dy="1.2em" class="st7">CRD</tspan></text> </g>
|
||||
<g id="shape1033-99">
|
||||
<title>Rectangle.1067</title>
|
||||
<desc>Instance CRD</desc>
|
||||
<rect x="0" y="385.571" width="80.6931" height="63.8367" class="st27"/>
|
||||
<text x="19.78" y="400.37" class="st17">Instance <tspan x="30.2" dy="1.2em" class="st7">CRD</tspan></text> </g>
|
||||
<g id="shape1034-103" transform="translate(3.02536,-70.6819)">
|
||||
<title>Rectangle.1068</title>
|
||||
<desc>&lt;protocol&gt; Configuration</desc>
|
||||
<rect x="0" y="421.128" width="75.2562" height="28.2795" class="st27"/>
|
||||
<text x="10.99" y="431.67" class="st17">&lt;protocol&gt; <tspan x="4.26" dy="1.2em" class="st7">Configuration</tspan></text> </g>
|
||||
<g id="shape1035-107" transform="translate(2.71845,-3.4226)">
|
||||
<title>Rectangle.1069</title>
|
||||
<desc>&lt;protocol&gt; Instance</desc>
|
||||
<rect x="0" y="421.128" width="75.2562" height="28.2795" class="st27"/>
|
||||
<text x="10.99" y="431.67" class="st17">&lt;protocol&gt; <tspan x="17.06" dy="1.2em" class="st7">Instance</tspan></text> </g>
|
||||
</g>
|
||||
<g id="shape1036-111" transform="translate(582.879,-77.8757)">
|
||||
<title>Sheet.1036</title>
|
||||
<desc>Leaf Device</desc>
|
||||
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
|
||||
<text x="26.92" y="400.1" class="st28">Leaf <tspan x="16.8" dy="1.2em" class="st7">Device</tspan></text> </g>
|
||||
<g id="shape1037-115" transform="translate(574.939,-67.186)">
|
||||
<title>Sheet.1037</title>
|
||||
<desc>Leaf Device</desc>
|
||||
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
|
||||
<text x="26.92" y="400.1" class="st28">Leaf <tspan x="16.8" dy="1.2em" class="st7">Device</tspan></text> </g>
|
||||
<g id="shape1038-119" transform="translate(567,-58.186)">
|
||||
<title>Sheet.1038</title>
|
||||
<desc>Leaf Device</desc>
|
||||
<path d="M0 362.79 L0 449.41 L87.88 449.41 L87.88 362.79 L0 362.79 L0 362.79 Z" class="st3"/>
|
||||
<text x="6.8" y="440.61" class="st29">Leaf Device</text> </g>
|
||||
<g id="shape2-122" transform="translate(569.909,-102.2)">
|
||||
<title>Sheet.2</title>
|
||||
<rect x="0" y="406.801" width="43.9393" height="42.6065" class="st18"/>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
|
||||
<svg viewBox="-0.55922 -0.55862 129.06 125.01" height="42.6065" preserveAspectRatio="none" width="43.9393" x="0"
|
||||
y="406.801">
|
||||
<clipPath id="mfid7">
|
||||
<rect x="-0.55922" y="-0.55862" width="129.06" height="125.01" id="mfid8"/>
|
||||
</clipPath>
|
||||
<g clip-path="url(#mfid7)">
|
||||
<mask id="mfid9">
|
||||
<rect width="129" height="125" fill="white" stroke="none"/>
|
||||
</mask>
|
||||
<mask id="mfid10" fill="white" stroke="none">
|
||||
<g>
|
||||
<g mask="url(#mfid9)">
|
||||
<use xlink:href="#mfid8"/>
|
||||
</g>
|
||||
</g>
|
||||
</mask>
|
||||
<defs>
|
||||
<image id="mfid11" width="129" height="125" xlink:href=""/>
|
||||
</defs>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
|
||||
<g mask="url(#mfid10)">
|
||||
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
|
||||
<clipPath id="mfid12">
|
||||
<rect x="-0.5" y="-0.5" width="129" height="125"/>
|
||||
</clipPath>
|
||||
<use xlink:href="#mfid11" clip-path="url(#mfid12)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
|
||||
</g>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
<rect x="0" y="406.801" width="43.9393" height="42.6065" class="st18"/>
|
||||
</g>
|
||||
<g id="shape3-125" transform="translate(616.283,-104.121)">
|
||||
<title>Sheet.3</title>
|
||||
<rect x="0" y="415.276" width="34.0958" height="34.1311" class="st18"/>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
|
||||
<svg viewBox="-0.55922 -0.55862 120.05 120.04" height="34.1311" preserveAspectRatio="none" width="34.0958" x="0"
|
||||
y="415.276">
|
||||
<clipPath id="mfid13">
|
||||
<rect x="-0.55922" y="-0.55862" width="120.05" height="120.04" id="mfid14"/>
|
||||
</clipPath>
|
||||
<g clip-path="url(#mfid13)">
|
||||
<mask id="mfid15">
|
||||
<rect width="120" height="120" fill="white" stroke="none"/>
|
||||
</mask>
|
||||
<mask id="mfid16" fill="white" stroke="none">
|
||||
<g>
|
||||
<g mask="url(#mfid15)">
|
||||
<use xlink:href="#mfid14"/>
|
||||
</g>
|
||||
</g>
|
||||
</mask>
|
||||
<defs>
|
||||
<image id="mfid17" width="120" height="120" xlink:href=""/>
|
||||
</defs>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
|
||||
<g mask="url(#mfid16)">
|
||||
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
|
||||
<clipPath id="mfid18">
|
||||
<rect x="-0.5" y="-0.5" width="120" height="120"/>
|
||||
</clipPath>
|
||||
<use xlink:href="#mfid17" clip-path="url(#mfid18)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
|
||||
</g>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
<rect x="0" y="415.276" width="34.0958" height="34.1311" class="st18"/>
|
||||
</g>
|
||||
<g id="shape4-128" transform="translate(593.891,-79.7301)">
|
||||
<title>Sheet.4</title>
|
||||
<rect x="0" y="415.274" width="34.0958" height="34.1336" class="st18"/>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetPixelOffsetMode -->
|
||||
<svg viewBox="-0.55922 -0.55862 112.03 112.03" height="34.1336" preserveAspectRatio="none" width="34.0958" x="0"
|
||||
y="415.274">
|
||||
<clipPath id="mfid19">
|
||||
<rect x="-0.55922" y="-0.55862" width="112.03" height="112.03" id="mfid20"/>
|
||||
</clipPath>
|
||||
<g clip-path="url(#mfid19)">
|
||||
<mask id="mfid21">
|
||||
<rect width="112" height="112" fill="white" stroke="none"/>
|
||||
</mask>
|
||||
<mask id="mfid22" fill="white" stroke="none">
|
||||
<g>
|
||||
<g mask="url(#mfid21)">
|
||||
<use xlink:href="#mfid20"/>
|
||||
</g>
|
||||
</g>
|
||||
</mask>
|
||||
<defs>
|
||||
<image id="mfid23" width="112" height="112" xlink:href=""/>
|
||||
</defs>
|
||||
<!-- Unsupported Record: EmfPlusRecordTypeSetObject Obj_ImageAttributes -->
|
||||
<g mask="url(#mfid22)">
|
||||
<g transform="matrix(0.00015748, 0, 0, 0.00015748, 0, 0)">
|
||||
<clipPath id="mfid24">
|
||||
<rect x="-0.5" y="-0.5" width="112" height="112"/>
|
||||
</clipPath>
|
||||
<use xlink:href="#mfid23" clip-path="url(#mfid24)" transform="matrix(6350, 0, 0, 6350, 3175, 3175)"/>
|
||||
</g>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
<rect x="0" y="415.274" width="34.0958" height="34.1336" class="st18"/>
|
||||
</g>
|
||||
<g id="group1039-131" transform="translate(366.448,-68.8351)">
|
||||
<title>Sheet.1039</title>
|
||||
<g id="shape1028-132" transform="translate(11.7657,-10.9193)">
|
||||
<title>Wavy Box.1020</title>
|
||||
<desc>Broker</desc>
|
||||
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
|
||||
436.74 83.04 436.74 Z" class="st30"/>
|
||||
<text x="17.03" y="427.18" class="st31">Broker</text> </g>
|
||||
<g id="shape1029-135" transform="translate(6.41885,-5.69989)">
|
||||
<title>Wavy Box.1019</title>
|
||||
<desc>Broker</desc>
|
||||
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
|
||||
436.74 83.04 436.74 Z" class="st30"/>
|
||||
<text x="17.03" y="427.18" class="st31">Broker</text> </g>
|
||||
<g id="shape1030-138">
|
||||
<title>Wavy Box.1003</title>
|
||||
<desc>custom-broker</desc>
|
||||
<path d="M83.04 436.74 L83.04 394.16 L0 394.16 L0 445.26 C31.94 453.78 41.15 447.66 51.76 442.93 C59.57 439.46 68.13
|
||||
436.74 83.04 436.74 Z" class="st30"/>
|
||||
<text x="11.76" y="414.36" class="st31">custom-<tspan x="17.2" dy="1.2em" class="st7">broker</tspan></text> </g>
|
||||
</g>
|
||||
<g id="shape1040-142" transform="translate(288,-68.8351)">
|
||||
<title>Sheet.1040</title>
|
||||
<desc>&lt;protocol&gt; Discovery Handler</desc>
|
||||
<path d="M0 401.69 C0 396.41 2.23 392.13 4.98 392.13 L61.64 392.13 C64.38 392.13 66.61 396.41 66.61 401.69 L66.61 439.87
|
||||
C66.61 445.14 64.38 449.41 61.64 449.41 L4.98 449.41 C2.23 449.41 0 445.14 0 439.87 L0 401.69 Z"
|
||||
class="st15"/>
|
||||
<text x="6.67" y="409.97" class="st17">&lt;protocol&gt; <tspan x="9.68" dy="1.2em" class="st7">Discovery </tspan><tspan
|
||||
x="13.93" dy="1.2em" class="st7">Handler</tspan></text> </g>
|
||||
<g id="shape1041-147" transform="translate(715.807,342.509) rotate(90)">
|
||||
<title>Sheet.1041</title>
|
||||
<path d="M0 436.83 L9.42 430.56 L18.85 436.83 L14.14 436.83 L14.14 443.14 L18.85 443.14 L9.42 449.41 L0 443.14 L4.71
|
||||
443.14 L4.71 436.83 L0 436.83 Z" class="st12"/>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 29 KiB |
BIN
content/day-1/_imgs/namespaces.png
Normal file
|
After Width: | Height: | Size: 181 KiB |
@@ -4,8 +4,19 @@ title: Day -1
|
||||
weight: 3
|
||||
---
|
||||
|
||||
TODO:
|
||||
The second and last day of cloud native rejekts and (some might say most importantly) time for my talk.
|
||||
This was another very interesting day and I can only recommend attending cloud native rejekts (and will always try to attend in the future if possible).
|
||||
|
||||
## Talk recommendations
|
||||
|
||||
* TODO:
|
||||
- My Talk: [Evaluating Global Load Balancing Options for Kubernetes in Practice](./02_gslb)
|
||||
- Service Mesh Intro + Comparison: [The service mesh wars - a new hope for kubernetes](./03_service-mesh)
|
||||
- How to handle eviction and statefulness across clusters: [Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb](./06_scaling-pdbs)
|
||||
- Intro to operators: [The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud](./02_controllers)
|
||||
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- Take a deeper look into CoreDNS plugins
|
||||
- A bunch of nice people that heard my talk and had questions
|
||||
- Someone from Ampere that would like to help me to convince the infra team to get arm nodes
|
||||
- Look into NATS (at least a bit), everyone seems to like it but I never used it myself (only in some projects)
|
||||
@@ -7,5 +7,6 @@ tags:
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Short opening keynote thanking volunteers and attendees.
|
||||
@@ -4,10 +4,12 @@ weight: 2
|
||||
tags:
|
||||
- rejekts
|
||||
- cluster
|
||||
- operatr
|
||||
- operator
|
||||
- multicluster
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=r0W6cCJAGro" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
The talk started with a basic introduction to Cluster API and the operations at Giant Swarm.
|
||||
|
||||
|
||||
@@ -6,7 +6,8 @@ tags:
|
||||
- keynote
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=m9NRk-6MSvY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
A short keynote from Microsoft about their contributions to open source and the tools they use:
|
||||
- infra (Kubernetes, Istio, Hyperlight)
|
||||
|
||||
@@ -3,9 +3,11 @@ title: CRD Data Architecture for Multi-Cluster Kubernetes
|
||||
weight: 4
|
||||
tags:
|
||||
- rejekts
|
||||
- multicluster
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=e1BmT0jc_Fs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Background
|
||||
|
||||
|
||||
@@ -5,7 +5,8 @@ tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=CAPtQnH4rPY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Recruitment & Staffing
|
||||
|
||||
|
||||
@@ -5,7 +5,8 @@ tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=qNShvqSTKCU" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Background: The state of cloud in Mauritius
|
||||
|
||||
|
||||
@@ -6,7 +6,8 @@ tags:
|
||||
- performance
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://www.youtube.com/watch?v=EYipC5y-8rM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
There were more details in the talk than I copied into these notes.
|
||||
Most of them were just too much to write down or application specific.
|
||||
|
||||
110
content/day-2/08_airgapped-cp.md
Normal file
@@ -0,0 +1,110 @@
|
||||
---
|
||||
title: Building air-gapped control planes for a global pharma leader using crossplane and argo
|
||||
weight: 8
|
||||
tags:
|
||||
- rejekts
|
||||
- crossplane
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=D4bKe4rAasc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Joint effort of Novo Nordisk and Upbound.
|
||||
|
||||
## Background
|
||||
|
||||
- Ymir Platform: Foundational abstraction platform
|
||||
- Goal: Faster time to market
|
||||
- Usage in pharma: end-2-end compliance
|
||||
- Airgap: Use GitOps and prevent human interaction with the control planes
|
||||
|
||||
## Setup
|
||||
|
||||
- Decision for crossplane was obvious
|
||||
- Problem: Chicken and egg "we provision clusters via crossplane but crossplane needs a cluster"
|
||||
- GitOps: Everything as code with automatic tests and argo
|
||||
- Infra: Azure
|
||||
|
||||
### Public AKS
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph MC
|
||||
ProviderAzure
|
||||
ProviderKubernetes
|
||||
end
|
||||
ProviderAzure-->|Calls API|AKS
|
||||
AKS-->|Provisions|Kubernetescluster
|
||||
ProviderKubernetes-->|Deploys service on|Kubernetescluster
|
||||
```
|
||||
|
||||
### Bastion Bootstrap
|
||||
|
||||
- Options: Terraform/Opentofu
|
||||
- Goal: Crossplane all the things
|
||||
- Solution: Run Crossplane in a github action
|
||||
1. Kind Cluster
|
||||
2. Install Crossplane
|
||||
3. Propagate Credentials
|
||||
4. Create Cluster
|
||||
- Tooling: Uptest, an E2E test automation framework; it can be used for bootstrapping since it creates a kind cluster with Crossplane
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph GitHubRunner
|
||||
Kubernetes
|
||||
Crossplane
|
||||
end
|
||||
subgraph Azure
|
||||
BastionVM
|
||||
end
|
||||
Crossplane-->|Create|BastionVM
|
||||
```
|
||||
|
||||
### Next steps
|
||||
|
||||
- Problem: How to access bastion
|
||||
- Solution: Auto-register bastion as github runner
|
||||
- Create Bastion-Cluster via Uptest
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Azure
|
||||
subgraph BastionVM
|
||||
GitHubRunner
|
||||
Kubernetes
|
||||
Crossplane
|
||||
end
|
||||
subgraph BastionCluster
|
||||
Kubernetes
|
||||
Argo
|
||||
CrossPlane
|
||||
end
|
||||
end
|
||||
Crossplane-->|Create|BastionCluster
|
||||
```
|
||||
|
||||
TODO: Steal image from slides
|
||||
|
||||
## Challenges
|
||||
|
||||
- Argo sync waves:
|
||||
- Problem: Argo does not support eventual consistency
|
||||
- Example: Install a ProviderConfig before your Provider and sync fails without retry
|
||||
- Order stuff very carefully
|
||||
- Delivering updates to private clusters
|
||||
- Difference between public and private: It's the same package
|
||||
- Upgrades/Downgrades: Change the package (Crossplane) and cluster (CRD)
|
||||
- Testing:
|
||||
- Static: Multiple stages and each stage has its own bootstrap env that can be set to any branch
|
||||
- Ephemeral: Uptest
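The ordering problem from the sync-wave bullet above can be illustrated with a small sketch. It assumes manifests are plain dictionaries and only models the `argocd.argoproj.io/sync-wave` annotation (lower waves are applied first, a missing annotation counts as wave 0); the helper names `wave_of` and `apply_order` are made up for this example and are not part of Argo CD.

```python
SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Read the sync wave of a manifest; resources without the annotation are wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(SYNC_WAVE, "0"))


def apply_order(manifests: list) -> list:
    """Return the manifests sorted by wave (stable, so ties keep their input order)."""
    return sorted(manifests, key=wave_of)


# Hypothetical example: the ProviderConfig is listed first but must be applied last.
manifests = [
    {"kind": "ProviderConfig", "metadata": {"annotations": {SYNC_WAVE: "1"}}},
    {"kind": "Provider", "metadata": {"annotations": {SYNC_WAVE: "0"}}},
]
ordered_kinds = [m["kind"] for m in apply_order(manifests)]
```

Putting the Provider in an earlier wave than its ProviderConfig is the kind of careful ordering the notes refer to; real syncs also consider phases, resource kinds and health, which this sketch leaves out.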
|
||||
|
||||
TODO: Steal images from slides
|
||||
|
||||
|
||||
## Wrap-up
|
||||
|
||||
- Cloud native air-gapped ✅
|
||||
- GitOps ✅
|
||||
- Crossplane, no terraform ✅
|
||||
- Extensible, reusable, API-first ✅
|
||||
84
content/day-2/09_e2e-authenticity.md
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
title: End to End Message Authenticity in Cloud Native Systems
|
||||
weight: 9
|
||||
tags:
|
||||
- rejekts
|
||||
- security
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=rJacyDygVi0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Why does e2e authenticity matter?
|
||||
|
||||
- Classic Setup: Micro-Services with TLS and auth via Bearer
|
||||
```mermaid
|
||||
graph LR
|
||||
User-->|TLS|Gateway
|
||||
Gateway-->|mTLS|Server
|
||||
Server-->|mTLS|Gateway
|
||||
Gateway-->|TLS|User
|
||||
```
|
||||
- Intrusion: Hacked Gateway
|
||||
- Can modify the request
|
||||
- Could log auth tokens
|
||||
- Could replay requests with different body or token
|
||||
|
||||
|
||||
## Baseline OIDC
|
||||
- Only IDP has private key for signing
|
||||
- Anyone can fetch the public key and verify the signature
|
||||
- Usage: SSO, Trust Federation
|
||||
- Problem: Symmetric Credential can be forwarded if leaked
|
||||
|
||||
## Fixes
|
||||
|
||||
### HTTP Message Signatures
|
||||
|
||||
- Idea:
|
||||
- Client can sign the content and headers with a symmetric/asymmetric key
|
||||
- Server can verify the signature
|
||||
- Implementation: Basically just an additional Signature header and a header that tells us what is included in the signature
|
||||
```
|
||||
HTTPS POST /test
|
||||
Authorization: Bearer <token>
|
||||
Signature-Input: "authorization" @body
|
||||
Signature: ahsz7d9zahbsdoih
|
||||
```
|
||||
- Problem: Key distribution
|
||||
- Real-World: AWS Signature v4 shares access key and secret key out of band and signs the request with the secret key (symmetric)
|
||||
- Transitive Trust
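To make the idea more concrete, here is a minimal sketch of the symmetric variant using only the Python standard library. It is not the RFC 9421 wire format: the layout of the signature base, the helper names and the shared key are assumptions for illustration only.

```python
import hashlib
import hmac


def signature_base(method: str, path: str, headers: dict, body: bytes, covered: list) -> bytes:
    """Build the byte string that gets signed from the covered request components."""
    lines = [f"@method: {method}", f"@path: {path}"]
    for name in covered:
        lines.append(f"{name.lower()}: {headers[name]}")
    # Cover the body via its digest, so any change to it invalidates the signature.
    lines.append(f"content-digest: {hashlib.sha256(body).hexdigest()}")
    return "\n".join(lines).encode()


def sign(key: bytes, base: bytes) -> str:
    """Sign the signature base with a shared (symmetric) key."""
    return hmac.new(key, base, hashlib.sha256).hexdigest()


def verify(key: bytes, base: bytes, signature: str) -> bool:
    """Recompute the signature on the server side and compare in constant time."""
    return hmac.compare_digest(sign(key, base), signature)
```

A gateway that changes the body or swaps the bearer token produces a different signature base, so verification on the server fails. The key distribution problem stays the same: both sides need the key before the first request.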
|
||||
|
||||
### OIDC Key binding
|
||||
|
||||
TODO: Steal image from slides
|
||||
|
||||
### Proof of Possession
|
||||
|
||||
> Basically adds a nonce that we have to sign, so the IdP now knows that we really possess the key
|
||||
|
||||
TODO: Steal image from Slides
|
||||
|
||||
### OpenPubKey
|
||||
|
||||
> Assigns meaning to the nonce and can reconstruct the nonce for a reverse check
|
||||
|
||||
## Demo
|
||||
|
||||
The demo uses GitHub as a PKI (since all public keys get exposed via github).
|
||||
Pretty cool: They automated the demo via a go cli.
|
||||
|
||||
TODO: Link to demo code
|
||||
TODO: Steal image from Slides
|
||||
|
||||
## Next steps
|
||||
|
||||
- SPIFFE is the de-facto standard for distributing identities to workloads
|
||||
1. Workload asks "Who am I"
|
||||
2. Agent attests the workload
|
||||
3. Agent provides OIDC or X.509 to Workloads
|
||||
|
||||
- WIMSE RFC: Basically DPoP/OpenPubKey
|
||||
1. Workload gets a private key
|
||||
2. Issuer binds workload identity to the public key
|
||||
3. Since auth trusts SPIFFE, it can trust the key
|
||||
73
content/day-2/10_auto-scale.md
Normal file
@@ -0,0 +1,73 @@
|
||||
---
|
||||
title: "The auto-scaling part: VPA, HPA, KEDA, Nodes, How do they dance"
|
||||
weight: 10
|
||||
tags:
|
||||
- rejekts
|
||||
---
|
||||
|
||||
{{% button href="https://www.youtube.com/watch?v=1US_-3udMDo" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Hypothesis
|
||||
|
||||
- In 2024, 27% of cloud spend was wasted
|
||||
- 100ms delay => decrease in sales
|
||||
|
||||
## Pod resources
|
||||
|
||||
- Requests: Informs scheduler's decision
|
||||
- Too low: Schedule on strained nodes
|
||||
- Too high: Wasted resources
|
||||
- Limits: Throttles (CPU) or kills (Memory) if reached
|
||||
- QoS: Sorts the eviction priority during resource pressure
|
||||
- Guaranteed (Requests=Limits)
|
||||
- Burstable (Limits>Requests)
|
||||
- Best effort (Nothing defined)
|
||||
- Gotcha: CPU throttling can happen before scaling triggers fire if requests and limits are very close
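The QoS classes can be sketched as a small classifier. This is a simplified model for a pod with a single container; it compares quantities as plain strings (so `"100m"` and `"0.1"` would not count as equal) and the function name `qos_class` is made up for this example.

```python
from typing import Optional


def qos_class(requests: Optional[dict], limits: Optional[dict]) -> str:
    """Classify a single-container pod by its CPU/memory requests and limits."""
    requests = requests or {}
    limits = limits or {}
    if not requests and not limits:
        # Nothing defined at all.
        return "BestEffort"
    resources = ("cpu", "memory")
    # Guaranteed needs limits for CPU and memory, with requests equal to them
    # (a missing request is treated as equal to the limit).
    has_all_limits = all(r in limits for r in resources)
    if has_all_limits and all(requests.get(r, limits[r]) == limits[r] for r in resources):
        return "Guaranteed"
    return "Burstable"
```

For example, `qos_class({"cpu": "100m", "memory": "256Mi"}, None)` lands in the Burstable class, because requests are set but no limits are defined.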
|
||||
|
||||
TODO: Steal table from Slides
|
||||
|
||||
| | Guaranteed | Burstable | Best effort |
|---|---|---|---|
| Requests | 100m, 256Mi | 100m, 256Mi | None |
| Limits | 100m, 256Mi | None or higher than requests | None |
|
||||
|
||||
## Scalers
|
||||
|
||||
- VPA: Moar power aka recommend requests
|
||||
- HPA: Moar moar aka more replicas
|
||||
- KEDA: Proxy over HPA
|
||||
|
||||
### VPA
|
||||
|
||||
Modes:
|
||||
- Off: Dry-Run
|
||||
- Initial: Applies recommendations to new Pods (can be used for finding out)
|
||||
- Auto/Recreate: Evicts and restarts pods to update resources
|
||||
|
||||
Trigger: Usually Memory
|
||||
Tip: `maxAllowed` in order to not exhaust stuff
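What the `maxAllowed` tip buys you can be shown with a tiny sketch: a recommendation is capped before it is applied. The resource keys (`cpu_m` for millicores, `memory_mi` for MiB) and the function name are assumptions for this example; the real VPA works on Kubernetes quantities inside its resource policy.

```python
def clamp_recommendation(recommended: dict, min_allowed: dict, max_allowed: dict) -> dict:
    """Cap each recommended value to the [minAllowed, maxAllowed] range, if one is set."""
    clamped = {}
    for resource, value in recommended.items():
        low = min_allowed.get(resource, 0)
        high = max_allowed.get(resource, float("inf"))
        clamped[resource] = max(low, min(value, high))
    return clamped
```

With `max_allowed={"cpu_m": 2000}`, a recommendation of 4000 millicores is reduced to 2000, while resources without a cap pass through unchanged.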
|
||||
|
||||
|
||||
### HPA
|
||||
|
||||
- Trigger: Usually cpu (percent of requests)
|
||||
- Formula: $desiredReplicas = \lceil currentReplicas \cdot \frac{usage}{target} \rceil$
|
||||
- Fun fact: Can not scale to 0
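As far as the Kubernetes documentation describes it, the HPA computes the desired replica count as the current replica count scaled by the ratio of current to target metric, rounded up. The sketch below follows that description; the function name is made up and the tolerance of 0.1 is assumed to match the upstream default.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Scale the replica count by the usage/target ratio, rounding up."""
    ratio = current_value / target_value
    # Ratios close to 1.0 are ignored to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 4 replicas at 90% CPU with a 60% target give a ratio of 1.5 and therefore 6 replicas. Since the result is a multiple of the current replica count, 0 replicas can never scale back up with this formula alone.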
|
||||
|
||||
### KEDA
|
||||
|
||||
- Basically automates HPA with flexible metrics (from different sources)
|
||||
- Can scale Jobs
|
||||
- Can Scale to 0
|
||||
|
||||
## Anti patterns
|
||||
|
||||
TODO: Steal from slides
|
||||
|
||||
| Pattern | Bad | Better |
|---|---|---|
| CPU limit = Requests | Throttling before scale | Set requests only |
|
||||
|
||||
|
||||
## Demo
|
||||
|
||||
Auto scaling meme generator (see slides/video)
|
||||
@@ -10,7 +10,12 @@ This is the first day of Cloud Native Rejekts and the first time of me attending
|
||||
|
||||
> Ranked by should watch to could watch
|
||||
|
||||
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](../05_broken-tech)
|
||||
- What if my homelab is an african island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](../06_geo-distributed-clusters)
|
||||
- Handling large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](../04_multicluster-crd)
|
||||
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](../02_clusterapi)
|
||||
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](./05_broken-tech)
|
||||
- What if my homelab is an african island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](./06_geo-distributed-clusters)
|
||||
- Bootstrap and CI/CD with crossplane: [Building air-gapped control planes for a global pharma leader using crossplane and argo](./08_airgapped-cp)
|
||||
- Handling large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](./04_multicluster-crd)
|
||||
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](./02_clusterapi)
|
||||
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- Throughout the lunch break I talked to a nice guy who heard my government question during the [Tech is broken and AI won't fix it](./05_broken-tech)-Talk, we talked
|
||||
27
content/day0/01_project-update.md
Normal file
@@ -0,0 +1,27 @@
|
||||
---
|
||||
title: Project update
|
||||
weight: 1
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/70/Platforms%20WG%20Update%20slides%20-%20Kubecon%20EU%202025.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
An update from the platform working group which will be renamed to the CNCF Platform Engineering Community.
|
||||
Alongside the new name, a bit of restructuring will take place because the working group outgrew the working group label.
|
||||
|
||||
## Initiatives
|
||||
|
||||
### Supported initiatives
|
||||
|
||||
- Platform Glossary and Whitepaper: What is a platform
|
||||
- Platform Maturity Model & Assessment: A Platform is a living thing that evolves
|
||||
- Platform as a Product: Currently in the research stage
|
||||
- Platform Community Formation: The - above mentioned - restructuring
|
||||
|
||||
### Monitored Initiative
|
||||
|
||||
- Cloud Native Platform Engineering Associate (CNPA): Certification is being formed
|
||||
- Cloud Native Platform Engineer (CNPE): Will follow after CNPA
|
||||
30
content/day0/02_sponsored-stbsdw.md
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
title: Stop building, start delivering workloads
|
||||
weight: 2
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- sponsored
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/7tbs3J7mgE0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## States of platform
|
||||
|
||||
1. Platform is being built and getting delayed
|
||||
2. Platform finished and not adopted
|
||||
3. Re-Platforming and guessing if the new platform will meet the same end
|
||||
4. Platform is low maintenance and devs are happy (nice story bro)
|
||||
|
||||
Failure should be fine but it's no longer an option in most cases
|
||||
|
||||
## What do we want?
|
||||
|
||||
> Wishlist
|
||||
|
||||
- Support for all workloads
|
||||
- Consistent experiences across ui, api, cli and gitops
|
||||
- Pathway from preview to prod
|
||||
- Multi-cloud and onprem
|
||||
- Abstract infra
|
||||
32
content/day0/03_sponsored-cortex.md
Normal file
@@ -0,0 +1,32 @@
|
||||
---
|
||||
title: "Platform Engineering with a Product Management Mindset: 10x your DevEx"
|
||||
weight: 3
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- sponsored
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/MFLXFNlmMMI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
This whole talk is pretty much a product manager's view on platform engineering.
|
||||
|
||||
## Where can it go wrong
|
||||
|
||||
- Assuming customer needs - build for hypothetical developers
|
||||
- Output > Outcome
|
||||
- Ignore stakeholder ecosystem
|
||||
|
||||
TODO: Steal slide
|
||||
|
||||
## PaaP (Platform as a product)
|
||||
|
||||
- Anticipate developer needs: Don't just fulfill requests
|
||||
- Design for all personas and survey related teams
|
||||
- Prioritize Features according to research themes
|
||||
- Deliver incremental value with feedback loops
|
||||
|
||||
## Hierarchy of goals and baselines
|
||||
|
||||
TODO: Copy slide over
|
||||
27
content/day0/04_sponsored-gitpod.md
Normal file
@@ -0,0 +1,27 @@
|
||||
---
|
||||
title: "The platform Engineer gauntlet: Three defining challenges in the AI era"
|
||||
weight: 4
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- sponsored
|
||||
---
|
||||
|
||||
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Conviction
|
||||
|
||||
- Background: There is an absence of platform leadership
|
||||
- Reason: Most "leaders" don't push services or features to developers with conviction
|
||||
- Solution: Be proud and use your leadership role with courage
|
||||
|
||||
## Focus
|
||||
|
||||
- Focus on developers
|
||||
- Don't only focus on the production ecosystem (observability, ci/cd) but also the path to this end
|
||||
|
||||
## Foundations
|
||||
|
||||
- Problem: Many companies are running behind their ai goals thanks to missing baseline automation
|
||||
- Solution: Embrace the AI
|
||||
13
content/day0/05_sponsored-vultr.md
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
title: "Containerization beyond CPUs - A Kubernetes based serverless platform for ai native applications"
|
||||
weight: 5
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- sponsored
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/XrMsJIL35Oc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Hypothesis: We are at the beginning of a 10 year cycle that is moving towards ai-native applications.
|
||||
61
content/day0/06_hire-engineers.md
Normal file
@@ -0,0 +1,61 @@
|
||||
---
|
||||
title: So you want to hire for platform engineering
|
||||
weight: 6
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/cl-MO7j7MHY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Hypothesis: The bar for good interviewing is somewhere near the earth's core and we need to improve this (because we need more engineers)
|
||||
|
||||
## Resilience engineering
|
||||
|
||||
> The overarching concepts that apply to platforms or just "how to make code work"
|
||||
|
||||
Idea: Four main goals that align with different roles under the mothership "resilience engineering"
|
||||
|
||||
- Rebound: SRE
|
||||
- Robustness: Infra
|
||||
- Graceful extensibility: Platform Engineering
|
||||
- Sustained adaptability: DevEx (often pulled out into something else)
|
||||
|
||||
Bonus things to look out for
|
||||
|
||||
- Intellectual Humility: The ability to learn new things and to accept that you might know much but not everything
|
||||
- Ecological awe: The awe experienced when looking at beautiful nature and feeling small or just looking at the CNCF landscape
|
||||
|
||||
## What do you need for the first team
|
||||
|
||||
- People who are able to hire new people and willing to step up to leadership in the long term
|
||||
- Generalists
|
||||
|
||||
## The process and what to do
|
||||
|
||||
What should happen before we hire someone (either in one or multiple interviews).
|
||||
|
||||
1. Learn about each other
|
||||
2. Solve a technical problem together
|
||||
3. Solve a sociological problem together
|
||||
4. How do you and your future coworkers/stakeholders get along
|
||||
|
||||
Make sure the end2end time (first interview to yes or no) is low (best is under two weeks)
|
||||
All of your current engineers should be able to pass the interview without studying in advance (no stupid)
|
||||
|
||||
## Potential Failures and fallacies
|
||||
|
||||
- The fallacy of demographics in = demographics out
|
||||
- Treating interviews like hazing
|
||||
- You don't track after-hire indicators
|
||||
- Whiteboard interviews: They are stupid repetition and regurgitation and have zero relation to real-world work
|
||||
- There are no real studies on how to assess and hire talent
|
||||
|
||||
### Flags
|
||||
|
||||
- Passion is usually interpreted as "puts up with abuse" and should not be mistaken for caring -> See "Ecological awe"
|
||||
- Side projects probably indicate a lack of family/social time ("I make my wife raise the kids") -> Side projects are not a good indicator, maybe they are brilliant at their job but love their free time
|
||||
- A Moneyball-like process (data-driven decision) completely counters how talent is perceived -> Expand the hiring pool to anybody and ignore the classical "indicators of talent"
|
||||
- Discriminated demographics probably have a better grip on systems thinking (due to being forced to make choices)
|
||||
- Systems thinking is more important than platform knowledge (If you can think in terms of organization and dependencies you can work on platforms)
|
||||
62
content/day0/07_past-present-future.md
Normal file
@@ -0,0 +1,62 @@
|
||||
---
|
||||
title: The past, the present and the future of platform engineering
|
||||
weight: 7
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- viktor
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/uwDoHm-AxTM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
The good old baseline is "I am a developer, I write code - now I have to do stuff to continue writing code".
|
||||
Most developers will continue on to "now I have to write scripts" in order to just do their jobs instead of working on infra.
|
||||
|
||||
These scripts evolve to tools which evolve into an internal platform (if everyone starts using it).
|
||||
Other base components can also feel like platforms (for example application servers).
|
||||
|
||||
## The early day evolution
|
||||
|
||||
- Hudson
|
||||
- Docker: Not really building platforms, rather standardized application packaging
|
||||
- Kubernetes (and Nomad + Swarm): A new concept of scheduling instead of just running the application in a container
|
||||
|
||||
=> We've been building platforms (or failing to build them) for years and years but now we kinda agree about what a platform is
|
||||
|
||||
## Present
|
||||
|
||||
We have the base idea of a platform
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
ServiceConsumers-->|Consume through|HTTPAPI-->|Trigger work on|Controllers-->|So|Services
|
||||
ServiceOwner-->|Manages|Services
|
||||
```
|
||||
|
||||
- The first question: Do we use public controllers (e.g. the CNCF landscape projects) or build our own?
|
||||
- Result: Mostly a mix starting with public, realizing needs and expanding
|
||||
|
||||
## Make it your own
|
||||
|
||||
- Goal: Make the platform domain specific for your developers
|
||||
- Evolution: Tools like DAPR for developers or Crossplane for api-building
|
||||
- Build the API and Controllers first - dashboard, gitops, observability, ... second
|
||||
- Remember that kubernetes can manage anything - not just containers
|
||||
|
||||
TODO: Steal image
|
||||
|
||||
## Blueprints
|
||||
|
||||
Take all of the projects you need, combine them and hide the complexity
|
||||
High level architecture of internal platforms is the same as public ones (aws, ...) but internal and built on kubernetes.
|
||||
|
||||
TODO: Steal images for platform blueprints (3 slides)
|
||||
|
||||
## Future
|
||||
|
||||
- Platform Engineering certification by the CNCF is on the horizon
|
||||
- Do we need to hide kubernetes from developers? Maybe -> The CNCF is starting groups to get app devs closer to platform engineers
|
||||
- More multi-cluster specialized tools are sprawling in the last year (scheduling, discovery, management)
|
||||
- AI things are happening and we should utilize it but not just by calling a llm directly and calling it a day -> e.g. dapr llm abstraction api
|
||||
- Platforms are not built in isolation, we need to help each other
|
||||
75
content/day0/08_product-thinking.md
Normal file
@@ -0,0 +1,75 @@
|
||||
---
|
||||
title: Product thinking for cloud native engineers
|
||||
weight: 8
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/8_pB9RAfzrY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/48/Product%20Thinking%20for%20Cloud%20Native%20Engineers%20PlatformEngineeringDay-EU-25.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## How & Why
|
||||
|
||||
- IT was a cost center for a long time - now it's critical but still treated as a cost center
|
||||
- Why is it important: Too much focus on the technical aspects instead of value delivery
|
||||
- Importance: Show the value of your work (which means your work has to provide value)
|
||||
- Operations and coordination work is not easily visible, but very important
|
||||
|
||||
## Principles
|
||||
|
||||
- Focus on user value: User problems > Solutions
|
||||
- Outcome (Value) > Output (Tickets closed)
|
||||
- Products (lifecycle and ownership) before projects (just setting stuff up)
|
||||
|
||||
### User value
|
||||
|
||||
- "Who is the user": Builders, Enablers, Regulatory, "Viewers"
|
||||
- "What is the value": Make the organization more efficient while avoiding risks
|
||||
|
||||
## How to start?
|
||||
|
||||

|
||||
|
||||
### Exploring the Problem Space
|
||||
|
||||
Goals:
|
||||
- Identify top pains
|
||||
- Build empathy and understanding
|
||||
- Investigate key business aims
|
||||
|
||||
Techniques:
|
||||
- Customer and stakeholder interviews: Talk to people, they will probably tell you about their pain
|
||||
- Data/Process analysis: Where are our bottlenecks
|
||||
- Shadowing: Really see how the day to day works
|
||||
- Ask "Why"
|
||||
- Read business updates (current goals)
|
||||
- Build dashboards that show progress and value
|
||||
|
||||
### Defining the problem space
|
||||
|
||||
Goals:
|
||||
- Identify opportunities
|
||||
- Prioritise
|
||||
- Gather insights and data
|
||||
|
||||
Techniques:
|
||||
- Value stream mapping
|
||||
- RICE, Value vs Effort or other cost-benefit analysis
|
||||
- Analyse your exploration process
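
To make the RICE technique a bit more concrete: the commonly used formula is Reach x Impact x Confidence / Effort. The following is a minimal sketch of scoring and sorting a backlog with it. The `Opportunity` class, the backlog entries and all numbers are made up for illustration and are not from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Opportunity:
    name: str
    reach: float       # how many users are affected per time period
    impact: float      # relative impact per user (team-defined scale)
    confidence: float  # 0.0 to 1.0, how sure we are about the estimates
    effort: float      # e.g. person-months


def rice_score(o: Opportunity) -> float:
    """RICE = Reach * Impact * Confidence / Effort."""
    if o.effort <= 0:
        raise ValueError("effort must be positive")
    return o.reach * o.impact * o.confidence / o.effort


def prioritise(opportunities: List[Opportunity]) -> List[Opportunity]:
    """Highest score first."""
    return sorted(opportunities, key=rice_score, reverse=True)


# Hypothetical backlog of a platform team
backlog = [
    Opportunity("self-service database", reach=40, impact=2.0, confidence=0.8, effort=4),
    Opportunity("faster CI runners", reach=200, impact=1.0, confidence=0.5, effort=2),
]

for o in prioritise(backlog):
    print(f"{o.name}: {rice_score(o):.1f}")
```

The score is only as good as the estimates that go in, so it works best as a conversation starter for prioritisation rather than as a final verdict.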
|
||||
|
||||
## Did we reach our goal?
|
||||
|
||||
### Product metrics
|
||||
|
||||
- Someone will measure your work, hope they do it right or rather do it yourself to show how you provide value
|
||||
- Product metrics should measure outcome not output (or performance metrics)
|
||||
- Baseline: You need to know the desired outcome
|
||||
|
||||
|
||||
### Frameworks
|
||||
|
||||
- DevEx: Triangle of flow state (build&test speed), feedback loops () and cognitive load (code complexity, docs clarity)
|
||||
- DORA
|
||||
- SPACE
|
||||
- DX Core 4
|
||||
129
content/day0/09_promotions.md
Normal file
@@ -0,0 +1,129 @@
|
||||
---
|
||||
title: A million ways to promote changes between environments
|
||||
weight: 9
|
||||
tags:
|
||||
- argo
|
||||
- cloudnativecon
|
||||
- viktor
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/iCTgRC3AQQk" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Baseline
|
||||
|
||||
- Promotion: Move things from one env to another
|
||||
- Options: Sequentially or both
|
||||
- Challenge: Env differences
|
||||
- Challenge: How do we link our promotion tasks?
|
||||
|
||||
### GitOps
|
||||
|
||||
- Declarative: YAML, JSON, XML (Not helm or kcl or anything else)
|
||||
- Versioned and immutable: Git
|
||||
- Pulled automatically: No write access from cluster
|
||||
- Continuously reconciled: Maintain parity between desired and actual state
|
||||
|
||||
### Rules
|
||||
|
||||
- Part of SDLC
|
||||
- Declarative
|
||||
- Versioned and immutable
|
||||
- Pulled automatically
|
||||
- Continuously reconciled
|
||||
|
||||
## Workflows
|
||||
|
||||
### Manual
|
||||
|
||||
1. Deploy
|
||||
2. Run tests
|
||||
3. Push to next stage
|
||||
4. Test again or roll back
|
||||
|
||||
### Manual with gitops
|
||||
|
||||
1. Update manifest
|
||||
2. Push to git
|
||||
3. Test
|
||||
4. Next stage
|
||||
|
||||
Problem: Eventual consistency makes the process async instead of sync (important for tests)
|
||||
|
||||
### Generic workflows
|
||||
|
||||
1. Dev: Bump, push
|
||||
2. QA: Wait for success of 1 (how?), do the same
|
||||
3. Prod: Wait for success of 2 (how?)
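
The "(how?)" is the tricky part: with eventual consistency, a pipeline can only poll until the stage reports success or a timeout hits. Below is a minimal sketch of that idea. The `bump` and `get_status` callables are placeholders for whatever your setup provides, and the status strings (`Healthy`, `Degraded`) are assumptions for illustration, not something shown in the talk.

```python
import time
from typing import Callable, List


def wait_for_success(get_status: Callable[[], str],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the stage is healthy, has failed, or the timeout is reached."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "Healthy":
            return True
        if status == "Degraded":
            return False
        if clock() >= deadline:
            return False
        sleep(interval_s)


def promote(stages: List[str],
            bump: Callable[[str], None],
            get_status: Callable[[str], str],
            **wait_kwargs) -> List[str]:
    """Bump stage after stage, stop at the first one that does not become healthy.

    Returns the stages that were promoted successfully.
    """
    promoted = []
    for stage in stages:
        bump(stage)  # e.g. update the manifest and push to git
        if not wait_for_success(lambda: get_status(stage), **wait_kwargs):
            break
        promoted.append(stage)
    return promoted
```

Note that this only moves the problem: the script still has to trust the reported state, which is exactly the weakness of the CI-based approach discussed under "Tools".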
|
||||
|
||||
TODO: Steal code screenshots from slides
|
||||
|
||||
## Tools
|
||||
|
||||
### Extend your standard CI
|
||||
|
||||
|
||||
Not async, risk of flapping, either blindly trust the state or break the pull-principle by running argo sync or kubectl apply
|
||||
|
||||
### AppSets Progressive Sync
|
||||
|
||||
- Built in to Application Sets (alpha)
|
||||
- Targeting by label, promotes everything
|
||||
- Not supported with autosync, because it basically manually triggers sync one after another
|
||||
- Changes from git have to be manually triggered
|
||||
|
||||
### Image updater
|
||||
|
||||
- Subscribe to semver based image updates and write them to kubernetes and/or git
|
||||
- You have to implement promotions via image naming schemes
|
||||
|
||||
TODO: Steal flowchart
|
||||
|
||||
### Kargo
|
||||
|
||||
- Freight: Artifact or manifest versions to promote
|
||||
- Stage: ArgoCD Apps
|
||||
|
||||
TODO: Steal flowchart
|
||||
|
||||
### Telefonistka
|
||||
|
||||
- IaC Agnostic tooling
|
||||
- Idea: Watch folder contents and copy contents to new folder
|
||||
- Pretty much a bundled CI-Script
|
||||
|
||||
TODO: Draw your own chart
|
||||
|
||||
### Codefresh GitOps
|
||||
|
||||
> This is one of the speaker's tools
|
||||
|
||||
- Product: Applications with relationships
|
||||
- Env: Any cluster and/or namespace
|
||||
- Promotion: CRD for policy (when does it happen, what gets validated)
|
||||
- Promotions can happen manually or automated via commit/pr
|
||||
- Based on Argo Workflows
|
||||
|
||||
### GitOps Promoter (Intuit)
|
||||
|
||||
- Define Manifests once and hydrate them later
|
||||
- Sourcehydrator: Argocd feature that handles the rendering and commits it to a new dedicated branch (one branch per stage)
|
||||
- The branches are the branches used by argo, e.g. `environments/dev` gets watched by the dev cluster
|
||||
- Changes result in environment proposal branches, a PR gets opened, PR checks run, and when the PR requirements are met (tests), the changes get merged into the real env branches
|
||||
|
||||
TODO: Steal Pattern
|
||||
|
||||
## Overview of the philosophies
|
||||
|
||||
Artifact Oriented: Imageupdater, Kargo
|
||||
Define Manifests once: AppSets Progressive Sync, GitOps Promoter
|
||||
Deff and workflow: CI, Codefresh
|
||||
|
||||
TODO: Steal from slides
|
||||
|
||||
## Best practices
|
||||
|
||||
- Can you recover from git at any point? No -> Do better
|
||||
- Does git reflect what's deployed without looking?
|
||||
- Does this enable SDLC?
|
||||
- Interfaces in folders, not branches? -> Branches may get crowded
|
||||
89
content/day0/10_abstractions.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
title: "Platform abstractions: Asset or liability"
|
||||
weight: 10
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/M5X5NCzlzIA" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/52/atul-talk-platform-engineering-kubecon-london-2025_final.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
Fair warning: Food analogies incoming
|
||||
|
||||
## Baseline
|
||||
|
||||
### What do abstractions achieve
|
||||
|
||||
- Structure through simplification
|
||||
- Complexity made simple
|
||||
- Hidden details, visible value
|
||||
|
||||
### Dilemma
|
||||
|
||||
1. Platform team creates abstraction
|
||||
2. Abstraction works for 10 Teams
|
||||
3. Other team requests extension
|
||||
4. Question: How do we deal with this
|
||||
|
||||
### Possible Solutions
|
||||
|
||||
- Add Config Options: Increases complexity of abstraction
|
||||
- Make One-off exceptions: Breaks standardization, introduces inconsistency
|
||||
- Require conformity: Hinders innovation, creates enemies
|
||||
- Allow bypassing: Creates shadow IT, risking security and resource control
|
||||
|
||||
=> Debt trap: The cost of maintaining a stable platform rises and rises
|
||||
|
||||
## The debt cycle
|
||||
|
||||
### The abstraction cycle
|
||||
|
||||
1. Simplify
|
||||
2. Adopt
|
||||
3. New Requirements
|
||||
4. Add complexity
|
||||
5. Repeat
|
||||
|
||||

|
||||
|
||||
### Warning signs
|
||||
|
||||
- Rising customization requests
|
||||
- Workarounds
|
||||
- Shadow IT
|
||||
|
||||
### Impact
|
||||
|
||||
- Each new feature becomes harder to implement
|
||||
- Teams lose trust in the platform capabilities
|
||||
- Platform evolution slows down
|
||||
- New tech is difficult to incorporate
|
||||
|
||||
## Abstraction elasticity
|
||||
|
||||
> The abstraction should stretch a bit to accommodate change without breaking
|
||||
|
||||
- Adaptability: Ease of handling new requirements
|
||||
- Transparency: Understand what your user wants and why
|
||||
- Extension Patterns: Document ways to customize the platform behavior
|
||||
- Migration Paths: Ease of moving away from the platform abstraction
|
||||
|
||||
### Elasticity
|
||||
|
||||
- Can teams access lower level controls (when needed) while staying with the abstraction
|
||||
- Do users understand what happens underneath (when needed)
|
||||
- Are there documented extension/customization points?
|
||||
|
||||
## Patterns to break the debt trap
|
||||
|
||||
- Layered abstraction patterns: start with low-level abstractions that get abstracted on higher levels to allow users to choose the right abstraction level for themselves without having to configure everything themselves
|
||||
- Expert API: Additional API parameters that are not needed but can be set
|
||||
- Policy based guard rails: Change the guardrails based on the environment (e.g. deep access in dev, not in prod)
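
As a rough illustration of policy based guard rails, the sketch below keeps one policy table per environment and denies everything that is not explicitly allowed. The environment names, action names and the `is_allowed` helper are purely hypothetical. In a real platform this would most likely live in a policy engine or admission controller rather than in application code.

```python
from typing import Dict

# Hypothetical policy table: deep access in dev, locked down in prod
POLICIES: Dict[str, Dict[str, bool]] = {
    "dev": {"exec_into_pod": True, "edit_raw_manifests": True},
    "prod": {"exec_into_pod": False, "edit_raw_manifests": False},
}


def is_allowed(environment: str, action: str) -> bool:
    """Unknown environments and unknown actions are denied by default."""
    return POLICIES.get(environment, {}).get(action, False)


print(is_allowed("dev", "exec_into_pod"))   # allowed in dev
print(is_allowed("prod", "exec_into_pod"))  # denied in prod
```

The useful property is that the abstraction stays the same for every team while only the guard rails differ per environment.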
|
||||
|
||||
## The end goal
|
||||
|
||||
- Increase adoption
|
||||
- Eliminate shadow IT
|
||||
- Improved satisfaction
|
||||
- Reduced overhead
|
||||
43
content/day0/11_t-env.md
Normal file
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: "The story of t-env: Scaling a platform to improve the velocity of hundreds of developers"
|
||||
weight: 11
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/qXRHpIYxU_c" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/da/KubeCon%20Talk_%20Lemonade%27s%20t-env.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
Okteto: Ephemeral environments for testing
|
||||
|
||||
## History
|
||||
|
||||
- Starting point: Local Dev -> Setup for new devices or devs is really slow (on average 10hrs a week)
|
||||
- Next Idea: EC2 Instances with a fancy docker-compose and scripts -> No more local dev
|
||||
- Problems: Still complex - just in the cloud, manual updates, always-on required (no working on the train)
|
||||
- Risks: Developers will just create workarounds and shadow IT
|
||||
|
||||
## T-Env
|
||||
|
||||
- Baseline: Setup an environment on kubernetes for each dev with ci/cd
|
||||
- Okteto: A single command to enter dev mode `t dev start` with file sync from local
|
||||
- Implementation: Wrapper around the okteto cli
|
||||
- Why: Because devs seem to love the cli
|
||||
- Self service observability for troubleshooting in your env
|
||||
|
||||
Used open source tools: Pulumi, Grafana, Okteto, K8s
|
||||
|
||||
### Did it work?
|
||||
|
||||
- The time to test is way faster
|
||||
- The path was clear
|
||||
- The environments should be ephemeral but devs don't like that -> They decided to allow for long lived envs
|
||||
- Cloud cost is relatively high with long living envs -> They implemented a sleep system based on dev timezone
|
||||
(or manual wake-up)
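
The talk did not show how the sleep system is implemented, so the following is only a sketch of the idea: decide per environment whether it should be running, based on the developer's local time plus a manual override. The function name, the working hours (08:00 to 19:00, weekdays only) and the use of a fixed UTC offset instead of a real timezone database are all assumptions for illustration.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional


def should_be_awake(now_utc: datetime,
                    utc_offset_hours: float,
                    work_start: time = time(8, 0),
                    work_end: time = time(19, 0),
                    manual_wake_until: Optional[datetime] = None) -> bool:
    """Return True if the dev environment should be running right now.

    now_utc must be timezone-aware. A manual wake-up wins over the schedule.
    """
    if manual_wake_until is not None and now_utc < manual_wake_until:
        return True
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return work_start <= local.time() < work_end


now = datetime(2025, 4, 1, 6, 30, tzinfo=timezone.utc)  # a Tuesday morning in UTC
print(should_be_awake(now, utc_offset_hours=2))   # 08:30 local time
print(should_be_awake(now, utc_offset_hours=-4))  # 02:30 local time
```

A controller could run a check like this periodically and scale the environment down to zero whenever it returns false.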
|
||||
|
||||
## The futuuuuure
|
||||
|
||||
- The company is not getting smaller -> More devs and more services
|
||||
- AI agents will write some of the code in the future
|
||||
- Idea: Only run modified code in env instead of everything
|
||||
50
content/day0/12_many-clusters.md
Normal file
@@ -0,0 +1,50 @@
|
||||
---
|
||||
title: "Performance perseverance: Taming 1000 kubernetes clusters"
|
||||
weight: 12
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/ZTT8M74RD1M" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/d5/kubecon_2025_v4.2.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## History
|
||||
|
||||
- They started with upstream kubernetes - the hard way
|
||||
- Env grew to over 200 prod apps
|
||||
- Pains: Single Cluster, single point of failure and complexity
|
||||
- What worked: Dev adoption and autonomy, no vendor
|
||||
|
||||
## Challenges
|
||||
|
||||
> Based on stakeholder expectations
|
||||
|
||||
- One tenant per cluster -> Over 1000 Clusters
|
||||
- Release management
|
||||
- Small team (3 Engineers)
|
||||
|
||||
## Guiding principles
|
||||
|
||||
- Platform as a product
|
||||
- Stability: trust
|
||||
- Standardization -> Scalability and inter team collab
|
||||
- Day 2 support
|
||||
- Dogfooding
|
||||
|
||||
## Tenancy
|
||||
|
||||
- One cluster per product
|
||||
- Own CLI, devs like cli
|
||||
- Custom operator and crds
|
||||
|
||||
## Stack
|
||||
|
||||
- Keopsctl? Pretty much their own cluster operator
|
||||
- A Simple Cluster CRD
|
||||
|
||||
## Migration
|
||||
|
||||
1. Build trust in platform
|
||||
2. Support with docs, onboarding, q&a
|
||||
3. Co-create with devs while keeping an eye on day2 -> Feature-Flag based rollout
|
||||
56
content/day0/13_paap.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
title: Platform as a Product
|
||||
weight: 13
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/DoiaHfl9Y7Y" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
The CNCF's research into product thinking for platforms.
|
||||
|
||||
## But why
|
||||
|
||||
- Get insights into the current product thinking practices of platform builders
|
||||
- Topics: Needs/Painpoints/Behaviour
|
||||
- Target: Create personas based on insights
|
||||
- Find out what people are doing, not how they are doing it
|
||||
|
||||
## How?
|
||||
|
||||
- Survey for quantity
|
||||
- Interviews for quality
|
||||
|
||||
## Challenges
|
||||
|
||||
- Asking questions without suggesting answers
|
||||
- Consensus on research goals
|
||||
- Motivation and time investment (on interviewer and interviewee side) + Non-Responses
|
||||
- Tooling: There is no standard tooling at the CNCF for this kind of research
|
||||
- Small sample size -> No real research insights, just signals/hints
|
||||
|
||||
## Analysis
|
||||
|
||||
- Working with assumptions was hard in combination with the small sample size
|
||||
- Survey: Survey Tool (Google Forms) combined with a whiteboard tool for clustering and analysis
|
||||
- Interviews: They used AI for time efficiency but the prompt escalated a bit, leading to no real time gain -> But you can scale the same prompt to infinite sample sizes
|
||||
- Challenge: AI confidently churns out wrong answers -> Use source links to verify and scoping
|
||||
|
||||
TODO: Steal workflow from slides
|
||||
|
||||
## Outcome/Signals
|
||||
|
||||
- Platform Orgs use Prioritization Frameworks unconsciously: "We don't use product management and tools like that" -> Well you do, you just don't call it PM and are a bit unstructured
|
||||
- Structured Activities: Interviews (talking to each other), Focus groups, quantitative data, ...
|
||||
- Roadmap influence: Insight, prioritization, painpoints, backlogs
|
||||
- Regular planning meetings
|
||||
- Platform orgs struggle to define and actually implement measures of success: Measure activity over impact, success is often felt instead of proved
|
||||
- Platform teams have varied control over their work: Depending on company size and business relationships
|
||||
|
||||
## Future
|
||||
|
||||
- Baseline: They have some signals
|
||||
- Question: Are these patterns successful?
|
||||
- Needed: More data and better organization
|
||||
58
content/day0/14_lego.md
Normal file
@@ -0,0 +1,58 @@
|
||||
---
|
||||
title: Building Platforms with empathy and yaml at the lego group
|
||||
weight: 14
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/8FmJWd7vRt4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Very nice "kids playing with lego" intro analogy about creativity, sharing and collaboration.
|
||||
|
||||
## The golden brick
|
||||
|
||||
- The brick could get picked up and sometimes picking it up is mandatory
|
||||
- Development in close collaboration and trust with users
|
||||
- Focus on good enough instead of perfect but everyone is unhappy
|
||||
|
||||
### Guidelines
|
||||
|
||||
- API first: Define a separation between users and services with abstractions
|
||||
- Self services: Freedom of choice and combination
|
||||
- Constraints that are soft and can be modified on feedback
|
||||
|
||||
### Offers
|
||||
|
||||
- Kubernetes as a service
|
||||
- Runtime as a Service: Namespace as a service with argo and without cluster access
|
||||
- Problem: Users want kubeapi access
|
||||
- Method: Talk with the users
|
||||
- Solution: Zero Trust proxy that provides operational access to kubeapi via OIDC
|
||||
- There are multiple APIs that can be combined -> You need constraints
|
||||
|
||||
### What's needed
|
||||
|
||||
- Conversation
|
||||
- Trust
|
||||
- Striking a balance
|
||||
|
||||
## The human aspect
|
||||
|
||||
- Treat people as colleagues instead of customers
|
||||
- Build empathy to reach a balanced "good enough"
|
||||
- Lead with transparency: Publish your metrics
|
||||
- Visit their context
|
||||
- Explore unknowns together
|
||||
- Create a shared understanding of challenges
|
||||
|
||||
### Team culture
|
||||
|
||||
- Know who you are helping and who helps you
|
||||
- Empower them to shine by getting to know their context
|
||||
- Hear them out in small meetings or in person
|
||||
|
||||
## Platform maturity
|
||||
|
||||
TODO: Steal maturity chart
|
||||
29
content/day0/15_internal-marketing.md
Normal file
@@ -0,0 +1,29 @@
|
||||
---
|
||||
title: 10 Quick tips on how to internally market your platform
|
||||
weight: 15
|
||||
tags:
|
||||
- platform
|
||||
- cloudnativecon
|
||||
- lightning
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/kiUV8En8Co4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/42/2025-PE-Day-10-Tips.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Baseline
|
||||
|
||||
- Even great tech does not sell itself - you need marketing
|
||||
- We don't have a big marketing budget for our internal platform
|
||||
- No adoption -> No Trust -> No new users -> No adoption
|
||||
|
||||
## Tips
|
||||
|
||||
- Define personas and a value proposition map
|
||||
- Build a brand: Name, logo, story, swag
|
||||
- Have a launch party or milestone parties
|
||||
- Provide clear, accessible communication (with clear channels, docs, ...)
|
||||
- Build a community that can help each other (and don't separate yourself from the community)
|
||||
- Capture metrics for success for yourself and from a user's perspective
|
||||
- Provide a 5-minute wow moment/demo where the user can feel like they achieved something
|
||||
- Level up with gamification
|
||||
- Leverage external events for internal visibility
|
||||
BIN
content/day0/_img/abstraction-cycle.png
Normal file
|
After Width: | Height: | Size: 572 KiB |
BIN
content/day0/_img/product-compass.png
Normal file
|
After Width: | Height: | Size: 270 KiB |
@@ -4,8 +4,27 @@ title: Day 0
|
||||
weight: 4
|
||||
---
|
||||
|
||||
TODO:
|
||||
Day 0 of KubeCon aka CloudNativeCon aka the day on which the co-located events happen.
|
||||
This year I spent most of my time at the platform engineering day (with a short visit to argocon).
|
||||
The emerging motto of platform engineering day was "platform as a product".
|
||||
|
||||
This was the third conference day (fourth travel day) and in the afternoon I started to feel the brain-overflow.
|
||||
But powering through, I ended up attending two keynotes (no notes, they were pretty much a welcome and goodbye) and 14 talks.
|
||||
|
||||
And most importantly: This is the day my friends and coworkers joined (they are only in town for kubecon, not for rejekts).
|
||||
Sometimes we ended up in the same talks, sometimes in different talks, which led to a rich set of talk notes.
|
||||
|
||||
## Talk recommendations
|
||||
|
||||
* TODO:
|
||||
- How to design a good hiring process: [So you want to hire for platform engineering](./06_hire-engineers)
|
||||
- Evolution of Platforms and Platform Engineering: [The past, the present and the future of platform engineering](./07_past-present-future)
|
||||
- How to design a good product: [Product thinking for cloud native engineers](./08_product-thinking)
|
||||
- Staging with gitops: [A million ways to promote changes between environments](./09_promotions)
|
||||
- How to handle abstractions and new requirements: [Platform abstractions: Asset or liability](./10_abstractions)
|
||||
- Very nice slides: [Building Platforms with empathy and yaml at the lego group](./14_lego)
|
||||
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- Talked to the Vultr people - they have a manifesto for ai with amd and nvidia gpus
|
||||
- Talked to Meshcloud: They build developer platform tooling (currently mostly integrated with cloud providers)
|
||||
- Want to look into Okteto for dev envs: <https://github.com/okteto/okteto>
|
||||
77
content/day1/01_scaling-gpu.md
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: Scaling GPU Clusters without melting down
|
||||
weight: 1
|
||||
tags:
|
||||
- ml
|
||||
- nvidia
|
||||
- ai
|
||||
- apiserver
|
||||
- go
|
||||
- kubecon
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/dUfp3j1j-mg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/50/Scaling%20GPU%20Clusters%20Without%20Melting%20Down%21%20%281%29.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Baseline
|
||||
|
||||
- We need more and more gpus -> Control Plane needs to keep track of more objects
|
||||
- Goal: Scale Workers without scaling control plane
|
||||
|
||||
## Current Problems
|
||||
|
||||
### Secret list calls go up and control plane goes down
|
||||
|
||||
- Scenario: High number of list calls with larger secrets
|
||||
- Problem: OOM apiserver b/c cache
|
||||
- Fix: API Priority & Fairness (only allow two concurrent list calls, queue the rest)
|
||||
- Result: Decreased number of oom crashes
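
The core idea of the fix (a hard cap on concurrent expensive calls, everything else waits in a queue) can be sketched with a semaphore. This is only a toy model: the real API Priority & Fairness feature is configured through Kubernetes objects and does a lot more (flow classification, fair queuing). The `ListLimiter` class and `simulate` function are made up for illustration.

```python
import threading
import time


class ListLimiter:
    """At most `max_concurrent` callers run at once, the rest block (queue)."""

    def __init__(self, max_concurrent: int = 2):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0

    def run(self, fn):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return fn()
            finally:
                with self._lock:
                    self._in_flight -= 1


def simulate(n_requests: int = 10, max_concurrent: int = 2):
    """Fire n_requests 'list calls' at once, return (peak concurrency, completed)."""
    limiter = ListLimiter(max_concurrent)
    results = []

    def expensive_list():
        time.sleep(0.01)  # stands in for a memory-hungry list call
        return "ok"

    def worker():
        results.append(limiter.run(expensive_list))

    threads = [threading.Thread(target=worker) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return limiter.peak_in_flight, len(results)


peak, completed = simulate()
print(f"peak concurrency: {peak}, completed: {completed}")
```

All requests still complete, they just take longer. That trades latency for a bounded memory footprint, which matches the reported result of fewer OOM crashes.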
|
||||
|
||||
### High memory usage until we restart the apiserver
|
||||
|
||||
- Scenario: API-Server frees up to 40% of its memory util when restarted
|
||||
- Main suspect: Memory collection
|
||||
- Idea: Tune the Go garbage collector (ENV Var `GOGC`) -> They lowered it from the default 100 to 50
|
||||
- Result: Decrease in memory util and no more growing util over time
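
Why a lower `GOGC` value helps: roughly speaking, the Go runtime starts the next garbage collection once the heap has grown by `GOGC` percent over the live heap that survived the last cycle. The sketch below only does that arithmetic. It is a simplification (newer Go versions also count GC roots such as stacks and globals), and the 4 GiB live heap is a made-up number, not a figure from the talk.

```python
GIB = 1024 ** 3


def next_gc_target_bytes(live_heap_bytes: int, gogc: int = 100) -> int:
    """Approximate heap size at which the next GC cycle starts.

    Simplified model: target = live heap * (1 + GOGC / 100).
    """
    if gogc < 0:
        raise ValueError("gogc must not be negative")
    return live_heap_bytes + live_heap_bytes * gogc // 100


live = 4 * GIB  # hypothetical live heap of an apiserver
for gogc in (100, 50):
    target = next_gc_target_bytes(live, gogc)
    print(f"GOGC={gogc}: next GC at about {target / GIB:.1f} GiB")
```

The price for the lower peak is more frequent GC cycles and therefore more CPU time spent on collection.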
|
||||
|
||||
### Large skew in memory utilization
|
||||
|
||||
- Scenario: Skew in memory utilization across api-server pods
|
||||
- Problem: If a pod with high util gets hit with a list, the api-server will OOM -> The LB redirects to the other 2 -> Those OOM
|
||||
- Observation: The lb in front of the api server pods also shows some skew -> Explains the skew
|
||||
- Root cause: lb has long living tcp connections to the servers and balances based on connections and not requests
|
||||
- Idea: Switch up the lb configuration -> Not quite the right angle
|
||||
- Fix: `--goaway-chance` param in apiserver - an HTTP/2 `GOAWAY` message gets sent at random -> Tearing down the connection gracefully, the client recreates the connection
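
A small simulation shows why balancing by connections produces skew and why randomly closing connections evens it out. This is a toy model under loud assumptions: three servers, six clients with made-up request rates, round-robin placement of the initial connections and a uniformly random server on reconnect (a real load balancer would pick the target according to its own algorithm).

```python
import random
from typing import List


def simulate(rates: List[int], n_servers: int = 3, ticks: int = 200,
             goaway_chance: float = 0.0, seed: int = 0) -> List[int]:
    """Return the number of requests handled per server.

    rates[i] is the number of requests client i sends per tick over its
    single long-lived connection.
    """
    rng = random.Random(seed)
    # The LB spreads *connections* evenly, regardless of how busy they are.
    conn = [client % n_servers for client in range(len(rates))]
    handled = [0] * n_servers
    for _tick in range(ticks):
        for client, rate in enumerate(rates):
            for _request in range(rate):
                handled[conn[client]] += 1
                # With some probability the server answers with GOAWAY,
                # so the client reconnects and may land on another server.
                if rng.random() < goaway_chance:
                    conn[client] = rng.randrange(n_servers)
    return handled


def skew(handled: List[int]) -> float:
    """Ratio between the busiest and the least busy server."""
    return max(handled) / max(1, min(handled))


rates = [100, 100, 1, 1, 1, 1]  # two heavy clients, four quiet ones
pinned = simulate(rates, goaway_chance=0.0)
rebalanced = simulate(rates, goaway_chance=0.01)
print("pinned:    ", pinned, f"skew={skew(pinned):.1f}")
print("rebalanced:", rebalanced, f"skew={skew(rebalanced):.1f}")
```

In the pinned case every server holds exactly two connections, yet one of them sees almost no traffic. That is the "balances based on connections and not requests" problem in a nutshell.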
|
||||
|
||||
### Architectural mistakes
|
||||
|
||||
- Large number of secrets per workload -> List, Encode/Decode overhead
|
||||
- No caching -> Too many list calls
|
||||
|
||||
### Preview
|
||||
|
||||
- There are a bunch of sig api-machinery improvements planned
|
||||
|
||||
## The future
|
||||
|
||||
- The switch from NUMA GPU-Devices to DRA
|
||||
- DRA is powerful enough to get rid of custom numa stuff
|
||||
|
||||
### The stack
|
||||
|
||||
- Currently:
|
||||
- CP: APIServer, Controller manager, Scheduler and Topology aware scheduler
|
||||
- Worker: Device Plugin, nfd topology updater
|
||||
- Future
|
||||
- CP: APIServer, Controller manager, Scheduler
|
||||
- Worker: Device Plugin
|
||||
|
||||
### Testing scaling
|
||||
|
||||
- Tool: KWOK (Kubernetes WithOut Kubelet) - used to simulate gpu workloads
|
||||
- Env: K8S 1.32 with scaling from 0 to 4000 Workloads
|
||||
- Metrics:
|
||||
- Scheduling Latency: Topo aware was way more latency-affected
|
||||
- Scheduler Memory util: 30% of memory saved with dra
|
||||
- API-Server Memory: Another 20% of memory saved
|
||||
- Result: They are confident that DRA will be stable and even save memory and cpu util
|
||||
81
content/day1/02_migrations.md
Normal file
@@ -0,0 +1,81 @@
|
||||
---
|
||||
title: Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos
|
||||
weight: 2
|
||||
tags:
|
||||
- kubecon
|
||||
- platform
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/uQ_WN1kuDo0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/fd/day2000-migration-ClusterAPI-talos.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Background
|
||||
|
||||
- They use large, shared clusters
|
||||
- The oldest cluster is 2099 days (5,8 years) old
|
||||
- Onprem hosted on vSphere with vanilla kubeadm
|
||||
- Fun fact: They run chaosmonkey on all clusters -> Automatically prepares for updates
|
||||
|
||||
### Legacy provisioning
|
||||
|
||||
1. Terraform creates a Debian VM
|
||||
2. Deploy base tools with puppet
|
||||
3. Register nodes in inventory yaml file
|
||||
4. Run ansible playbook -> Renders configs and runs kubeadm
|
||||
5. Configure ArgoCD
|
||||
|
||||
### Target
|
||||
|
||||
- Use ClusterAPI to manage the workload-clusters
|
||||
- Basic CRDs: Cluster, MachineDeployment, Machine
|
||||
- Talos: Immutable, minimal, ephemeral with declarative config via grpc api
|
||||
|
||||

|
||||
|
||||
|
||||
## Migration
|
||||
|
||||
1. Config matching between kubeadm and talos+capi
|
||||
2. Import PKI/Certs
|
||||
3. Create ClusterAPI CRDs
|
||||
4. Add ClusterAPI Nodes
|
||||
5. Remove kubeadm nodes
|
||||
|
||||
### 1. Config matching
|
||||
|
||||
1. Serviceaccount Issuer: Talos has its own default
|
||||
2. etcd encryption key names are hardcoded in talos
|
||||
3. Re-Encrypt all secrets (get secrets, replace secrets)
|
||||
|
||||
### 2. PKI
|
||||
|
||||
1. Talos includes some logic that can generate a secrets bundle from an existing API
|
||||
2. Import: The etcd, k8s, serviceaccount and os (talos specific, used for the talos api auth) certificates
|
||||
|
||||
### 3. CRDs
|
||||
|
||||
- One namespace per workload cluster
|
||||
- Cluster-CRD: Ref to CP and Infrastructure
|
||||
- ControlPlane-CRD: Create cp MDs
|
||||
- Infrastructure: References template for worker-MDs
|
||||
|
||||

|
||||
|
||||
### 4. Add ClusterAPI Nodes
|
||||
|
||||
- Add new CP and Worker Nodes to the cluster that are managed by CAPI (slowly, stuff will break)
|
||||
- Remove the old nodes one by one over weeks or months
|
||||
- Potential Problems:
|
||||
- Mismatched serviceaccountissuer
|
||||
- Missing etcd encryption key
|
||||
- Wrong etcd encryption key
|
||||
- Loss of quorum: `--force-new-cluster` can force recovery on one node of the etcd cluster
|
||||
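A quick sketch of why loss of quorum is a real risk while old and new control plane nodes coexist: etcd needs a majority of members to be available, and the member count changes with every node you add or remove. The member counts below are only illustrative.

```python
def quorum(members):
    """Smallest majority of an etcd cluster with `members` members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be down before quorum is lost."""
    return members - quorum(members)


for n in (3, 4, 5, 6):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 members raises the quorum without adding any fault tolerance, which is one reason to add and remove nodes slowly and one at a time.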
|
||||
## Demo
|
||||
|
||||
I recommend watching the demo
|
||||
Talos seems pretty cool.
|
||||
|
||||
## Bootstrapping
|
||||
|
||||
- Kind cluster in github action or on local device
|
||||
79
content/day1/03_operator-mistakes.md
Normal file
@@ -0,0 +1,79 @@
|
||||
---
|
||||
title: "Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes"
|
||||
weight: 3
|
||||
tags:
|
||||
- kubecon
|
||||
- operator
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/tnSraS9JqZ8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/53/Don%27t%20write%20controllers%20like%20Charlie%20Don%27t%20does_%20avoiding%20common%20Kubernetes%20controller%20mistakes.pptx.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Common mistake
|
||||
|
||||
### Using a simple client that talks directly to the api server
|
||||
|
||||
- Problem: A
|
||||
- Problem: Updates send the whole object -> No-op updates waste apiserver resources
|
||||
- Fix: Use a cache client
|
||||
- Problem: Caching validation
|
||||
|
||||
### Don't use custom caching
|
||||
|
||||
- Problem: Good Luck dealing with concurrency
|
||||
- Hard: Controllers must maintain a per-kind cache
|
||||
- Problem: Eventual consistency makes everything more complicated
|
||||
- Fix: Use a framework
|
||||
|
||||
### Predicates only apply to the current
|
||||
|
||||
- A predicate passed to the `For(...)` call only applies to this call, not to other watchers
|
||||
- Also check whether you should reconcile your low-level objects, or whether reconciling the higher-level ones that reference them is better
|
||||
|
||||
## Tools
|
||||
|
||||
### KRT
|
||||
|
||||
> Still under development
|
||||
|
||||
- Operations on collections (kubernetes objects with state tracking)
|
||||
- Fetch function that handles transformation
|
||||
|
||||
### StateDB
|
||||
|
||||
- In-memory database for go with watch channels
|
||||
- You can setup a table that stores all objects of a kind (provided by the client)
|
||||
- Triggers hooks when changes happen in the database that you can react to
|
||||
|
||||
### Controller-Runtime
|
||||
|
||||
> The kubebuilder one
|
||||
|
||||
- Includes a cached client
|
||||
- Works on the reconciler pattern -> Makes triggers simple
|
||||
|
||||
## Tips
|
||||
|
||||
- Limit the number of api server updates
|
||||
- Check for diffs yourself and don't send updates if there is nothing new
|
||||
- Use patch instead of update just with changed fields -> Especially for `.status`
|
||||
- Use a framework that handles watching, coalescing and caching (krt, statedb, controller-runtime)
|
||||
- Use predicates if you're using controller-runtime, this helps you filter out no-op events by checking them against the cache and filters
|
||||
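The "diff first, then patch only the changed fields" tip can be sketched without any kubernetes library. This is a plain-dict illustration using merge-patch style semantics (a `None` value deletes a key). The helper name `merge_patch` and the example object are made up for this sketch. A real controller would use the patch helpers of its client or framework.

```python
def merge_patch(current, desired):
    """Return only the fields that differ between current and desired."""
    patch = {}
    for key, want in desired.items():
        have = current.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            sub = merge_patch(have, want)
            if sub:
                patch[key] = sub
        elif key not in current or have != want:
            patch[key] = want
    for key in current:
        if key not in desired:
            patch[key] = None  # merge-patch style: null removes the field
    return patch


current = {"spec": {"replicas": 3}, "status": {"ready": False, "observed": 1}}
desired = {"spec": {"replicas": 3}, "status": {"ready": True, "observed": 1}}

patch = merge_patch(current, desired)
if patch:
    print("send patch:", patch)
else:
    print("nothing changed, skip the api call")
```

An empty patch means the write would be a no-op, so the controller can skip the api server call entirely.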
|
||||
## Q&A
|
||||
|
||||
- Do you know where your reconciliations are coming from:
|
||||
- Counts: Yes the frameworks provide metrics and you can implement your own
|
||||
- But controller runtime abstracts the patch source so you have to compare before and after state yourself - but you should not do that
|
||||
- What about state sharing across multiple threads?
|
||||
- Controller runtime handles each reconcile as idempotent, so you can just multithread
|
||||
- But handling consistency can still be hard because you have to design all of your operations as idempotent by rebuilding the state each time
|
||||
- What are your thoughts on controllers that do stuff in the real world (especially b/c it takes longer and there are no native observers)
|
||||
- Do something like the krt project by keeping the state separately
|
||||
- What if someone changes things at the cloud provider
|
||||
- A question of philosophy -> Usually just treat the operator as the source of truth
|
||||
- How do you test your operators?
|
||||
- Depends on your output (kubernetes objects make stuff simple)
|
||||
- For cilium: Simple b/c it's just creating kubernetes objects
|
||||
- With outside interaction: In-memory state representation or mocking
|
||||
- For complex controllers split the operator into: Ingestion, data model and transformation
|
||||
56
content/day1/04_gpus-go-round.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
title: The GPUs on the bus go round and round
|
||||
weight: 4
|
||||
tags:
|
||||
- kubecon
|
||||
- gpu
|
||||
- nvidia
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/cLJRh4y4vXg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
## Background
|
||||
|
||||
- They are the GeForce NOW folks
|
||||
- Large fleet of clusters all over the world (60.000+ GPUs)
|
||||
- They use kubevirt to pass through GPUs (vfio driver) or vGPUs
|
||||
- Devices fail from time to time
|
||||
- Sometimes failures need restarts
|
||||
|
||||
## Failure discovery
|
||||
|
||||
- Goal: Maintain capacity
|
||||
- Failure reasons: Overheating, insufficient power, driver issues, hardware faults, ...
|
||||
- Problem: They only detected failure by detecting capacity decreasing or not being able to switch drivers
|
||||
- Fix: First detect failure, then remediate
|
||||
- GPU Problem detector as part of their internal device plugin
|
||||
- Node Problem detector -> triggers remediation through maintenance
|
||||
|
||||
## Remediation approaches
|
||||
|
||||
- Reboot: Works every time, but has workload related downsides -> Legit solution, but drain can take very long
|
||||
- Discovery of remediation loops -> Too many reboots indicate something being not quite right
|
||||
- Optimized drain: Prioritize draining of nodes with failed devices before other maintenance
|
||||
- The current workflow is: Reboot (automated) -> Power cycle (automated) -> Rebuild Node (automated) -> Manual intervention / RMA
|
||||
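The escalation workflow and the loop detection above can be sketched roughly like this. The step names mirror the workflow from the talk, but the function names, the one-week window and the threshold of three reboots are assumptions for illustration only.

```python
LADDER = ["reboot", "power_cycle", "rebuild_node", "manual_intervention"]


def next_action(failed_attempts):
    """Pick the next step; each failed attempt escalates one step further."""
    return LADDER[min(failed_attempts, len(LADDER) - 1)]


def in_remediation_loop(reboot_times, now, window_s=7 * 24 * 3600, threshold=3):
    """True if the node was rebooted `threshold` or more times within the window."""
    recent = [t for t in reboot_times if 0 <= now - t <= window_s]
    return len(recent) >= threshold


print(next_action(0))  # first try: reboot
print(next_action(2))  # reboot and power cycle did not help
print(in_remediation_loop([0, 86_400, 172_800], now=200_000))
```

A node that keeps landing in the loop check is a candidate for skipping the automated steps and going straight to manual intervention.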
|
||||
## Prevention
|
||||
|
||||
> Problems should not affect workload
|
||||
|
||||
- Healthchecks with alerts
|
||||
- Firmware & Driver updates
|
||||
- Thermal & Powermanagement
|
||||
|
||||
## Future Challenges
|
||||
|
||||
- What if a high density node with 8 GPUs has one failure?
|
||||
- What is an acceptable rate of working to broken GPUs per Node
|
||||
- If there is a problematic node that has to be rebooted every couple of days should the scheduler avoid this node?
|
||||
|
||||
## Q&A
|
||||
|
||||
- Are there any plans to opensource the gpu problem detection: We could certainly do it, but it's not on the roadmap right now
|
||||
- Are the failure rates representative and what is counted as failure:
|
||||
- Failure is not being able to run a workload on a node (could be hardware or driver failure)
|
||||
- The failure rate is 0,6% but the affected capacity is 1,2% (with 2 GPUs per node)
|
||||
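One way to read the 0,6% vs. 1,2% numbers: if a single failed GPU makes the whole node unusable, the affected capacity is the share of nodes with at least one failed GPU. The sketch below assumes that interpretation and also assumes GPU failures are independent. Neither assumption was stated explicitly in the talk.

```python
def affected_capacity(gpu_failure_rate, gpus_per_node):
    """Share of nodes (and their GPUs) with at least one failed GPU."""
    return 1 - (1 - gpu_failure_rate) ** gpus_per_node


for gpus in (1, 2, 8):
    share = affected_capacity(0.006, gpus)
    print(f"{gpus} GPUs per node: {share:.2%} of capacity affected")
```

Under these assumptions the effect grows with density, which is why the 8 GPU node question in the future challenges matters.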
64
content/day1/05_ressource-submission-bookkeeping.md
Normal file
@@ -0,0 +1,64 @@
|
||||
---
|
||||
title: "Reliable k8s resource Submission & Bookkeeping"
|
||||
weight: 5
|
||||
tags:
|
||||
- kubecon
|
||||
- platform
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/NCkHrvqFMl8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/0d/Reliable%20K8S%20Resource%20Submission%20and%20Bookkeeping.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Service offerings
|
||||
|
||||
- Product: HA Container Platform for general utility with a focus on run-to-complete
|
||||
- Use-Cases: ML Orchestration, CI/CD, Machine maintenance, Financial analysis, Data Processing pipeline
|
||||
- Requirements: Observability, Scheduling Events, Approval process, Bookkeeping, Datacenter Resiliency
|
||||
- Focus: Resiliency (HA with datacenter failover)
|
||||
- What the user needs: Workflow (e.g. generate report, persist report, notify)
|
||||
- What we need for the user: ConfigMaps + Secrets, Workflow templates for the steps
|
||||
|
||||
## Challenges
|
||||
|
||||
- Read after modify across multiple datacenters
|
||||
- Many reads against kubeapi that could overload the apiserver
|
||||
- No native approval flows and limited audit
|
||||
|
||||
## Submission flows from a user's perspective
|
||||
|
||||
### Submission of runnables
|
||||
|
||||
- User: Submits runnable to submitter with audit
|
||||
- Submitter: Handles retry, verification, ...
|
||||
- Submitter: Configures workload on workload clusters
|
||||
|
||||

|
||||
|
||||
### Submission of deployables
|
||||
|
||||
- User: Deploys mutation to audit/source of truth
|
||||
- Syncer: Syncs deployables to workload clusters
|
||||
|
||||

|
||||
|
||||
## Reporting
|
||||
|
||||
- User wants: UI with latest status for all jobs
|
||||
- Compliance wants: Transactions on given resource for auditing
|
||||
- Implementation: Highly available inventory as single source of truth
|
||||
|
||||
```mermaid
|
||||
graph
|
||||
WorkflowAPI-->|reads|inventory
|
||||
Consumer-->|updates|inventory
|
||||
Producer-->|publishes events to|Consumer
|
||||
```
|
||||
|
||||
### Potential Problems
|
||||
|
||||
- Problem: Delete event does not get propagated from syncer to producer leading to zombie resources
|
||||
- Fix: Periodic Cleanup
|
||||
|
||||
### Overview
|
||||
|
||||

|
||||
BIN
content/day1/_img/capi.png
Normal file
|
After Width: | Height: | Size: 75 KiB |
BIN
content/day1/_img/clusterapi-crd.png
Normal file
|
After Width: | Height: | Size: 112 KiB |
BIN
content/day1/_img/deployables.png
Normal file
|
After Width: | Height: | Size: 220 KiB |
BIN
content/day1/_img/runnables.png
Normal file
|
After Width: | Height: | Size: 266 KiB |
BIN
content/day1/_img/submission.png
Normal file
|
After Width: | Height: | Size: 297 KiB |
@@ -4,8 +4,26 @@ title: Day 1
|
||||
weight: 5
|
||||
---
|
||||
|
||||
TODO:
|
||||
Day 1 of the main KubeCon event started with a bunch of keynotes from the CNCF themselves (announcing the next locations for kubecon - amsterdam and barcelona).
|
||||
They also announced a new sovereign cloud edge initiative (CNCF/LF meets EU and some German ministry) called "NeoNephos" with members like SAP, StackIt or T-Systems.
|
||||
|
||||
This is also the day the sponsor showcase opened - so expect more talking to people and meetings or demos and less straight up talks.
|
||||
|
||||
## Talk recommendations
|
||||
|
||||
* TODO:
|
||||
- Not that much about gpus with good control plane scaling advice: [Scaling GPU Clusters without melting down](./01_scaling-gpu)
|
||||
- Migrate a cluster to ClusterAPI without downtime: [Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos](./02_migrations)
|
||||
- Some basic operator tips with good Q&A questions: [Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes](./03_operator-mistakes)
|
||||
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- The crossplane maintainers (Upbound)
|
||||
- Anynines
|
||||
- Cloudfoundry/Korifi
|
||||
- FlatCar
|
||||
- Cert-Manager
|
||||
- Flux maintainers
|
||||
- OVH
|
||||
- Kubermatic
|
||||
- Isovalent
|
||||
- Spacelift: They employ some of the opentofu core maintainers
|
||||
38
content/day2/01_chance-of-kubernetes.md
Normal file
@@ -0,0 +1,38 @@
|
||||
---
|
||||
title: "Cloudy with a chance of kubernetes"
|
||||
weight: 1
|
||||
tags:
|
||||
- kubecon
|
||||
- platform
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/iCAFXF5ECto" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/bc/KubeCon%20EU%202025%20-%20Cloudy%20with%20a%20chance%20of%20Kubernetes_%20Going%20from%20one%20to%20three%20cloud%20providers%20-%20Laurent%20Bernaille%20%26%20Maxime%20Visonneau,%20Datadog.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Background
|
||||
|
||||
- Scale: 100s of clusters
|
||||
- Cloud: Azure, AWS, GCP
|
||||
- The baseline: Single AWS Region and applications on vms
|
||||
- Goal: Operate on different locations
|
||||
- History: They added more and more regions - 6 Providers in 6 Regions across 29 locations
|
||||
- Problem: Different tooling across different cloud providers
|
||||
- Idea: Kubernetes abstracts the specific cloud provider infra
|
||||
|
||||
## The way
|
||||
|
||||
- Idea: Use managed kubernetes
|
||||
- Problem: In 2018 the managed offerings were in beta or very limited
|
||||
- Challenge: Opinionated cloud specific stuff
|
||||
|
||||
### Iterations
|
||||
|
||||
1. Clusters based on vms created by terraform and other automation tools -> They realized that they need multiple clusters per region
|
||||
2. Their own application delivery platform that deployed to the right clusters across regions for better DevEx
|
||||
3. k8s on k8s (hosted cp) -> Current setup with a terraform managed parent cluster
|
||||
4. Idea: Host the Parent-Cluster on managed kubernetes -> They need to abstract some things away
|
||||
5. Solution: Use their good old application delivery platform
|
||||
|
||||
### Abstractions
|
||||
|
||||
- Use custom CRDs to abstract the same behaviour across providers
|
||||
@@ -4,8 +4,21 @@ title: Day 2
|
||||
weight: 6
|
||||
---
|
||||
|
||||
TODO:
|
||||
The second day of kubecon was my main "meeting day" this year - aka there were a bunch of scheduled meetings with manufacturers, partners, potential partners or just to get to know someone/a project.
|
||||
What does this mean for you? Another day with only a few sessions (I only managed to attend two and only one was worthy of note taking) - the meeting notes are not available online.
|
||||
|
||||
## Talk recommendations
|
||||
In the evening we attended the "German Community Stammtisch".
|
||||
|
||||
* TODO:
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- Isovalent
|
||||
- Kubermatic
|
||||
- Portworx
|
||||
- Fastly
|
||||
- Syseleven
|
||||
- Netbird
|
||||
- VMware
|
||||
- Stackit
|
||||
- Harness
|
||||
- Mia Platform
|
||||
- and many, many more...
|
||||
53
content/day3/01_day-two.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
title: "Surviving Day2: Picking the right tool to secure your kubernetes habitat"
|
||||
weight: 1
|
||||
tags:
|
||||
- kubecon
|
||||
- security
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/FqUPqroF-Rw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/a1/Surviving%20Day2%20-%20Picking%20the%20Right%20Tool%20To%20Secure%20Your%20Kubernetes%20Habitat.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
Premise: The CNCF landscape includes a huuuge number (80+) of security(related) projects.
|
||||
Analogy: Animal kingdom (includes similar-ish animals that might do some of the same stuff but not entirely the same)
|
||||
|
||||
## Build Phase
|
||||
|
||||
- How can I scan my container for vulnerabilities? -> Well you probably mean your image
|
||||
- The image itself is just a bunch of static layers and we kinda have to trust the layers you didn't build yourself
|
||||
- The main tool used is still trivy with some easy steps
|
||||
1. Extract layers
|
||||
2. Build FS
|
||||
3. Identify OS and Non-OS Packages
|
||||
4. Compare with vuln-db
|
||||
- The animal in our analogy: Raccoon
|
||||
|
||||
## Deploy Phase
|
||||
|
||||
- Kubernetes Native: Admission Controller
|
||||
- Tool used: Kyverno (integrates as an admission controller with yaml/crd based configuration)
|
||||
1. Modify (e.g. add default resource limits)
|
||||
2. Validate (check policies)
|
||||
- The animal is actually a human: The forest ranger
|
||||
|
||||
## Start Phase
|
||||
|
||||
- Before the pod itself is running CSI, CNI and secret related processes (the ones we want to look into) happen
|
||||
- Problems: Secrets have no rotation or versioning mechanism, there is no default integration for external kms
|
||||
- Project: External Secrets -> Get secrets from external kms, automatically sync (e.g. new versions)
|
||||
- The chosen animal: Capricorn
|
||||
|
||||
## Run Phase
|
||||
|
||||
- Goal: Runtime scanning without including specialized instrumentation in each application
|
||||
- Tool: Falco utilizing eBPF to check system calls against rules
|
||||
- Idea: Detect dangerous behaviour (e.g. check for someone trying to exploit a fresh CVE)
|
||||
- The analogy: Falcon
|
||||
|
||||
## TL;DR
|
||||
|
||||
1. Scan images (trivy)
|
||||
2. Enforce best practices (kyverno)
|
||||
3. Use an external kms (external secrets)
|
||||
4. Scan at runtime (falco)
|
||||
30
content/day3/02_open-feature.md
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
title: "Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google"
|
||||
weight: 2
|
||||
tags:
|
||||
- kubecon
|
||||
- dev
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/mewXGSwDCE4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/f6/Type-safe%20Feature%20Flagging%20in%20OpenFeature.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
|
||||
|
||||
## Featureflags?
|
||||
|
||||
- Idea: Change the behaviour of an application without rebuilding it
|
||||
- Goal: Control rollout, reduce risk, experiment (a/b)
|
||||
- At google: A huge number of feature flags (150k+) but that's because people forget to turn them off
|
||||
|
||||
## Where does the flag come from
|
||||
|
||||
- Lifecycle of a flag: Create, Manage, Deprecate, Delete -> But will it be created first in code or in the service
|
||||
- Classic implementation: Just an if/else that uses a function to get the flag
|
||||
- Problem: What if the flag names mismatch between the code and flag service -> Multiple sources of truth
|
||||
- Solution: Require use of auto-generated flag bindings (codegen from the management system) to mitigate typos, etc.
|
||||
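To illustrate why generated bindings help with the name mismatch problem, here is a minimal sketch in plain Python. It does not use the OpenFeature SDK. `InMemoryProvider`, `GeneratedFlags` and the flag names are hypothetical stand-ins for a flag service client and for what a code generator could emit.

```python
class InMemoryProvider:
    """Hypothetical stand-in for a flag service client."""

    def __init__(self, values):
        self._values = values

    def get(self, name, default):
        # Unknown names silently fall back to the default.
        return self._values.get(name, default)


class GeneratedFlags:
    """What generated bindings could look like: one typed method per flag."""

    def __init__(self, provider):
        self._provider = provider

    def new_checkout(self) -> bool:
        return bool(self._provider.get("new_checkout", False))

    def max_cart_items(self) -> int:
        return int(self._provider.get("max_cart_items", 20))


provider = InMemoryProvider({"new_checkout": True, "max_cart_items": 50})
flags = GeneratedFlags(provider)

# Classic lookup: a typo in the name goes unnoticed and yields the default.
typo_result = provider.get("new_chekout", False)

# Generated binding: the same typo would fail loudly (AttributeError or linter).
binding_result = flags.new_checkout()
```

With string lookups the typo only shows up as "the feature never turns on". With bindings it shows up at development time, and the flag name lives in one place.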
|
||||
## OpenFeature
|
||||
|
||||
- Goal: Vendor agnostic, standardized, open source
|
||||
- Basic setup: Register provider (once per app), create a client, use client to get flags
|
||||
- CLI: Integrate into management system, keep a local manifest of all flags and generate code (generates the client)
|
||||
- Now: Just call the client's method instead of hard-coding feature flag names
|
||||
43
content/day3/03_etcd-reliability.md
Normal file
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: "Don't let your kubernetes cluster go wild: Ensuring etcd reliability"
|
||||
weight: 3
|
||||
tags:
|
||||
- kubecon
|
||||
- etcd
|
||||
---
|
||||
|
||||
{{% button href="https://youtu.be/J93U9n_qxSI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
|
||||
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
|
||||
|
||||
Fair warning: This talk was very technical and pretty interesting - but don't even try to understand it if you're tired (or if it's the third to last session on the last day of a long conference).
|
||||
|
||||
## Baseline
|
||||
|
||||
- Standard example: Write and read KV-Data, `put(A,2) -> Get (A)`
|
||||
- Problem: Concurrency
|
||||
|
||||
TODO: Steal image from intuition of correctness
|
||||
|
||||
## Correctness
|
||||
|
||||
- Correctness: Kinda funky when it comes to time
|
||||
- Fix: Define serialization that executes parallel requests one after another to bring them in an order
|
||||
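A toy version of that idea, as I understood it: a history of concurrent operations on one key is correct if there is some one-after-another order that respects real time (an operation that finished before another started stays before it) and in which every read returns the latest write. The brute-force checker below is my own sketch for tiny histories, not the tooling used by etcd. The tuple layout `(start, end, op, value)` is an assumption.

```python
from itertools import permutations


def is_linearizable(history, initial=None):
    """history: list of (start, end, op, value) for one key, op is 'put' or 'get'."""
    for order in permutations(history):
        # Reject orders that put an operation before one that finished earlier.
        respects_time = all(
            not later[1] < earlier[0]
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_time:
            continue
        # Replay the order against a single register.
        state = initial
        valid = True
        for _start, _end, op, value in order:
            if op == "put":
                state = value
            elif state != value:
                valid = False
                break
        if valid:
            return True
    return False


put = (0, 10, "put", 2)
print(is_linearizable([put, (5, 6, "get", 2)]))        # overlapping read sees the new value
print(is_linearizable([put, (5, 6, "get", None)]))     # overlapping read sees the old value
print(is_linearizable([put, (11, 12, "get", None)]))   # stale read after the put finished
```

Both results are acceptable for a read that overlaps the write, which is the "funky when it comes to time" part. Only the stale read after the write has finished is a violation.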
|
||||
## Failures
|
||||
|
||||
- What happens if connections between etcd nodes go down -> Serving stale data
|
||||
- What happens if data corrupts -> If enough members are online, it can repair itself
|
||||
- And many more that can happen at random times -> Hard to test
|
||||
|
||||
TODO: Steal "in a concurrent world"
|
||||
|
||||
## Robustness framework
|
||||
|
||||
- Automates tests for failures
|
||||
- Includes reliable reproductions of past (seemingly random) errors
|
||||
- Currently a mixture of existing go debugging tools
|
||||
|
||||
## Future
|
||||
|
||||
- Reproduce more bugs consistently
|
||||
- Run additional consistency checks
|
||||
@@ -4,8 +4,15 @@ title: Day 3
|
||||
weight: 7
|
||||
---
|
||||
|
||||
TODO:
|
||||
The last day of KubeCon - aka the day everyone leaves early.
|
||||
But not me and I had no meetings scheduled for this day -> More talks for me and notes for you.
|
||||
|
||||
This being my 7th day of the trip and 6th day of non-stop conferences took a bit of a toll on my note taking skills (expect more spelling mistakes).
|
||||
|
||||
## Talk recommendations
|
||||
|
||||
* TODO:
|
||||
- Intro to feature flags and related tips: [Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google](./02_open-feature)
|
||||
|
||||
## Other stuff I learned or people I talked to
|
||||
|
||||
- TODO:
|
||||
@@ -4,4 +4,6 @@ title: Lessons Learned
|
||||
weight: 8
|
||||
---
|
||||
|
||||
Not related to any talk directly, but I can recommend this [Blog Post](https://smudge.ai/blog/ratelimit-algorithms) and [Video](https://www.youtube.com/watch?v=8QyygfIloMc&) about rate limiting.
|
||||
|
||||
TODO:
|
||||
|
||||