Compare commits

...

37 Commits

Author SHA1 Message Date
b9060af72d docs(lessons): Added ratelimit blog(video
All checks were successful
Build latest image / build-container (push) Successful in 53s
2025-05-07 08:31:56 +02:00
3afb07e4c1 chore(day-1): Added missing tag
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-05-07 08:10:27 +02:00
4becb06ad3 fix: Wrong linebreak
All checks were successful
Build latest image / build-container (push) Successful in 57s
2025-05-07 07:09:57 +02:00
0e24bf4fd6 docs: Added youtube links
Some checks failed
Build latest image / build-container (push) Failing after 50s
2025-05-07 07:07:48 +02:00
f06c486182 fix: Pin hugo version
All checks were successful
Build latest image / build-container (push) Successful in 56s
2025-04-22 14:22:09 +02:00
f71971e793 docs: Slight rewording
Some checks failed
Build latest image / build-container (push) Failing after 48s
2025-04-22 13:57:52 +02:00
a7a3817a03 docs: Added datev at index
Some checks failed
Build latest image / build-container (push) Has been cancelled
2025-04-22 13:56:02 +02:00
47f7869257 docs(day2): Added own talk
All checks were successful
Build latest image / build-container (push) Successful in 51s
2025-04-08 10:22:40 +02:00
b2fd7a4c81 fix: Update diagram to correctly reflect Flux operations
All checks were successful
Build latest image / build-container (push) Successful in 51s
2025-04-07 18:57:12 +02:00
1213be7c30 docs: Added basic changelog 2025-04-07 18:56:18 +02:00
1f49a42edc fix(docs): Added missing tags
All checks were successful
Build latest image / build-container (push) Successful in 44s
2025-04-07 18:51:03 +02:00
c6f716ced1 fix(docs): Fixed relative links
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-07 18:50:21 +02:00
09ac5a9051 docs: Added images 2025-04-07 18:50:12 +02:00
5ed623d0ca docs: Added slide links for kubecon/cloudnativecon 2025-04-07 18:49:57 +02:00
f8ca21416b fix(day0): Typo in name
All checks were successful
Build latest image / build-container (push) Successful in 59s
2025-04-07 10:40:05 +02:00
dc4dd2d883 fix(day3): Typo
Some checks failed
Build latest image / build-container (push) Has been cancelled
2025-04-07 10:39:37 +02:00
957bc94344 docs(day3): etcd talk
Some checks failed
Build latest image / build-container (push) Failing after 36s
2025-04-04 15:08:02 +02:00
44a3653c84 docs(day3): feature flag talk
Some checks failed
Build latest image / build-container (push) Failing after 35s
2025-04-04 13:09:17 +02:00
6bf47e49c5 docs(day3): First talk of the day 🎉
Some checks failed
Build latest image / build-container (push) Failing after 34s
2025-04-04 12:25:46 +02:00
39d92acdb4 docs(day3): Added initial notes of the day 2025-04-04 12:02:37 +02:00
4d528bf5de docs(day2): Added single talk notes
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-04-03 18:49:22 +02:00
d2f3f5f95d docs(day2): Added daily notes
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-03 11:07:01 +02:00
6d0c95a8ac docs(day-1): Added notes for my talk
All checks were successful
Build latest image / build-container (push) Successful in 46s
2025-04-03 11:02:19 +02:00
3e4fbb616b docs(cnrj): Added video links
All checks were successful
Build latest image / build-container (push) Successful in 53s
2025-04-03 10:59:24 +02:00
d9605d602e docs(day1): Bloomberg call
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 18:23:49 +02:00
745e8f5896 style: Formatting
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 18:01:06 +02:00
78ca5973b8 docs: Updated day notes
Some checks failed
Build latest image / build-container (push) Failing after 34s
2025-04-02 17:43:43 +02:00
77f34ed1ab docs(day1): GPU Talk 2025-04-02 17:43:21 +02:00
a36f562cf4 docs(day1): Formatted notes 2025-04-02 17:15:55 +02:00
9ad9af0f9c docs(day1): Added operator q&a
All checks were successful
Build latest image / build-container (push) Successful in 50s
2025-04-02 16:00:12 +02:00
4f39c1102c docs(day1); Operator mistakes talk
All checks were successful
Build latest image / build-container (push) Successful in 45s
2025-04-02 15:55:03 +02:00
df93624814 docs(day1): Added notes to talk
All checks were successful
Build latest image / build-container (push) Successful in 48s
2025-04-02 13:27:38 +02:00
46b06c66fd docs: Added slides button to all pages
All checks were successful
Build latest image / build-container (push) Successful in 49s
2025-04-02 13:21:27 +02:00
b4d8aa29c3 feat(tempalte): Added slides button to template 2025-04-02 13:18:35 +02:00
4cec1917bf docs(day1): Added migration talk 2025-04-02 13:17:43 +02:00
bd7d9fe87d docs(day1): First talk
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-02 12:44:09 +02:00
f4858d81a8 docs(day0): Last talk
All checks were successful
Build latest image / build-container (push) Successful in 47s
2025-04-01 17:34:31 +02:00
58 changed files with 698 additions and 65 deletions

@@ -1,4 +1,4 @@
FROM registry.odit.services/hub/hugomods/hugo:exts AS build
FROM registry.odit.services/hub/hugomods/hugo:exts-0.145.0 AS build
WORKDIR /app
COPY . /app/

@@ -6,5 +6,6 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
TODO:

@@ -10,11 +10,12 @@ This current version is probably full of typos - will fix later. This is what ty
## How did I get there?
I attended Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2025 in London.
This year I was sent there by my employer [DATEV eG](https://datev.de) - thanks again to everyone who helped me with getting this trip approved (you know who you are).
Why? Because learning about all the new things in the world of cloud is really important, and war stories help to avoid mistakes that others have already made.
And [last year's experience](https://kubecon24.nicolai-ort.com) was really good, so I wanted to go again.
Plus I actually presented a talk at Cloud Native Rejekts.
Plus I actually presented a talk at Cloud Native Rejekts 🥳.
## And how does this website get its content
@@ -24,9 +25,22 @@ graph LR
Nicolai-->|"Takes notes (and typos) + commits"|Repo
Repo-->|Triggers|Actions
Actions-->|Builds image and pushes to|Registry
Kubernetes-->|Pulls latest image|Registry
Flux-->|Detects new image|Registry
Flux-->|Rolls out new image|Kubernetes
```
## Changelog™
- 2025-03-28: Initial repo and deployment setup
- 2025-03-30: First day of Cloud Native Rejekts
- 2025-03-31: Second day of Cloud Native Rejekts
- 2025-04-01: First day of KubeCon/CloudNativeCon
- 2025-04-02: Second day of KubeCon/CloudNativeCon
- 2025-04-03: Added video links for Cloud Native Rejekts
- 2025-04-03: Third day of KubeCon/CloudNativeCon
- 2025-04-04: Fourth day of KubeCon/CloudNativeCon
- 2025-04-07: Added missing images and slide links for KubeCon/CloudNativeCon
## Style Guide
The basic structure is as follows: `day/event-or-session`.

@@ -6,7 +6,8 @@ tags:
- security
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=JAy6Ra0ulSw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Baseline

@@ -2,10 +2,12 @@
title: "The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud"
weight: 2
tags:
- <tag>
- rejekts
- operator
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=PciVvE02L2w" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Big Picture

content/day-1/02_gslb.md

@@ -0,0 +1,12 @@
---
title: Evaluating Global Load Balancing Options for Kubernetes in Practice
weight: 2
tags:
- rejekts
---
{{% button href="https://youtu.be/RBMRU8rtxfI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://github.com/nicolaiort/rejekts2025-gslb" style="tip" icon="code" %}}Demo-Code and more{{% /button %}}
{{% button href="https://de.slideshare.net/slideshow/evaluating-global-load-balancing-options-for-kubernetes-in-practice-kubermatic-datev/277640385" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
My talk, notes will be released soon

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=DdQzGsiounY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## The clans (popular solutions)

@@ -2,10 +2,10 @@
title: Understanding and Debugging DNS in Kubernetes Clusters
weight: 4
tags:
- <tag>
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=awXjABDknww" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://github.com/mqasimsarfraz/talks/tree/main/CloudNativeRejekts-2025" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}

@@ -6,7 +6,8 @@ tags:
- edge
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=jywpFlOH3z0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## The far edge

@@ -6,7 +6,8 @@ tags:
- multicluster
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=w8rDxtrMGG8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Baseline Infra

@@ -9,10 +9,10 @@ This was another very interesting day and I can only recommend attending cloud n
## Talk recommendations
- My Talk: [Evaluating Global Load Balancing Options for Kubernetes in Practice](todo:)
- Service Mesh Intro + Comparison: [The service mesh wars - a new hope for kubernetes](../03_service-mesh)
- How to handle evection and statefulness across clusters: [Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb](../06_scaling-pdbs)
- Intro to operators: [The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud](../02_controllers)
- My Talk: [Evaluating Global Load Balancing Options for Kubernetes in Practice](./02_gslb)
- Service Mesh Intro + Comparison: [The service mesh wars - a new hope for kubernetes](./03_service-mesh)
- How to handle eviction and statefulness across clusters: [Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb](./06_scaling-pdbs)
- Intro to operators: [The Hidden Brains of Kubernetes: Meet Controllers Powering the Cloud](./02_controllers)
## Other stuff I learned or people I talked to

@@ -7,5 +7,6 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Short opening keynote thanking volunteers and attendees.

@@ -8,7 +8,8 @@ tags:
- multicluster
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=r0W6cCJAGro" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The talk started with a basic introduction to ClusterAPI and the operations at Giant Swarm.

@@ -6,7 +6,8 @@ tags:
- keynote
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=m9NRk-6MSvY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
A short keynote from Microsoft about their contributions to open source and the tools they use:
- infra (Kubernetes, Istio, Hyperlight)

@@ -6,7 +6,8 @@ tags:
- multicluster
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=e1BmT0jc_Fs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=CAPtQnH4rPY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Recruitment & Staffing

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=qNShvqSTKCU" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background: The state of cloud in mauritius

@@ -6,7 +6,8 @@ tags:
- performance
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=EYipC5y-8rM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
There were more details in the talk than I copied into these notes.
Most of them were either too detailed to write down or application-specific.

@@ -6,7 +6,8 @@ tags:
- crossplane
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=D4bKe4rAasc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Joint effort of Novo Nordisk and Upbound.

@@ -6,7 +6,8 @@ tags:
- security
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=rJacyDygVi0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Why does e2e authenticity matter?

@@ -5,7 +5,8 @@ tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.youtube.com/watch?v=1US_-3udMDo" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Hypothesis

@@ -10,12 +10,12 @@ This is the first day of Cloud Native Rejekts and the first time of me attending
> Ranked by should watch to could watch
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](../05_broken-tech)
- What if my homelab is an african island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](../06_geo-distributed-clusters)
- Bootstrap and CI/CD with crossplane: [Building air-gapped control planes for a global pharma leader using crossplane and argo](../08_airgapped-cp)
- Handling large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](../04_multicluster-crd)
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](../02_clusterapi)
- How to hire, manage and develop engineers: [Tech is broken and AI won't fix it](./05_broken-tech)
- What if my homelab is an African island: [Geographically Distributed Clusters: Resilient Distributed Compute on the Edge](./06_geo-distributed-clusters)
- Bootstrap and CI/CD with crossplane: [Building air-gapped control planes for a global pharma leader using crossplane and argo](./08_airgapped-cp)
- Handling a large number of clusters: [CRD Data Architecture for Multi-Cluster Kubernetes](./04_multicluster-crd)
- Handling large scale migrations: [The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API](./02_clusterapi)
## Other stuff I learned or people I talked to
- Throughout the lunch break I talked to a nice guy who heared my government question during the [Tech is broken and AI won't fix it](../05_broken-tech)-Talk, we talked
- Throughout the lunch break I talked to a nice guy who heard my government question during the [Tech is broken and AI won't fix it](./05_broken-tech)-Talk, we talked

@@ -7,6 +7,7 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/70/Platforms%20WG%20Update%20slides%20-%20Kubecon%20EU%202025.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
An update from the platform working group which will be renamed to the CNCF Platform Engineering Community.
Alongside the new name, a bit of restructuring will take place because the working group outgrew the working group label.

@@ -7,7 +7,8 @@ tags:
- sponsored
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/7tbs3J7mgE0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## States of platform

@@ -7,7 +7,8 @@ tags:
- sponsored
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/MFLXFNlmMMI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
This whole talk is pretty much a product manager's view on platform engineering.

@@ -8,6 +8,7 @@ tags:
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Conviction

@@ -7,8 +7,7 @@ tags:
- sponsored
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/XrMsJIL35Oc" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Hypothesis: We are at the beginning of a 10-year cycle that is moving towards AI-native applications.

@@ -6,7 +6,8 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/cl-MO7j7MHY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Hypothesis: The bar for good interviewing is somewhere near the earth's core and we need to improve this (because we need more engineers)

@@ -4,10 +4,11 @@ weight: 7
tags:
- platform
- cloudnativecon
- victor
- viktor
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/uwDoHm-AxTM" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The good old baseline is "I am a developer, I write code - now I have to do stuff to continue writing code".
Most developers will continue on to "now I have to write scripts" in order to just do their jobs instead of working on infra.

@@ -6,7 +6,8 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/8_pB9RAfzrY" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/48/Product%20Thinking%20for%20Cloud%20Native%20Engineers%20PlatformEngineeringDay-EU-25.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## How & Why
@@ -28,7 +29,7 @@ tags:
## How to start?
TODO: Steal illustration
![Product compass illustration](../_img/product-compass.png)
### Exploring the Problem Space

@@ -4,10 +4,10 @@ weight: 9
tags:
- argo
- cloudnativecon
- victor
- viktor
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/iCTgRC3AQQk" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline

@@ -6,7 +6,8 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/M5X5NCzlzIA" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/52/atul-talk-platform-engineering-kubecon-london-2025_final.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Fair warning: Food analogies incoming
@@ -44,7 +45,7 @@ Fair warning: Food analogies incoming
4. Add complexity
5. Repeat
TODO: Steal image
![Abstraction cycle illustration](../_img/abstraction-cycle.png)
### Warning signs

@@ -6,7 +6,9 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/qXRHpIYxU_c" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/da/KubeCon%20Talk_%20Lemonade%27s%20t-env.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Okteto: Ephemeral environments for testing
## History

@@ -1,12 +1,13 @@
---
title: "PErfomance preseverance: Taming 1000 kubernetes clusters"
title: "Performance perseverance: Taming 1000 kubernetes clusters"
weight: 12
tags:
- platform
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/ZTT8M74RD1M" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/d5/kubecon_2025_v4.2.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## History

@@ -6,7 +6,8 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/DoiaHfl9Y7Y" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
The CNCF's research into product thinking for platforms.

@@ -6,7 +6,8 @@ tags:
- cloudnativecon
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://youtu.be/8FmJWd7vRt4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Very nice intro: an analogy of kids playing with Lego, about creativity, sharing and collaboration.

@@ -0,0 +1,29 @@
---
title: 10 Quick tips on how to internally market your platform
weight: 15
tags:
- platform
- cloudnativecon
- lightning
---
{{% button href="https://youtu.be/kiUV8En8Co4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/colocatedeventseu2025/42/2025-PE-Day-10-Tips.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
- Even great tech does not sell itself - you need marketing
- We don't have a big marketing budget for our internal platform
- No adoption -> No Trust -> No new users -> No adoption
## Tips
- Define personas and a value proposition map
- Build a brand: Name, logo, story, swag
- Have a launch party or milestone parties
- Provide clear, accessible communication (with clear channels, docs, ...)
- Build a community that can help each other (and don't separate yourself from the community)
- Capture metrics for success for yourself and from a user's perspective
- Provide a 5-minute wow moment/demo where the user can feel like they achieved something
- Level up with gamification
- Leverage external events for internal visibility

Binary file not shown (image, 572 KiB).

Binary file not shown (image, 270 KiB).

@@ -16,12 +16,12 @@ Sometimes we ended up in the same talks, sometimes in different talks which lead
## Talk recommendations
- How to design a good hireing process: [So you want to hire for platform engineering](../06_hire-engineers)
- Evolution of Platforms and Platform Engineering: [The past, the present and the future of platform engineering](../07_past-present-future)
- How to design a good product: [Product thinking for cloud native engineers](../08_product-thinking)
- Staging with gitops: [A million ways to promote changes between environments](../09_promotions)
- How to handle abstractions and new requriements: [Platform abstractions: Asset or liability](../10_abstractions)
- Very nice slides: [Building Platforms with empathy and yaml at the lego group](../14_lego)
- How to design a good hiring process: [So you want to hire for platform engineering](./06_hire-engineers)
- Evolution of Platforms and Platform Engineering: [The past, the present and the future of platform engineering](./07_past-present-future)
- How to design a good product: [Product thinking for cloud native engineers](./08_product-thinking)
- Staging with gitops: [A million ways to promote changes between environments](./09_promotions)
- How to handle abstractions and new requirements: [Platform abstractions: Asset or liability](./10_abstractions)
- Very nice slides: [Building Platforms with empathy and yaml at the lego group](./14_lego)
## Other stuff I learned or people I talked to

@@ -0,0 +1,77 @@
---
title: Scaling GPU Clusters without melting down
weight: 1
tags:
- ml
- nvidia
- ai
- apiserver
- go
- kubecon
---
{{% button href="https://youtu.be/dUfp3j1j-mg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/50/Scaling%20GPU%20Clusters%20Without%20Melting%20Down%21%20%281%29.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Baseline
- We need more and more GPUs -> Control plane needs to keep track of more objects
- Goal: Scale Workers without scaling control plane
## Current Problems
### Secret list calls go up and control plane goes down
- Scenario: High number of list calls with larger secrets
- Problem: OOM apiserver b/c cache
- Fix: API Priority & Fairness (only allow two concurrent list calls, queue the rest)
- Result: Decreased number of oom crashes
### High memory usage until we restart the apiserver
- Scenario: The API server frees up to 40% of its memory utilization when restarted
- Main suspect: Garbage collection
- Idea: Tune the Go GC via the `GOGC` env var -> They lowered it from the default 100 to 50
- Result: Decrease in memory util and no more growing util over time
### Large skew in memory utilization
- Scenario: Skew in memory utilization across the apiserver pods
- Problem: If a pod with high utilization gets hit with a list, that apiserver will OOM -> The LB redirects to the other 2 -> Those OOM too
- Observation: The LB in front of the apiserver pods also shows some skew -> Explains the skew
- Root cause: The LB keeps long-lived TCP connections to the servers and balances based on connections, not requests
- Idea: Switch up the LB configuration -> Not quite the right angle
- Fix: The apiserver's `--goaway-chance` flag -> A random HTTP/2 `GOAWAY` frame gets sent -> The connection is torn down gracefully and recreated, so it can land on a different apiserver
### Architectural mistakes
- Large number of secrets per workload -> List, Encode/Decode overhead
- No caching -> Too many list calls
### Preview
- There are a bunch of SIG API Machinery improvements planned
## The future
- The switch from NUMA-aware GPU device handling to DRA (Dynamic Resource Allocation)
- DRA is powerful enough to get rid of the custom NUMA stuff
### The stack
- Currently:
  - CP: APIServer, Controller Manager, Scheduler and topology-aware scheduler
  - Worker: Device Plugin, NFD topology updater
- Future:
  - CP: APIServer, Controller Manager, Scheduler
  - Worker: Device Plugin
### Testing scaling
- Tool: KWOK (Kubernetes WithOut Kubelet) - used to simulate GPU workloads
- Env: Kubernetes 1.32, scaling from 0 to 4000 workloads
- Metrics:
  - Scheduling latency: The topology-aware scheduler was way more latency-affected
  - Scheduler memory: 30% of memory saved with DRA
  - API server memory: Another 20% of memory saved
- Result: They are confident that DRA will be stable and even save memory and CPU

@@ -0,0 +1,81 @@
---
title: Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos
weight: 2
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/uQ_WN1kuDo0" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/fd/day2000-migration-ClusterAPI-talos.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Background
- They use large, shared clusters
- The oldest cluster is 2099 days (5.8 years) old
- On-prem, hosted on vSphere with vanilla kubeadm
- Fun fact: They run Chaos Monkey on all clusters -> Automatically prepares them for updates
### Legacy provisioning
1. Terraform creates Debian VMs
2. Deploy base tools with Puppet
3. Register nodes in an inventory YAML file
4. Run an Ansible playbook -> Renders configs and runs kubeadm
5. Configure ArgoCD
### Target
- Use ClusterAPI to manage the workload clusters
- Basic CRDs: Cluster, MachineDeployment, Machine
- Talos: Immutable, minimal, ephemeral OS with declarative config via a gRPC API
![CAPI Diagram](../_img/capi.png)
## Migration
1. Config matching between kubeadm and Talos+CAPI
2. Import PKI/Certs
3. Create ClusterAPI CRDs
4. Add ClusterAPI Nodes
5. Remove kubeadm nodes
### 1. Config matching
1. ServiceAccount issuer: Talos has its own default
2. The etcd encryption key names are hardcoded in Talos
3. Re-encrypt all secrets (get all secrets, replace them)
### 2. PKI
1. Talos includes some logic that can generate a secrets bundle from an existing cluster's PKI
2. Import: The etcd, Kubernetes, ServiceAccount and OS (Talos-specific, used for Talos API auth) certificates
### 3. CRDs
- One namespace per workload cluster
- Cluster CRD: References the control plane and infrastructure
- ControlPlane CRD: Creates the control plane machines
- Infrastructure: References the templates for the worker MachineDeployments
![ClusterAPI CRDs](../_img/clusterapi-crd.png)
### 4. Add ClusterAPI Nodes
- Add new CAPI-managed control plane and worker nodes to the cluster (slowly, stuff will break)
- Remove the old nodes one by one over weeks or months
- Potential problems:
  - Mismatched ServiceAccount issuer
  - Missing etcd encryption key
  - Wrong etcd encryption key
  - Loss of quorum: `--force-new-cluster` can force recovery on one node of the etcd cluster
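
Why quorum loss is the scary part of a node-by-node control plane swap: etcd (Raft) needs a majority of members to accept writes. A small generic Go helper makes the arithmetic explicit (not part of any migration tooling). Note that adding a member before removing an old one also raises the quorum size: 4 members tolerate no more failures than 3.

```go
package main

import "fmt"

// quorum returns how many etcd members must agree on a write.
func quorum(members int) int { return members/2 + 1 }

// faultTolerance returns how many members can fail before quorum is lost.
func faultTolerance(members int) int { return (members - 1) / 2 }

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```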
## Demo
I recommend watching the demo.
Talos seems pretty cool.
## Bootstrapping
- A kind cluster in a GitHub Action or on a local device

@@ -0,0 +1,79 @@
---
title: "Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes"
weight: 3
tags:
- kubecon
- operator
---
{{% button href="https://youtu.be/tnSraS9JqZ8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/53/Don%27t%20write%20controllers%20like%20Charlie%20Don%27t%20does_%20avoiding%20common%20Kubernetes%20controller%20mistakes.pptx.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Common mistakes
### Not using a (cached) client but talking directly to the API server
- Problem: A
- Problem: Updates send the whole object -> No-op updates waste apiserver resources
- Fix: Use a cached client
- Problem: Caching validation
### Don't use custom caching
- Problem: Good luck dealing with concurrency
- Hard: Controllers must maintain a per-kind cache
- Problem: Eventual consistency makes everything more complicated
- Fix: Use a framework
### Predicates only apply to the current watch
- If you set a predicate in `For(...)`, it only applies to that call, not to other watchers
- Also check whether you should reconcile your low-level objects or whether reconciling the higher-level ones that reference them is better
## Tools
### KRT
> Still under development
- Operations on collections (Kubernetes objects with state tracking)
- A fetch function that handles transformation
### StateDB
- In-memory database for Go with watch channels
- You can set up a table that stores all objects of a kind (provided by the client)
- Triggers hooks when changes happen in the database that you can react to
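
The watch-channel idea can be sketched in plain Go: every read also returns a channel that gets closed on the next change, so a consumer can block (or `select`) on it and then re-query. This is a hypothetical, heavily simplified table, not StateDB's API, which among other things adds indexes, transactions and finer-grained watch channels.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a toy in-memory table keyed by name.
type table[T any] struct {
	mu    sync.Mutex
	rows  map[string]T
	watch chan struct{} // closed and replaced on every write
}

func newTable[T any]() *table[T] {
	return &table[T]{rows: map[string]T{}, watch: make(chan struct{})}
}

// Get returns the row plus a channel that is closed when the table changes.
func (t *table[T]) Get(key string) (T, bool, <-chan struct{}) {
	t.mu.Lock()
	defer t.mu.Unlock()
	v, ok := t.rows[key]
	return v, ok, t.watch
}

// Upsert stores a row and wakes up everyone watching the old state.
func (t *table[T]) Upsert(key string, v T) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows[key] = v
	close(t.watch)                // notify all current watchers
	t.watch = make(chan struct{}) // fresh channel for the next change
}

func main() {
	pods := newTable[string]()
	_, _, changed := pods.Get("nginx")
	done := make(chan struct{})
	go func() {
		<-changed // blocks until the next write
		phase, _, _ := pods.Get("nginx")
		fmt.Println("reacting to change, phase =", phase)
		close(done)
	}()
	pods.Upsert("nginx", "Running")
	<-done
}
```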
### Controller-Runtime
> The kubebuilder one
- Includes a cached client
- Works on the reconciler pattern -> Makes triggers simple
## Tips
- Limit the number of API server updates
  - Check for diffs yourself and don't send updates if there is nothing new
  - Use patch with just the changed fields instead of update -> Especially for `.status`
- Use a framework that handles watching, coalescing and caching (krt, statedb, controller-runtime)
- Use predicates if you're using controller-runtime: they help you filter out no-op events by checking them against the cache and filters
## Q&A
- Do you know where your reconciliations are coming from?
  - Counts: Yes, the frameworks provide metrics and you can implement your own
  - But controller-runtime abstracts the patch source, so you have to compare the before and after state yourself - but you should not do that
- What about state sharing across multiple threads?
  - Controller-runtime handles each reconcile as idempotent, so you can just multithread
  - But handling consistency can still be hard because you have to design all of your operations as idempotent by rebuilding the state each time
- What are your thoughts on controllers that do stuff in the real world (especially b/c it takes longer and there are no native observers)?
  - Do something like the krt project by keeping the state separately
- What if someone changes things at the cloud provider?
  - A question of philosophy -> Usually just treat the operator as the source of truth
- How do you test your operators?
  - Depends on your output (Kubernetes objects make stuff simple)
  - For Cilium: Simple b/c it's just creating Kubernetes objects
  - With outside interaction: In-memory state representation or mocking
  - For complex controllers, split the operator into: ingestion, data model and transformation

@@ -0,0 +1,56 @@
---
title: The GPUs on the bus go round and round
weight: 4
tags:
- kubecon
- gpu
- nvidia
---
{{% button href="https://youtu.be/cLJRh4y4vXg" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Background
- They are the GeForce NOW folks
- Large fleet of clusters all over the world (60,000+ GPUs)
- They use KubeVirt to pass through GPUs (VFIO driver) or vGPUs
- Devices fail from time to time
- Sometimes failures need restarts
## Failure discovery
- Goal: Maintain capacity
- Failure reasons: Overheating, insufficient power, driver issues, hardware faults, ...
- Problem: They only detected failures by noticing decreasing capacity or not being able to switch drivers
- Fix: First detect failures, then remediate
  - GPU Problem Detector as part of their internal device plugin
  - Node Problem Detector -> Triggers remediation through maintenance
## Remediation approaches
- Reboot: Works every time, but has workload-related downsides -> Legit solution, but the drain can take very long
- Discovery of remediation loops -> Too many reboots indicate something is not quite right
- Optimized drain: Prioritize draining of nodes with failed devices before other maintenance
- The current workflow is: Reboot (automated) -> Power cycle (automated) -> Rebuild node (automated) -> Manual intervention / RMA
## Prevention
> Problems should not affect workload
- Healthchecks with alerts
- Firmware & Driver updates
- Thermal & power management
## Future Challenges
- What if a high-density node with 8 GPUs has one failure?
- What is an acceptable ratio of working to broken GPUs per node?
- If there is a problematic node that has to be rebooted every couple of days, should the scheduler avoid this node?
## Q&A
- Are there any plans to open-source the GPU problem detection? We could certainly do it, but it's not on the roadmap r/n
- Are the failure rates representative and what is counted as a failure?
  - Failure means not being able to run a workload on a node (could be a hardware or driver failure)
  - The failure rate is 0.6%, but the affected capacity is 1.2% (with 2 GPUs per node)
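
The doubling is what you'd expect if one failed GPU takes its whole node out of service. Assuming independent failures with a per-GPU failure rate $f$ and $g$ GPUs per node (an assumption for illustration, not data from the talk), the share of affected nodes is:

```math
P(\text{node affected}) = 1 - (1 - f)^{g} \approx g f \quad \text{for small } f

g = 2:\; 1 - 0.994^{2} \approx 1.2\%, \qquad g = 8:\; 1 - 0.994^{8} \approx 4.7\%
```

Under the same assumption, an 8-GPU node would lose about four times as much capacity to the same per-GPU failure rate, which is why the high-density question above matters.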

@@ -0,0 +1,64 @@
---
title: "Reliable k8s resource Submission & Bookkeeping"
weight: 5
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/NCkHrvqFMl8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/0d/Reliable%20K8S%20Resource%20Submission%20and%20Bookkeeping.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Service offerings
- Product: HA Container Platform for general utility with a focus on run-to-complete
- Use-Cases: ML Orchestration, CI/CD, Machine maintenance, Financial analysis, Data Processing pipeline
- Requirements: Observability, Scheduling Events, Approval process, Bookkeeping, Datacenter Resiliency
- Focus: Resiliency (HA with datacenter failover)
- What the user needs: Workflow (e.g. generate report, persist report, notify)
- What we need for the user: ConfigMaps + Secrets, Workflow templates for the steps
## Challenges
- Read after modify across multiple datacenters
- Many reads against kubeapi that could overload the apiserver
- No native approval flows and limited audit
## Submission flows from a user's perspective
### Submission of runnables
- User: Submits runnable to the submitter with audit
- Submitter: Handles retry, verification, ...
- Submitter: Configures workload on workload clusters
![](../_img/runnables.png)
### Submission of deployables
- User: Deploys mutation to audit/source of truth
- Syncer: Syncs deployables to workload clusters
![](../_img/deployables.png)
## Reporting
- User wants: UI with latest status for all jobs
- Compliance wants: Transactions on given resource for auditing
- Implementation: Highly available inventory as single source of truth
```mermaid
graph
WorkflowAPI-->|reads|inventory
Consumer-->|updates|inventory
Producer-->|publishes events to|Consumer
```
### Potential Problems
- Problem: Delete event does not get propagated from syncer to producer, leading to zombie resources
- Fix: Periodic Cleanup
### Overview
![Complete diagram](../_img/submission.png)

BIN
content/day1/_img/capi.png Normal file

@@ -4,12 +4,26 @@ title: Day 1
weight: 5
---
TODO:
Day 1 of the main KubeCon event started with a bunch of keynotes from the CNCF itself (announcing the next KubeCon locations - Amsterdam and Barcelona).
They also announced a new sovereign cloud edge initiative (CNCF/LF meets the EU and some German ministry) called "NeoNephos" with members like SAP, StackIt or T-Systems.
This is also the day the sponsor showcase opened - so expect more talking to people, meetings and demos and fewer straight-up talks.
## Talk recommendations
- TODO:
- Not that much about GPUs, but good control plane scaling advice: [Scaling GPU Clusters without melting down](./01_scaling-gpu)
- Migrate a cluster to ClusterAPI without downtime: [Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos](./02_migrations)
- Some basic operator tips with good Q&A questions: [Don't write controllers like charlie don't does: Avoiding common kubernetes controller mistakes](./03_operator-mistakes)
## Other stuff I learned or people I talked to
- TODO:
- The Crossplane maintainers (Upbound)
- Anynines
- Cloud Foundry/Korifi
- Flatcar
- Cert-Manager
- Flux maintainers
- OVH
- Kubermatic
- Isovalent
- Spacelift: They employ some of the OpenTofu core maintainers

@@ -0,0 +1,38 @@
---
title: "Cloudy with a chance of kubernetes"
weight: 1
tags:
- kubecon
- platform
---
{{% button href="https://youtu.be/iCAFXF5ECto" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/bc/KubeCon%20EU%202025%20-%20Cloudy%20with%20a%20chance%20of%20Kubernetes_%20Going%20from%20one%20to%20three%20cloud%20providers%20-%20Laurent%20Bernaille%20%26%20Maxime%20Visonneau,%20Datadog.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Background
- Scale: 100s of clusters
- Cloud: Azure, AWS, GCP
- The baseline: A single AWS region and applications on VMs
- Goal: Operate on different locations
- History: They added more and more regions - 6 Providers in 6 Regions across 29 locations
- Problem: Different tooling across different cloud providers
- Idea: Kubernetes abstracts the specific cloud provider infra
## The way
- Idea: Use managed kubernetes
- Problem: In 2018 the managed offerings were in beta or very limited
- Challenge: Opinionated, cloud-specific stuff
### Iterations
1. Clusters based on VMs created by Terraform and other automation tools -> They realized that they need multiple clusters per region
2. Their own application delivery platform that deployed to the right clusters across regions for better DevEx
3. k8s on k8s (hosted control planes) -> Current setup with a Terraform-managed parent cluster
4. Idea: Host the parent cluster on managed Kubernetes -> They need to abstract some things away
5. Solution: Use their good old application delivery platform
### Abstractions
- Use custom CRDs to abstract the same behaviour across providers

@@ -4,12 +4,21 @@ title: Day 2
weight: 6
---
TODO:
The second day of KubeCon was my main "meeting day" this year - aka there were a bunch of scheduled meetings with vendors, partners, potential partners or just to get to know someone/a project.
What does this mean for you? Another day with only a few sessions (I only managed to attend two and only one was worth taking notes on) - the meeting notes are not available online.
## Talk recommendations
- TODO:
In the evening we attended the "German Community Stammtisch".
## Other stuff I learned or people I talked to
- TODO:
- Isovalent
- Kubermatic
- Portworx
- Fastly
- Syseleven
- Netbird
- VMware
- Stackit
- Harness
- Mia Platform
- and many, many more...

@@ -0,0 +1,53 @@
---
title: "Surviving Day2: Picking the right tool to secure your kubernetes habitat"
weight: 1
tags:
- kubecon
- security
---
{{% button href="https://youtu.be/FqUPqroF-Rw" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/a1/Surviving%20Day2%20-%20Picking%20the%20Right%20Tool%20To%20Secure%20Your%20Kubernetes%20Habitat.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Premise: The CNCF landscape includes a huuuge number (80+) of security-related projects.
Analogy: The animal kingdom (similar-ish animals that might do some of the same stuff, but not entirely the same)
## Build Phase
- How can I scan my container for vulnerabilities? -> Well, you probably mean your image
- The image itself is just a bunch of static layers and we kinda have to trust the layers you didn't build yourself
- The main tool used is still Trivy, with some easy steps:
  1. Extract layers
  2. Build FS
  3. Identify OS and non-OS packages
  4. Compare with the vuln DB
- The animal in our analogy: Raccoon
## Deploy Phase
- Kubernetes-native: Admission controller
- Tool used: Kyverno (integrates as an admission controller with YAML/CRD-based configuration)
  1. Modify (e.g. add default resource limits)
  2. Validate (check policies)
- The animal is actually a human: The forest guard
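
The modify-then-validate order can be illustrated with a few lines of Go. This is not Kyverno (which is configured with YAML policies, not code); the `container` type and both example policies (a default memory limit, no untagged or `:latest` images) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type container struct {
	Image       string
	MemoryLimit string // empty means "not set"
}

// mutate mimics a mutating policy: fill in a default memory limit.
func mutate(cs []container, defaultLimit string) []container {
	out := make([]container, len(cs))
	for i, c := range cs {
		if c.MemoryLimit == "" {
			c.MemoryLimit = defaultLimit
		}
		out[i] = c
	}
	return out
}

// validate mimics a validating policy: every container needs a limit and a pinned tag.
func validate(cs []container) error {
	for _, c := range cs {
		if c.MemoryLimit == "" {
			return fmt.Errorf("%s: memory limit required", c.Image)
		}
		// Only look at the last path segment so "registry:5000/app" isn't mistaken for a tag.
		name := c.Image[strings.LastIndex(c.Image, "/")+1:]
		if !strings.Contains(name, ":") || strings.HasSuffix(name, ":latest") {
			return fmt.Errorf("%s: image tag must be pinned", c.Image)
		}
	}
	return nil
}

func main() {
	pod := []container{{Image: "nginx:1.27"}, {Image: "busybox:latest", MemoryLimit: "64Mi"}}
	pod = mutate(pod, "256Mi")            // admission order: mutate first...
	if err := validate(pod); err != nil { // ...then validate the mutated object
		fmt.Println("denied:", err)
		return
	}
	fmt.Println("admitted")
}
```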
## Start Phase
- Before the pod itself is running, CSI, CNI and secret-related processes (the ones we want to look into) happen
- Problems: Secrets have no rotation or versioning mechanism, and there is no default integration for an external KMS
- Project: External Secrets -> Get secrets from an external KMS, automatically sync them (e.g. new versions)
- The chosen animal: Capricorn
## Run Phase
- Goal: Runtime scanning without including specialized instrumentation in each application
- Tool: Falco utilizing eBPF to check system calls against rules
- Idea: Detect dangerous behaviour (e.g. check for someone trying to exploit a fresh CVE)
- The analogy: Falcon
## TL;DR
1. Scan images (trivy)
2. Enforce best practices (kyverno)
3. Use an external KMS (external secrets)
4. Scan at runtime (falco)

@@ -0,0 +1,30 @@
---
title: "Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google"
weight: 2
tags:
- kubecon
- dev
---
{{% button href="https://youtu.be/mewXGSwDCE4" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
{{% button href="https://static.sched.com/hosted_files/kccnceu2025/f6/Type-safe%20Feature%20Flagging%20in%20OpenFeature.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
## Featureflags?
- Idea: Change the behaviour of an application without rebuilding it
- Goal: Control rollout, reduce risk, experiment (a/b)
- At Google: A huge number of feature flags (150k+), but that's because people forget to turn them off
## Where does the flag come from
- Lifecycle of a flag: Create, Manage, Deprecate, Delete -> But will it be created first in code or in the service?
- Classic implementation: Just an if/else that uses a function to get the flag
- Problem: What if the flag names mismatch between the code and the flag service? -> Multiple sources of truth
- Solution: Require use of auto-generated flag bindings (codegen from the management system) to mitigate typos, etc.
## OpenFeature
- Goal: Vendor agnostic, standardized, open source
- Basic setup: Register provider (once per app), create a client, use client to get flags
- CLI: Integrate into management system, keep a local manifest of all flags and generate code (generates the client)
- Now: Just call the client's method instead of hard-coding feature flag names
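
What the codegen buys you, sketched in Go: the generated code owns the flag key and its type, and application code only calls a typed accessor, so a typo becomes a compile error instead of a silently used default. The `provider` interface, `mapProvider`, `flags` and `EnableNewCheckout` are hypothetical; this is neither the OpenFeature SDK nor actual output of the OpenFeature CLI.

```go
package main

import "fmt"

// provider is a stand-in for whatever backend evaluates flags.
type provider interface {
	Bool(key string, fallback bool) bool
}

// mapProvider is an in-memory provider, e.g. for tests.
type mapProvider map[string]bool

func (m mapProvider) Bool(key string, fallback bool) bool {
	if v, ok := m[key]; ok {
		return v
	}
	return fallback
}

// --- would be generated from the flag manifest ---

const enableNewCheckoutKey = "enable-new-checkout"

// flags is the generated, typed client.
type flags struct{ p provider }

// EnableNewCheckout rolls out the new checkout flow (default: off).
func (f flags) EnableNewCheckout() bool { return f.p.Bool(enableNewCheckoutKey, false) }

// --- end of generated code ---

func main() {
	f := flags{p: mapProvider{"enable-new-checkout": true}}
	// Before: p.Bool("enable-new-chekout", false) compiles and silently returns false.
	// After: a typo in the method name fails to compile.
	if f.EnableNewCheckout() {
		fmt.Println("new checkout")
	} else {
		fmt.Println("old checkout")
	}
}
```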

@@ -0,0 +1,43 @@
---
title: "Don't let your kubernetes cluster go wild: Ensuring etcd reliability"
weight: 3
tags:
- kubecon
- etcd
---
{{% button href="https://youtu.be/J93U9n_qxSI" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Fair warning: This talk was very technical and pretty interesting - but don't even try to understand it if you're tired (or if it's the third-to-last session on the last day of a long conference).
## Baseline
- Standard example: Write and read KV data, `put(A, 2) -> get(A)`
- Problem: Concurrency
TODO: Steal image from intuition of correctness
## Correctness
- Correctness: Kinda funky when it comes to time
- Fix: Define a serialization that executes parallel requests one after another to bring them into a single order
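
A toy Go illustration of "bring them into an order": concurrent clients issue `put`/`get` on a single key, one mutex forces every operation into one global order (each stamped with a sequence number), and replaying that history shows every read returns the latest preceding write. Hypothetical names throughout; etcd gets its ordering from Raft across members, not from a local lock.

```go
package main

import (
	"fmt"
	"sync"
)

// op is one entry in the global history.
type op struct {
	seq   int64  // position in the single global order
	kind  string // "put" or "get"
	value int    // value written or value observed
}

// kv is a single-key store; one mutex forces every request into one order.
type kv struct {
	mu    sync.Mutex
	seq   int64
	value int
	log   []op
}

func (s *kv) Put(v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seq++
	s.value = v
	s.log = append(s.log, op{s.seq, "put", v})
}

func (s *kv) Get() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seq++
	s.log = append(s.log, op{s.seq, "get", s.value})
	return s.value
}

// consistent replays the history: every get must see the latest preceding put.
func consistent(log []op) bool {
	current := 0 // initial value of the key
	for i, o := range log {
		if o.seq != int64(i+1) {
			return false
		}
		if o.kind == "put" {
			current = o.value
		} else if o.value != current {
			return false
		}
	}
	return true
}

// run lets `clients` goroutines each write and then read concurrently.
func run(clients int) *kv {
	s := &kv{}
	var wg sync.WaitGroup
	for i := 1; i <= clients; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			s.Put(v)
			s.Get()
		}(i)
	}
	wg.Wait()
	return s
}

func main() {
	s := run(10)
	fmt.Println("operations:", len(s.log), "consistent history:", consistent(s.log))
}
```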
## Failures
- What happens if connections between etcd nodes go down? -> Serving stale data
- What happens if data gets corrupted? -> If enough members are online, it can repair itself
- And many more that can happen at random times -> Hard to test
TODO: Steal "in a concurrent world"
## Robustness framework
- Automates tests for failures
- Includes reliable reproductions of past (seemingly random) errors
- Currently a mixture of existing go debugging tools
## Future
- Reproduce more bugs consistently
- Run additional consistency checks

@@ -4,11 +4,14 @@ title: Day 3
weight: 7
---
TODO:
The last day of KubeCon - aka the day everyone leaves early.
But not me, and I had no meetings scheduled for this day -> More talks for me and notes for you.
This being my 7th day of the trip and 6th day of non-stop conferences took a bit of a toll on my note taking skills (expect more spelling mistakes).
## Talk recommendations
- TODO:
- Intro to feature flags and related tips: [Type-safe feature flagging in openfeature: Lessons learned from using feature flags at google](./02_open-feature)
## Other stuff I learned or people I talked to

@@ -4,4 +4,6 @@ title: Lessons Learned
weight: 8
---
Not related to any talk directly, but I can recommend this [Blog Post](https://smudge.ai/blog/ratelimit-algorithms) and [Video](https://www.youtube.com/watch?v=8QyygfIloMc&) about rate limiting.
TODO: