Compare commits

...

32 Commits

Author SHA1 Message Date
e2e6463df5 fix(day0):missing quote
All checks were successful
Build latest image / build-container (push) Successful in 58s
2026-03-23 16:56:48 +01:00
cb7854a085 fix(day0):missing quote
Some checks failed
Build latest image / build-container (push) Failing after 48s
2026-03-23 16:54:13 +01:00
ded59d665c docs(day0): Added finops talk
Some checks failed
Build latest image / build-container (push) Failing after 41s
2026-03-23 16:53:28 +01:00
58737cc8ed docs(ll): Added first lessons learned 2026-03-23 16:19:46 +01:00
f401b86d1d fix(day0): Copy paste insert mistake
Some checks failed
Build latest image / build-container (push) Failing after 47s
2026-03-23 16:19:19 +01:00
699a4decb4 docs(day0): Added reverse gitops talk 2026-03-23 16:18:13 +01:00
967d94c021 docs(day0): Added extensible platform talk 2026-03-23 15:44:53 +01:00
3db428448a fix/day0): Typo 2026-03-23 15:44:33 +01:00
2692b12470 feat(day0): Added sched links
Some checks failed
Build latest image / build-container (push) Failing after 39s
2026-03-23 14:22:48 +01:00
8e8886355e docs(day0): glden path talk 2026-03-23 14:20:38 +01:00
58e7967dae feat(day0): Added button for sched link to every talk 2026-03-23 14:11:22 +01:00
a099dd55a8 fix(day0); ":" in yaml because i fucking hat this shit
All checks were successful
Build latest image / build-container (push) Successful in 56s
2026-03-23 13:52:58 +01:00
48e4ba1c46 docs(day0): Added
Some checks failed
Build latest image / build-container (push) Failing after 41s
2026-03-23 13:49:18 +01:00
ee87de55e0 fix(day0): typo in yaml order
Some checks failed
Build latest image / build-container (push) Failing after 41s
2026-03-23 13:14:30 +01:00
a27aea231f docs(day0): Added hanover talk
Some checks failed
Build latest image / build-container (push) Failing after 39s
2026-03-23 13:09:43 +01:00
014a3ecd23 docs(day0): Added my talk 2026-03-23 12:59:30 +01:00
75c3933a7a docs(day0): OTEL feedback talk 2026-03-23 11:05:23 +01:00
d762a87459 docs(day0): Added recommended talk 2026-03-23 10:54:34 +01:00
7aefb59d84 docs(day0): Added sponsor keynotes
All checks were successful
Build latest image / build-container (push) Successful in 57s
2026-03-23 10:09:03 +01:00
12a9eedf30 docs(day0): First real talk of ped 2026-03-23 09:50:39 +01:00
8b2be9625d docs(day0): TCG Group update 2026-03-23 09:19:15 +01:00
3bc3f073d9 feat: Added new button to templates and all slides 2026-03-23 09:16:02 +01:00
cd1cbcf33f docs(day9): Opening remarks 2026-03-23 09:07:59 +01:00
8685fc71b9 docs(day-2): Added the final talk of the day (kcp+crossplane)
All checks were successful
Build latest image / build-container (push) Successful in 55s
2026-03-21 17:28:16 +01:00
25aa419cc5 docs(day-2): Added multitenancy talk 2026-03-21 16:50:20 +01:00
9f9371bd71 docs(day-2): Added multitenancy talk 2026-03-21 15:57:53 +01:00
974f9f941d fix(day-2): Typo 2026-03-21 15:46:51 +01:00
cdf7163a27 docs(day-2): Chaos Engineering talk notes 2026-03-21 15:30:57 +01:00
8ce0ccda0d docs(day-2): Added cilium talk notes 2026-03-21 14:57:39 +01:00
078e397fa7 docs(day-2): Added cilium talk notes 2026-03-21 14:54:02 +01:00
06be05a410 feat(template): Add code placeholder 2026-03-21 14:37:01 +01:00
6e9c7c728b docs(day-2): Added code link 2026-03-21 14:25:21 +01:00
32 changed files with 951 additions and 19 deletions


@@ -7,5 +7,8 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
TODO:


@@ -9,6 +9,9 @@ This "blog" certainly contains a bunch of tyops.
This is what typing the notes blindly in real time gets you.
Every year I tell myself that I will fix them afterwards: to be fair, I fix most of them but not all, and that's fine.
Also the notes tend to start out strong early in the week (aka Rejekts + CloudNativeCon) and fall off in terms of density and depth.
This also happens on a daily basis as soon as my brain starts to overflow with information (but it's not as noticeable until late in the day)
## How did I get there?
I attended Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2026 in Amsterdam.
@@ -17,7 +20,8 @@ Why? Because learning about all new things in the world of cloud is really impor
I enjoyed [last year's experience](https://kubecon25.nicolai-ort.com) and [the year before](https://kubecon25.nicolai-ort.com), so I wanted to go again.
And I managed to get a free ticket by being accepted as a speaker at the Platform Engineering Day Europe 2026 🥳.
(Although I already convinced my business partner that the company would pay for my ticket before I got the news)
So I was there proudly representing ODIT.Services.
## And how does this website get its content?


@@ -4,6 +4,11 @@ title: Day -1
weight: 3
---
This year there was only one day of Cloud Native Rejekts. So this was a down day.
Well, if you define finishing two talks as downtime. But certainly no conference today.
Last year Rejekts happened on Sunday and Monday, with the co-located events on Tuesday and KubeCon from Wednesday to Friday.
It was very cool having two full days of Rejekts last year but the day of preparation is certainly appreciated.
Also this is the day that most of my friends (who are attending KubeCon) arrived.
No one from back home attends Rejekts, but as mentioned in yesterday's notes I met some awesome people I get to see every year at these events, alongside some new - but nevertheless cool - humans.


@@ -8,6 +8,7 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
The basic welcome statements and logistical stuff.
Also a bit of history on how we ended up in Miro's offices (they kinda saved this year's Rejekts Europe because it was missing sponsors and a location).


@@ -9,6 +9,7 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A talk by EDERA - one of the sponsors of Cloud Native Rejekts.
@@ -28,7 +29,7 @@ A talk by EDERA - one of the sponsors of Cloud Natice Rejekts.
## Kubernetes joins the game
- Background: Kubernetes is built for containers and not for deep isolation
- Existing solutions: KubeVirt (manage KVM through the KubeAPI), Kata Containers (deeper sandbox), gVisor (emulated syscalls)
- EDERA's idea: Their own CRI (container runtime interface) that makes VM management transparent and can run VMs alongside containers
- Potential Problems:
- Kubernetes assumes that cgroups exist


@@ -10,6 +10,7 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A talk by Thilo - Flatcar maintainer and cool guy.
The talk consisted of multiple demos and a warning that this is the alpha version of the talk, but most things worked out fine.


@@ -9,6 +9,7 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
From the guys at EDB (the commercial offering connected to CNPG).
If you haven't heard of CNPG: use it, please.


@@ -8,7 +8,8 @@ tags:
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
TODO: Copy repo link for samples
{{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}}
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
The statistics of these talks are based on a survey including multiple companies, focused on ones that build and run applications.


@@ -0,0 +1,40 @@
---
title: "Unleashing the tides of kubernetes networking by removing kube-proxy"
weight: 6
tags:
- rejekts
- isovalent
- cilium
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://github.com/ttarczynski/cilium-kpr-demo" style="info" icon="code" %}}Code/Demo{{% /button %}}
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A talk by Isovalent (now part of Cisco - god I love that they have to say this every time).
It's a good baseline introduction to how Kubernetes service routing works, but also a bit dry (in terms of the presentation itself).
I skipped the introduction to Cilium in these notes. The docs exist for a reason.
## Kubernetes Services - a baseline
- East-West: ClusterIP -> App2App inside the cluster
- North-South: NodePort -> External Client to app in Cluster
## Kube-Proxy - IPTables Mode
- IPTables: Traffic flows through different tables/chains - most importantly the NAT table
- Every node has its own kube-proxy next to the kubelet
- ClusterIP: Scales to a huge number of rules when exposing multiple services
- NodePort: Masquerades sources if routing cross-node (the source IP is lost)
TODO: Steal iptables visualizer
TODO: Steal lifecycle of a packet (ClusterIP)
TODO: Steal lifecycle of a packet (NodePort)
## Kube-Proxy free
- Cilium deploys one agent pod per node that handles management of eBPF on the kernel
- ClusterIP: LoadBalancing happens on the socket-level
- NodePort: Also does SNAT
- Adds Hubble for observability
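As a rough sketch of what switching this on looks like (value names from memory, so double-check against the Cilium docs for your version), the kube-proxy-free mode is enabled via Helm values along these lines:

```yaml
# Illustrative Helm values for a kube-proxy-free Cilium install
# (value names from memory - verify against the Cilium docs)
kubeProxyReplacement: true
k8sServiceHost: <api-server-host>   # Cilium needs direct API server access
k8sServicePort: 6443
hubble:
  enabled: true                     # the observability layer mentioned above
```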


@@ -0,0 +1,51 @@
---
title: "How Chaos-Engineering works: Implementing Failure Injection on Kubernetes with Rust"
weight: 7
tags:
- rejekts
- chaos
- rust
- operator
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://github.com/ioboi/kerris" style="info" icon="code" %}}Code/Demo{{% /button %}}
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A general introduction to chaos engineering, specifically showing the implementations by Chaos Mesh and Litmus.
After that the talk continues into the implementation of a custom chaos operator written in Rust.
## Chaos Engineering
> Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production
> ~ Chaos Community (2015)
- CNCF-Projects: **Chaos Mesh**, **Litmus**, Chaos Blade, Krkn
- Types: **Pod Chaos** (Delete/Terminate), **Pod Network Chaos** (Faults/Packet loss), Node Chaos, JVM Chaos, Infra Chaos (reboot VMs), ...
## APIs
- Chaos Mesh: A Specific CRD per Chaos Type (PodChaos, NetworkChaos, ...)
- Litmus: Chaos Engine Config that defines the type in its spec
TODO: Steal sample CRDs
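Until the real slide samples get copied in, here is a minimal sketch of both API styles from memory (field names may be slightly off - check the Chaos Mesh and Litmus docs):

```yaml
# Chaos Mesh: one specific CRD per chaos type
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-one-pod
spec:
  action: pod-kill        # the chaos action for this type
  mode: one               # pick a single matching pod
  selector:
    labelSelectors:
      app: demo
---
# Litmus: a generic ChaosEngine that names the experiment in its spec
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: demo-chaos
spec:
  appinfo:
    appns: default
    applabel: app=demo
  experiments:
    - name: pod-delete
```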
## DIY
- Baseline: Written as a controller in Rust (out of curiosity) with support for Pod Chaos and Network Chaos
- Entrypoint is a reconcile function that returns an action (requeue, etc.) and an error
- Network Chaos uses traffic control (`tc`, part of `iproute2`) to do the limiting and loss
- If you're interested in the Rust implementation: Look at the code linked above
```mermaid
graph LR
controller
subgraph node
daemon
containerd
daemon-->|grpc|containerd
end
controller-->daemon
```


@@ -0,0 +1,34 @@
---
title: "Push the boundaries of kubernetes multi-tenancy with containerruntimeclasses"
weight: 8
tags:
- rejekts
- runtime
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
I missed the first 3 minutes of this talk because they started early, so the notes are currently missing the first levels of multi-tenancy.
This was a really interesting introduction into the world of runtime classes and how you could use them to choose the right level of isolation for each of your pods/deployments by utilizing different runtimes/shims - running everything from normal containers to hardened/emulated processes and VMs side by side in Kubernetes.
## Levels of multi-tenancy
- God-Level: A physical cluster separated out into multiple virtual clusters, which can be isolated into even more nested virtual clusters (for )
- Problem: We're using the same container runtime
## Runtimes
- There are different runtimes since TODO -> They replaced dockershim as the runtime in 1.24
- Choice can range from CRI-O (performant) to Kata Containers (secure)
- In the past there was no plugin architecture (the node had to be reinstalled and restarted to switch the CRI), now you just update the container runtime config through a new RuntimeClass
- Can be targeted for each Pod/Deployment spec
- You can still use containerd as the default class, with shims (Shim v2 project) for specialized runtimes like Kata or Windows
- Expansion: KubeVirt - VMs as a runtime class (also implemented by others, like Kata with QEMU isolation)
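As a quick sketch of what this looks like in practice (the handler name depends on how the shim is registered in your containerd config, so treat `kata` here as an example):

```yaml
# A RuntimeClass pointing at a containerd shim (e.g. Kata Containers)
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata   # must match the runtime name in the containerd config
---
# A pod opting into the stronger isolation via its spec
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed
spec:
  runtimeClassName: kata
  containers:
    - name: app
      image: nginx   # illustrative image
```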
## Pro/Con
- Pro: Security, cost optimization, performance optimization, diversity/flexibility
- Con: Day-2 complexity, complex debugging (anyone say networking?), additional costs of using VMs


@@ -0,0 +1,64 @@
---
title: "Your Cluster Isn't Flat: A First-Class API for Real-World Infrastructure Topology"
weight: 9
tags:
- rejekts
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://github.com/JesseStutler" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
By a Volcano maintainer from Huawei - a very wholesome guy.
I don't know why the organizers always tend to schedule these very technical topics by people with a bit of a harder accent (totally understandable but very quiet) near the end of the conference or day. I thank the Sakura Edition Red Bull for keeping my attention span up and running for the last two sessions of the day.
## History of Volcano
- 2017: Kube-Batch open source
- 2019: Volcano Open Source
- 2020: CNCF Sandbox
- 2022: CNCF Incubation
- 2026: Road to graduation
## Volcano feature overview
- Unified Scheduler
- Queue Management
- Workload Colocation
- Multi cluster scheduling
- Heterogeneous Device Support
- Multiple Scheduling policies
## Why topology awareness?
- Scenario 1: Bottlenecks in LLM training when jobs are not placed on GPUs that are close
- Scenario 2: Inference runs as separate prefill and decode jobs on different hardware -> Short network hops needed
- Node labels can be used but are very limited
- Datacenter network architectures are heterogeneous -> Everyone can build in their own style
## Scheduler notation mechanisms
- Label: Kueue, Koordinator, KAI Scheduler
  - Vendor-specific syntax
  - No hierarchy
  - Need to be manually set
  - No healthchecks
  - Cloud-specific
- CRD (long term): Volcano
  - Standardized API (HyperNode)
  - Hierarchical (trees/zones)
  - Auto-discovery - plugin-ready (e.g. NVIDIA)
  - Healthchecks
  - Unified across clouds and on-prem
## Architecture CRD Sample
TODO: Steal Leaf sample from slides
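Until the slide sample gets copied in, here is what a leaf HyperNode roughly looks like from memory - every field name here is a best guess, so verify against the Volcano network topology docs:

```yaml
# Rough sketch of a leaf HyperNode (tier 1 = tightest coupling, e.g. same switch)
# Field names are guesses - check the Volcano docs before use
apiVersion: topology.volcano.sh/v1alpha1
kind: HyperNode
metadata:
  name: leaf-s0
spec:
  tier: 1
  members:
    - type: Node
      selector:
        exactMatch:
          name: node-0
    - type: Node
      selector:
        exactMatch:
          name: node-1
```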
## What's next
- GPU 3D architectures (internal interconnects, NUMA, external interconnects)
- DRA integration/collaboration
- Promotion of HyperNode to a first-class citizen -> Extraction from Volcano to be truly generic


@@ -0,0 +1,80 @@
---
title: "Achieving Platform Engineering Multi-Tenancy with kcp and Crossplane"
weight: 2
tags:
- rejekts
- kcp
- crossplane
- kubermatic
- upbound
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://github.com/SimonTheLeg/crossplane-and-kcp-demo" style="info" icon="code" %}}Code/Demo{{% /button %}}
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
An introductory talk to kcp and Crossplane by the companies maintaining both of them.
## The basics
- A platform should be automated and self-service driven to count as platform engineering
- Provider teams: Certificates, databases, ...
- Consumer teams: Want to use a provided Service
- IDP: Sits in the middle -> The real hard part
## KCP
- Idea: Why not use Kubernetes as our API layer? It tracks API ownership, versioning and resource management and has built-in extensibility (CRDs)
- Problems:
  - APIs are always cluster-scoped (you advertise them to everyone) -> You could give everyone a cluster
  - Ramping up a new cluster takes time and resources -> Let's just create a lightweight hosted control plane with its own datastore
  - Sharing APIs to multiple clusters is hard -> Lightweight control planes with a shared datastore
- Solution: Workspaces that are organized in a tree, where each workspace contains its own CRDs and RBAC -> All resources (e.g. namespaces) exist just in their own workspace
- API sharing: APIExport CRD and APIBinding CRD (referenced via the workspace path of the APIExport)
- Running the operators that work on the APIs: Virtual workspaces (virtually connect your operator to all of its resources across kcp via a magic kubeconfig) -> The controller needs to be built with multicluster-runtime (a drop-in replacement for controller-runtime)
- The kcp api-syncagent allows you to use an existing operator without modifying it for multicluster-runtime
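A rough sketch of the two sharing CRDs from memory (names and schema details may be off - check the kcp docs):

```yaml
# Provider side: export an API from the provider's workspace
apiVersion: apis.kcp.io/v1alpha1
kind: APIExport
metadata:
  name: databases.example.io
spec:
  latestResourceSchemas:
    - v1.databases.example.io   # an APIResourceSchema in the same workspace
---
# Consumer side: bind to the export via its workspace path
apiVersion: apis.kcp.io/v1alpha1
kind: APIBinding
metadata:
  name: databases
spec:
  reference:
    export:
      path: root:db-team        # workspace path of the APIExport
      name: databases.example.io
```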
```mermaid
graph
KCP
Datastore
User
subgraph Workspace
APIs[API/CRD]
RBAC
end
KCP-->|interact with|Datastore
User-->|Create tenant|KCP
KCP-->|Manages|Workspace
KCP-->|Return kubeconfig|User
User-->|Uses KCP like the apiserver|KCP
```
## Crossplane
- Providers for all kinds of resources (Kubernetes or infra/cloud)
- Compositions for higher-level abstractions across one or multiple providers
- Uses the Kubernetes API (aka CRDs) as its API to enable integration with standardized tooling (like GitOps)
```yaml
apiVersion: ...
kind: Composition
spec:
  compositeTypeRef:   # compositeTypeRef/pipeline live on a Composition
    apiVersion: my.example/v1alpha1
    kind: Test
  mode: Pipeline
  pipeline:
    - ...
```
## The demo
I recommend watching the recording, but this shall serve as an overview of the scenario.
Or run it locally (code linked above).
- A user wants to order a new database in their workspace A
- The database team offers their API through their database workspace
- The database team runs their operator in their own cluster
- The kcp api-syncagent syncs the database CRD from workspace A into the DB team's cluster and the connection secrets back to the workspace


@@ -17,8 +17,17 @@ I have to admit that I'm very bad with names and don't always regocnize people b
## Talk recommendations
- If you're building operators: [Solving Operator Extensibility: A gRPC Plugin Framework for Kubernetes](./04_operator-estensibility)
- [Intro to both chaos engineering and building operators that interact with containerd in rust](./07-chaosengineering)
- The idea behind [The self-improving platform: Closing the Loop Between Telemetry and Tuning](./05_selvimproving) is very interesting, but the first half of the talk is kinda confusing as it discusses a study that could have been shortened drastically. But the way they automatically create PRs for resource utilization is cool
- [A good introduction to kcp and crossplane](./10_kcpcrossplane)
## Other stuff I learned or people I talked to
- TODO:
- Arik about deprecation of CNCF projects
- Simon and Koray about demo prep for talks
- Arik and Simon about the review process for conference talks
- Nico
- Stephan
- A nice guy whose name I forgot (did I mention that I'm bad with names yet?) about the process of bleaching/dyeing my hair (he asked for a friend)
- A group of random people in the elevator about Neon Genesis Evangelion (not a tech topic but hey)
- And a bunch of smalltalk and deeptalk with the awesome attendees


@@ -0,0 +1,18 @@
---
title: Welcome + Opening remarks
weight: 1
tags:
- platformengineeringday
- keynote
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
The usual welcome and thank you to our sponsors.
And of course the announcement of the lunch break and evening reception.
This is the fifth edition of Platform Engineering Day, which originally started in 2023 in Amsterdam.


@@ -0,0 +1,40 @@
---
title: "CNCF Platform Engineering Technical Community Group Update"
weight: 2
tags:
- platformengineeringday
- keynote
- update
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
{{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}}
## Basics
The TCG (no not trading card game) is a community focused on:
- Sharing knowledge and best practices
- Exploring Tools
They have two meetings (one in EMEA, one in APAC) rotating every week, with two main formats:
- Hallway Track: Interactive discussion
- Community Track: Sessions
## Current Initiatives and Updates
- **Whitepaper:** Internationalisation + new simplified language and version selection for change tracking
  - The whitepaper is heading towards v2
- **Maturity Model:** Simplified language + examples and a self-assessment tool/framework
  - Plans for v2: Usage & helpfulness survey + input from different viewpoints
## Roundtables / Interaction
- Community coffee every day at 7am
- Platform Lunchtime aka select tables for discussion Tue-Thu
- Both in the project pavilion on Wednesday
- Lightning talks at the booth from 13:00 to 13:40
Closing sentence: "May your paths be golden"


@@ -0,0 +1,55 @@
---
title: "Who built this platform? Alternative viewpoints on Platform Design"
weight: 3
tags:
- platformengineeringday
- platform
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY1S" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
The talk was kept interactive through live surveys.
## Baseline
- Research on underrepresented groups in platform engineering focuses mostly on gender and seniority
- Problem: There is little data about the divide between standard and non-standard users of platforms
## Problems and changes
- In the past, often the code was the documentation, leading to the need for reverse engineering
- Problem: This might work for some people (confident and open to do a deep dive into the code) but can lead to major onboarding problems -> Rise of documentation (the platform stays the same but the interface for onboarding changes)
- Breaking point: Production incidents caused by designs based on the wrong assumption of humans always being very careful -> The process is the cause, so we have to change the process and add guardrails
- Abstraction leakage: Our pod is failing. Why? Resource error, but the real issue was rooted in the infrastructure (a problem with spot instances, but only the platform team knows this, not the app team) -> The platform assumed knowledge about the infrastructure
- Decisions are usually made by senior engineers, but the alternative viewpoints (new hires, juniors, users without deep knowledge) who struggle are the ones that actually reveal blind spots
## Who is the platform built for
- The platform can not scale if it's only built for the engineers who built it
- We have to invite engineers from different viewpoints to improve our platform
- Abstractions are ok, but they can't hide important information
## Humans as a Platform
- Perspective shapes design -> Optimize for usability
- Broader participation reveals blind spots -> Improve DX without lowering standards
- Developer experience is an architectural concern -> Observability and usability must be
## Measuring success
- Time to first deployment
- Self-service success rate
- Help requests per workflow
- Workflow failure recovery time
## Takeaways
- Platform usability is a core engineering responsibility
- Design needs to include different viewpoints and cognitive needs
- Self-service and clear documentation reduce friction
- Fast feedback loops enable continuous improvement
- Balance accessibility with developer speed -> Inclusive design should not hinder improvements but go hand in hand

content/day0/04_vmware.md

@@ -0,0 +1,35 @@
---
title: The Node OS Is Part of Your Platform Contract
weight: 4
tags:
- platformengineeringday
- keynote
- sponsor
- vmware
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
The usual "teaser" meant to get people to visit their stand and other talks.
## What happens if
- A kernel update breaks the CNI and affects every pod
- An app mutates the system MTU, breaking all other networking operations and even cluster management
## Baseline
- Platform engineers build abstractions every day
- Infra owns the hardware and hypervisor
- Platform owns Kubernetes, GitOps and so on
- Problem: Who owns the node OS?
## Node states
- ClusterAPI assumes immutable nodes by replacing them when updating the OS, Kubernetes or the CRI
- But we want mutability for: Simple config updates, certificate things
- So why immutable: Version alignment, drift detection


@@ -0,0 +1,28 @@
---
title: Bridging the Local Kubernetes Gap for AI Developers
weight: 5
tags:
- platformengineeringday
- keynote
- sponsor
- loft
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
{{% button href="https://github.com/loft-sh/vind" style="info" icon="code" %}}Code/Demo{{% /button %}}
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
Basically a short advert for their vind project/tool.
## Baseline
- Current local setup: KIND, k3d or similar tools
- Problem: Getting things like GPUs to work, manual image loading, no built-in loadbalancers
- Result: There is still a gap between local and prod
## Their solution: vCluster in Docker (VIND)
- Features: Built-in LB, Sleep/wake, Pull-through Cache, Multi-Node Clusters, External Nodes (via VPN)
- USP: Platform UI


@@ -0,0 +1,36 @@
---
title: Rebuilding our platforms in the age of AI
weight: 6
tags:
- platformengineeringday
- keynote
- sponsor
- vmware
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}} -->
<!-- {{% button href="https://github.com/loft-sh/vind" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A call to join them at their booth and read their whitepaper.
## Baseline
- 2024: LLMs
- 2025: Agents
- 2026: Our platform/infra cannot keep up -> The number of def
## Mandate
- Drive AI-readiness initiatives
- Data-driven AI rollout
- Measure for impact instead of activity or speed
- Paved paths and guardrails
- AI infra optimizations
## Observations
- What's good for humans is also good for LLMs (clear documentation, structured data, readable code, ...)
- Metrics from our tools only measure that something is happening -> We need a framework for DX and AI


@@ -0,0 +1,52 @@
---
title: "Scaling on satisfaction: Automated Rollouts Driven By User Feedback"
weight: 7
tags:
- platformengineeringday
- staging
- rollout
- feedback
- otel
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY2c" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
<!-- {{% button href="https://github.com/thomasvitale/kubecon-2026-gitops" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
{{% button href="https://whitneylee.com" style="info" icon="link" %}}Website/Homepage{{% /button %}}
<!-- {{% button href="https://thomasvitale.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
## What they are actually talking about
- A way of creating metrics/traces from an LLM and analyzing them
- The integration of the user's feedback
- Basically correlating what the variant did on the server with the vote event, to promote based on feedback
- Combined with an intro to OTEL
## Baseline
- Question: How do we know whether content generated by LLMs and delivered to our users is good or bad?
- Idea: Using OTEL and user feedback to drive canary deployments and rollout
- Needed: A standardized vocabulary (so we can talk to any telemetry system)
## Demo Architecture
The start of the talk featured an evolving story (5 parts) and let the attendees vote on whether they liked it or not, to emulate rollouts of a new application version with immediate user feedback. It was based on Flagger deciding every thirty seconds whether the user feedback allows promotion of new versions.
- The audience gets split across two variants running as Knative deployments
- The OTEL Collector collects telemetry data and the platform (Flagger) uses it as the basis for its decisions
- The collection was done by creating a user session span with a span event (aka a log) regarding the voting -> Span events are deprecated and will be moved to a Logs API
## Now to our platform
- Stack: Kubernetes on Hetzner with components (cert-manager, ingress, Knative, ...) packaged by Carvel
- Knative as the deployment target for apps
- Flagger as the release decision tool
- OTEL for instrumentation
- Crossplane
- API: StoryApp CRD as the main interface for controlling what we want to deploy
## Takeaway
- Include user feedback in the decision process for new rollouts
- OTel can be used to automate this
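The promotion loop could be expressed as a Flagger Canary roughly like this (a sketch assuming Flagger's Knative support; `positive-feedback-ratio` is a hypothetical custom metric fed from the OTel pipeline, not something from the talk):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: storyteller
spec:
  targetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: storyteller
  analysis:
    interval: 30s      # decide every thirty seconds, as in the demo
    threshold: 5       # roll back after 5 failed checks
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: positive-feedback-ratio   # hypothetical metric derived from user votes
        thresholdRange:
          min: 70
        interval: 30s
```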


@@ -0,0 +1,20 @@
---
title: "Promotion Failed: How (Not) To Stage Your (Kubernetes) Platform"
weight: 8
tags:
- platformengineeringday
- staging
- mine
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://github.com/nicolaiort/pedeu2026-promotion-failed/blob/main/Slides.pptx" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY8r" style="error" icon="calendar" %}}Sched Link{{% /button %}}
{{% button href="https://github.com/nicolaiort/pedeu2026-promotion-failed" style="info" icon="code" %}}Code/Demo{{% /button %}}
{{% button href="https://odit.services.com" style="info" icon="link" %}}Website/Homepage{{% /button %}}
{{% button href="https://nicolai-ort.com" style="info" icon="link" %}}Website/Homepage{{% /button %}}
By me: That's my talk - yay!
![](../_img/yippie.png)


@@ -0,0 +1,59 @@
---
title: "The Day 2 Hangover: What To Do After the Platform Party Ends"
weight: 9
tags:
- platformengineeringday
- platform
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY55" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
{{% button href="https://www.syntasso.io/post/the-internal-platform-scorecard-speed-safety-efficiency-and-scalability" style="info" icon="link" %}}Platform Scorecard{{% /button %}}
{{% button href="https://www.syntasso.io/post/the-internal-platform-scorecard-speed-safety-efficiency-and-scalability" style="info" icon="link" %}}Recruiting producers{{% /button %}}
## Time to build a platform
- Why? The boss wants it
- How? Who knows
- First: Gather requirements
- Then: No one actually uses our platform -> The only users were the low-value ones and other teams
## Why does no one like our platform?
- Requirements are constantly changing, so if you take 6 months they need something else/additional -> The platform no longer serves its users
- Time to delivery is really important -> If you are no longer the bottleneck people actually like you
- Over time your clean catalog slowly grows to become a sprawl of garbage with specialized variations for everything
## Why did it fail
- Platforms are treated as a project with achievable end goals
- Building in a silo (accidentally) -> Requirements change and require lots of work and optimization
- You might think that you are your own customer but you are not -> There are many different perspectives involved (Dev, SRE, Ops, Management, ...)
## How can we prevent these fallacies?
- Hire an experienced platform product manager (if you can)
- Outcome over output -> A maturity model helps
- Focus on user values like speed, safety, efficiency, scalability
- Find your exemplar team instead of trying to build for everyone from zero -> Choose an experiment-friendly team with high visibility and a good feedback culture
- Adopt tools and practices to make things easier
  - GitOps over imperative pipelines
  - Build shared platform services (undifferentiated infra like registries, GitHub runners and stuff like that)
  - Ship observability as early as possible (opinionated dashboards and alerts from day 1)
- Make everything self-service
  1. Standardize as platform capabilities
  2. Bundle each capability as an API and associated workflow
  3. Expose the APIs through a pluggable IDP
- Recruit producers (providers): they are experts and allow you to scale your capabilities
- Be the layer that facilitates exchange between producers and consumers
TODO: Steal persona matrix from slides
## TL;DR
- Outcome over output
- Find values for all users (with some first citizens as guidelines)
- Everything as a self-service
- Recruit producers


@@ -0,0 +1,89 @@
---
title: "A Practical Guide To Inner Sourcing Your IdP"
weight: 10
tags:
- platformengineeringday
- platform
- idp
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://www.canva.com/design/DAHBlkrzYH8/ZSe4SkJVGy8GnMtDcCGdHw/view?utm_content=DAHBlkrzYH8&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=h86fe2f7313" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY5r" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
{{% button href="https://lianmakesthings.dev" style="info" icon="link" %}}Homepage{{% /button %}}
{{% button href="https://devrelbook.com" style="info" icon="link" %}}Dev Rel{{% /button %}}
A talk by Lian, and I can always recommend her talks.
## Background
- 55% of orgs have adopted but only 27% have integrated best practices -> They built technically impressive ghost towns
- We're not good at explaining/thinking about platforms -> Viewing them as a whole house instead of the collection of bricks that make them up
- The deployment process evolved from scripts to services (shared scripts) to platforms (shared scripts with extra steps)
- Platform adoption is not a technical but an advocacy problem
- Idea: Treat the platform as a shared project -> We need DevRel (Discover, evaluate, learn, build, scale)
TODO: Steal lian's evolution slide
## Case studies
### A
- Base: Bureaucratic and individual-driven, with general apathy due to frequent objective changes but with individual heroes
- Ideas:
  - Internal user groups (low contribution, high amount of complaints)
  - CI libraries (that no one used)
  - A pilot program -> Actually works
### B
- Large manufacturer, mission-driven but tool anarchy and a bunch of sub-cultures
- Collection:
- Diagnosis:
## The platform advocacy framework
- Goal: Turn passive users into passionate advocates
- Based on: CNCF platform maturity model
- Levels: Provisional, Operational (Dedicated Team+Budget), Scalable (Grow without disruptions), Optimizing (Anticipate needs/changes)
- Aspects: Investment, Adoption, Interfaces, Operations, Measurement -> Aspect levels are independent from one another
- Potential trap: Creating actions just to graduate to the next level without actually keeping culture in mind
- The orbit model:
- Love: Activity in the community
- Reach: Ability to pull others with you
  - Risk: Only focusing on high reach + high love, but the people with the most valuable feedback are usually high-reach/low-love or low-reach/low-love
- Tribal Culture
- Stages: Life Sucks -> My Life Sucks (Victim of the system mindset) -> I'm great - no you are not (lone warrior) -> We are great (mission driven) -> Life is great (team driven)
- You have to move from one stage to another, no skipping
### CNCF Platform maturity
TODO: Copy from slide
### Orbit model
| | Low Love | High Love |
| - | - | - |
| Low Reach | Detached observers (Contractors, Juniors) | Independent fans (Early adopters, individual contributors) |
| High Reach | Sceptics (Managers, Architects, CTOs) | Expert multipliers (Senior ICs, Internal Advocates, Platform champions) |
### Tribal Culture model
TODO: Beautify the text above
## Using the framework
- Culture first -> The culture sets your options and limits
- Cycle:
1. Collect data (Use the three discussed maturity models)
2. Analyse bottlenecks/root causes (Keep with the DevRel journey)
  3. Intervene: Decide on actions and take them (scientific-method style: hypothesis -> define action -> define success metric and threshold -> take action -> compare against the desired outcome)
4. Repeat
- Important: Requires patience if it's hard to show the value (or missing value) in numbers
- Innersourcing moves the platform from the team into the whole org


@@ -0,0 +1,63 @@
---
title: "Build Your Golden Path Construction Playbook: A Maturity-First Implementation Approach"
weight: 11
tags:
- platformengineeringday
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://hosted-files.sched.co/colocatedeventseu2026/7c/golden-path-implementation-atul-kubeconeu-26compressed.pdf?_gl=1*xrtp7z*_gcl_au*NDQ0MzAwNDQuMTc3NDAyNzk5NQ..*FPAU*NDQ0MzAwNDQuMTc3NDAyNzk5NQ.." style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY6g" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A lot of people have talked about golden paths for platforms since KubeCon EU 2022 (including the speaker of this session).
Most of these talks only tell you why you need this and about the value-add but not about the journey.
## Baseline
- Golden Path: A supported, opinionated way to build and deploy software that makes the right way the easiest way, while not being a mandate but a product, and reducing decision fatigue
- Intro analogy: The bike paths are the golden paths of Amsterdam: strongly opinionated and self-service, safe by default and built progressively -> Reduces cognitive load
- The golden path always relates to the journey across the platform maturity model (our lord and savior of the day)
## How do we start (and continue)
### Phase 0: Chaos
- Every team has their own deployment script (or just `kubectl apply -f`)
- Same goal, 0 consistency, no standards
- Deploy latest, no limits, ....
### Phase 1: Standardize
> Unify existing scripts
- One script, one template, every team -> Teams just insert their image, name, ...
- Resource Limits, healthchecks, labels -> Everything enforced by default
- Same command -> Consistent output every time
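Phase 1 can be as small as a script that renders one enforced template (my sketch, not the speaker's code; names and defaults are made up):

```shell
# render_deployment NAME IMAGE - prints a manifest where limits, health checks
# and labels are enforced centrally; teams only supply name and image.
render_deployment() {
  name="$1"; image="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  labels:
    app: ${name}
    managed-by: golden-path   # enforced label
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}
          resources:
            limits:             # enforced resource limits
              cpu: 500m
              memory: 256Mi
          livenessProbe:        # enforced health check
            httpGet:
              path: /healthz
              port: 8080
EOF
}

# Usage: render_deployment myapp registry.example.com/myapp:1.2.3 | kubectl apply -f -
```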
### Phase 2: Validate + make it scalable
> Improve the previously unified script
- Config driven -> Declarative YAML instead of CLI arguments
- Reject bad inputs before they touch the cluster
- Environment-aware defaults for dev, staging and prod
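In Phase 2 the CLI arguments become a small validated config file; a sketch of what such an input could look like (the schema is invented for illustration):

```yaml
# deploy.yaml - declarative input for the golden-path tooling (illustrative schema)
app: checkout
image: registry.example.com/checkout:1.4.2
port: 8080
environment: staging   # selects environment-aware defaults (replicas, limits, ...)
# everything not set here falls back to validated, enforced defaults
```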
### Phase 3: GitOps
> The script still handles creation/validation; we only move the apply
- `git push` instead of `kubectl apply`
- A tool watches the repo and deploys automatically
- Every deployment is a commit -> Audit trail and rollbacks
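Phase 3 maps to what GitOps tools provide out of the box; as one possible example (my choice, the talk did not prescribe a tool), an Argo CD Application watching the repo:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deployments  # the repo we `git push` to
    path: apps/checkout
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: checkout
  syncPolicy:
    automated: {}   # watch the repo and deploy automatically; every deploy is a commit
```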
### Phase 4: IdP
- Same validation and push logic
- Replace the script + template with a form -> No more YAML or CLI needed
- Any developer can use it and deploy
## Journey overview
TODO: Steal from slides


@@ -0,0 +1,46 @@
---
title: Building Extensible Platforms
weight: 12
tags:
- platformengineeringday
- platform
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY7J" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
{{% button href="https://hazelweakly.me/" style="info" icon="link" %}}Website/Homepage{{% /button %}}
## Baseline
Complexity is unavoidable: be it in platforms, physics or biology.
Complexity usually grows rapidly, followed by a phase of making things better -> "Let's make virtual machines easier but add 500x more of them" (the story of Kubernetes)
TODO: Steal chart from slides
## Methods to the madness
- We want confluence (the mathematical one, not the Atlassian one): If there are different paths that are semantically the same, it should not matter which path we take
- As long as we achieve the same goal, the specifics should not matter that much
- Narrow protocol: A narrow and slowly evolving protocol allows us to have diversification below and above the narrow point
- Examples: IP addresses and DNS haven't changed in a long time -> We can use this narrow thing to build monsters on top of
- Allow everything above and below your platform to change flexibly and evolve
- Vertical integration: When you're getting integrated you become valuable (again the DNS example) -> Vertical integration always beats technological capabilities
- Picking the right protocol might be hard, but choosing a narrow and integrated one helps
TODO: Steal confluence slide
## Extensibility
- The expression problem: the static type-safety vs. type-flexibility question
- So we want a platform where we can, in a typesafe manner:
  - onboard new applications into the platform
  - onboard new integrations into the platform
  - without requiring an end-user or platform migration/deploy
  - without centralising the definition of interfaces or applications
  - without assuming a certain platform or application architecture
- So basically: Protocols are the new platforms
- You have to be fine with people using your protocol for whatever they want -> It needs to be abusable


@@ -0,0 +1,38 @@
---
title: "The GitOps Paradox: Why Your Devs Need an API You Don't Want To Build"
weight: 13
tags:
- platformengineeringday
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY82" style="error" icon="calendar" %}}Sched Link{{% /button %}}
{{% button href="https://github.com/ConfigButler/gitops-reverser" style="info" icon="code" %}}Code{{% /button %}}
{{% button href="https://revertgitops.dev" style="info" icon="link" %}}Website/Homepage{{% /button %}}
## Baseline
- We all like files, humans like them, they are easy to use and AI can use them
- Git is a good version manager
- GitOps is cool and we want to keep it
- **Problem**: I want to do something and now I need to create a YAML file and open a PR -> High toil
- **Question**: Should Git always be the primary entry path for changes?
- **Idea**: Use the Kubernetes API as our API server, have an operator that handles creation of manifests in Git plus promotions and so on, and the changes to the "real" clusters are always made via GitOps
## The paradox
- We shield people from the Kubernetes API and force them to use Git
- We build our own custom APIs on top of the kube-api anyway
- Solution: Why not go back to using the Kubernetes API (humans and robots can use it)?
## Principles of the reverse gitops manifest
1. API first (via the kube-api) to reuse auth, CRDs, status, controllers and more
2. Capture intent, not implementation -> Not API-only (bi-directional if you can) with reversible & declarative resources
3. GitOps applies -> Changes are always piped through GitOps for reviews, history, rollbacks, promotions, ...
## Implementation examples
- Use KRO as the abstraction that captures the intention
- [GitOps Reverser](https://github.com/ConfigButler/gitops-reverser)
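A KRO ResourceGraphDefinition capturing intent could look roughly like this (my sketch; the simple-schema syntax is from memory, so check the KRO docs before relying on field names):

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: storyapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: StoryApp        # the intent-level API that humans and robots talk to
    spec:
      name: string
      image: string
      replicas: integer | default=1
  resources:
    - id: deployment      # the implementation derived from the intent
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: ${schema.spec.replicas}
```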

content/day0/14_finops.md Normal file

@@ -0,0 +1,61 @@
---
title: "When Platform Engineers Lead FinOps: Driving Reliability and $20M in Savings"
weight: 14
tags:
- platformengineeringday
- finops
- legacy
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
{{% button href="https://colocatedeventseu2026.sched.com/event/2DY3O" style="error" icon="calendar" %}}Sched Link{{% /button %}}
<!-- {{% button href="https://github.com/graz-dev/automatic-reosurce-optimization-loop" style="info" icon="code" %}}Code/Demo{{% /button %}} -->
<!-- {{% button href="https://cloudnativeplatforms.com" style="info" icon="link" %}}Website/Homepage{{% /button %}} -->
A case study from Expedia about FinOps.
## The Cost-Reliability disconnect
- Background: Modern infrastructure is complex and large (1000s of clusters, multi-region, ...) with huge operational responsibilities (SLAs, SLOs, scalability, ...)
- Platform Team: Reliability, Performance, Stability
- FinOps Team: Cloud resource reduction, budget adherence, efficiency
- Problem: Conflicting goals and often organizationally separated
- Blind cost optimization can lead to unintentional stability/performance problems that can quickly spiral
- Blind stability optimizations quickly lead to large overhead/overprovisioning and huge costs
## Patterns
- Establish views & baselines: Understand cost per cluster/workload and utilization patterns
- Revisit legacy: Old configs like static sizing, huge buffers, ...
- Embrace rearchitecture without fear: Consolidation, instance optimization and infra redesign should all be on the table
### Views & baselines
> General recommendations
- **Problem:** Lack of cost attribution for shared infra
- **Problem:** Lack of insights into which clusters are generating costs
- **Problem:** No transparency into which teams are consuming resources
- **Solution:** Bring the generation of costs together with the existence of costs
- **Solution**: Identify a safe operating range that wraps the "optimal zone" with a buffer for over- and underutilization -> Baseline for automatic scaling
### Revisiting legacy
> General recommendations
- **Problem children**: Idle clusters (just in case I need one fast), oversized compute (overdone safety buffers) and underutilized clusters
- **Challenge**: No one wants to touch a running system
1. Analyze historical utilization (identify spikes/traffic patterns)
2. Identify safe optimization opportunities
3. Roll out changes gradually
### Rearchitecture without fear
> What they did in their legacy systems
- Find out if your current workloads actually need the currently selected node types
- Optimize jobs into batches
- Even if the size is right: Check if you can switch to newer nodes with better price-to-performance
- Customize autoscaling with tools like KEDA to scale on actual load instead of diffuse side effects
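A KEDA ScaledObject scaling on an actual load signal instead of CPU side effects could look like this (a sketch; the Prometheus query and all names are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-worker
spec:
  scaleTargetRef:
    name: batch-worker        # the Deployment to scale
  minReplicaCount: 0          # scale to zero when idle instead of paying for buffers
  maxReplicaCount: 20
  triggers:
    - type: prometheus        # scale on the real work queue
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(jobs_enqueued_total[2m]))
        threshold: "10"
```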



@@ -11,7 +11,10 @@ This day also included my highlight of the conference (I'm writing this on the s
## Talk recommendations
- [Who built this platform? Alternative viewpoints on Platform Design](./03_whobuiltthis)
- [A Practical Guide To Inner Sourcing Your IDP](./10_sourcing-your-idp)
- [Thought-provoking talk by Hazel](./12_extensible)
- Very interesting concept that tries to combine a user-facing API with GitOps in the background: [Reverse GitOps](./13_gitops-paradox)
## Other stuff I learned or people I talked to


@@ -8,14 +8,4 @@ TODO:
## Other stuff I learned or people I talked to
- Isovalent
- Kubermatic
- Portworx
- Fastly
- Syseleven
- Netbird
- VMware
- Stackit
- Harness
- Mia Platform
- and many, many more...
- TODO:


@@ -4,6 +4,10 @@ title: Lessons Learned
weight: 8
---
Not related to any talk directly, but I can recommend this [Blog Post](https://smudge.ai/blog/ratelimit-algorithms) and [Video](https://www.youtube.com/watch?v=8QyygfIloMc&) about rate limiting.
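The linked post walks through several rate-limiting algorithms; one of them, the token bucket, fits in a few lines (my own sketch, not code from the post):

```python
import time

class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, capacity=3)
print([bucket.allow() for _ in range(5)])  # burst of 3 allowed, then rejected
```

`rate` controls the sustained throughput, `capacity` the burst size; the other algorithms in the post trade off smoothness against memory differently.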
## General concepts
TODO:
- Everyone still struggles with platforms and getting people to use them
## Tools/Projects
- I should really take a look at KRO again