From 8b1edb32c317c522ad6b2912e6cbcd7cc8cd51c3 Mon Sep 17 00:00:00 2001
From: Nicolai Ort <info@nicolai-ort.com>
Date: Sat, 21 Mar 2026 12:52:23 +0100
Subject: [PATCH] docs(day-2): Added telemetry talk

---
 ...bility.md => 04_operator-estensibility.md} |  0
 content/day-2/05_selvimproving.md             | 68 +++++++++++++++++++
 content/day-2/_index.md                       |  3 +-
 3 files changed, 70 insertions(+), 1 deletion(-)
 rename content/day-2/{04_operator estensibility.md => 04_operator-estensibility.md} (100%)
 create mode 100644 content/day-2/05_selvimproving.md

diff --git a/content/day-2/04_operator estensibility.md b/content/day-2/04_operator-estensibility.md
similarity index 100%
rename from content/day-2/04_operator estensibility.md
rename to content/day-2/04_operator-estensibility.md
diff --git a/content/day-2/05_selvimproving.md b/content/day-2/05_selvimproving.md
new file mode 100644
index 0000000..3946526
--- /dev/null
+++ b/content/day-2/05_selvimproving.md
@@ -0,0 +1,68 @@
+---
+title: "The self-improving platform: Closing the Loop Between Telemetry and Tuning"
+weight: 5
+tags:
+ - rejekts
+ - telemetry
+---
+
+<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
+<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
+TODO: Copy repo link for samples
+
+The statistics in this talk are based on a survey of multiple companies, focused on those that build and run applications.
+
+## Baseline
+
+- Usually the golden path for devs only goes up to deploying their app and does not cover day 2/monitoring
+- Most platform teams just provide metrics and basic dashboards, but no alerts or key health indicators
+
+## Observations regarding stakeholders
+
+- Stakeholders
+  - ~43% of companies have a dedicated platform team, the rest have a mixed team/shared effort
+  - Only ~18% have a dedicated SRE team that couples applications to the platform
+- Ownership: over 50% of companies use a shared ownership model -> "Not my problem"
+- Priorities
+  - Product Team: Ship features fast (a dollar spent on R&D is worth more than one saved)
+  - SRE: Keep everything up (an hour of uptime is worth more than the cost of a buffer)
+  - FinOps: Reduce the bill (a dollar wasted is a dollar stolen from R&D)
+- Conflict: Cost saving (FinOps) vs safety (SRE) when it comes to overprovisioning
+- 75% of interviewees use Kubernetes, with over 50% using the JVM as their runtime
+
+## Pain points
+
+- Main focus: Cost vs performance
+- Side note: Reliability
+- Result: We need a flexible path that can distinguish between
+  - User facing app: Performance first
+  - Critical app: Reliability first
+  - Non-critical apps: Reduce cost
+
+## Optimization
+
+
+- Tuning: Only 18% are tuning their container and runtime
+- We need a full stack approach:
+  - Don't just increase pod resources, also update runtime settings like the heap size
+  - Use HPA to scale once you have right-sized your pod + runtime
+  - Get to know your per-node usage to improve node autoscaling
+
+## Building a continuous automation layer
+
+- Telemetry: Import metrics
+- Analysis: Tuning profiles (based on historic data) identify possible optimizations
+- GitOps: Automatic PR creation and previews
+- Sample Architecture:
+  - Import: OTel metrics into Prometheus
+  - Visualize: Grafana
+  - Analyze: CronJob that collects the last 30 minutes of metrics
+  - Optimize: Run the analyzed metrics against policies (like "I want 20% headroom for memory") that then act and create PRs (they did this through OPA)
+
+TODO: Steal image from slides
+
+## Wrap-up
+
+- Automated optimization with a human in the loop keeps the experts in touch and enables fast but safe changes
+- Optimization should be an invisible platform capability (like Renovate/Dependabot for dependencies)
+- Optimization is a domino effect: The right foundations enable better future decisions
\ No newline at end of file
diff --git a/content/day-2/_index.md b/content/day-2/_index.md
index c75184f..3e9a5f1 100644
--- a/content/day-2/_index.md
+++ b/content/day-2/_index.md
@@ -16,7 +16,8 @@ I have to admit that I'm very bad with names and don't always regocnize people b
 
 ## Talk recommendations
 
-- If you're building operators: [Solving Operator Extensibility: A gRPC Plugin Framework for kubernetes](./04_operator%20estensibility)
+- If you're building operators: [Solving Operator Extensibility: A gRPC Plugin Framework for kubernetes](./04_operator-estensibility)
+- The idea behind [The self-improving platform: Closing the Loop Between Telemetry and Tuning](./05_selvimproving) is very interesting, but the first half of the talk is kind of confusing as it discusses a study that could have been shortened drastically. The way they automatically create PRs for resource utilization is cool, though.
 
 ## Other stuff I learned or people i talk to