Compare commits: 10 commits (a43ce599f7...7bb986a25f)
@ -1,6 +1,6 @@
-# @niggl/kubecon25
+# @niggl/cnsmunich25

-My experiences at Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2025 in London.
+My experiences at Cloud Native Summit 2025 in Munich.

## Quickstart 🐳
@ -5,12 +5,17 @@ title: Cloud Native Summit Munich 2025

All about the things I did and sessions I attended at Cloud Native Summit 2025 in Munich.

-This current version is probably full of typos - will fix later. This is what typing the notes blindly in real time gets you.
+This current version is probably full of typos - might fix later (prbly won't tbh). This is what typing the notes blindly in real time gets you.

## How did I get there?

I attended Cloud Native Rejekts and KubeCon + CloudNativeCon Europe 2025 in London, and some of the attendees recommended checking out CNS Munich as another event in the same spirit as Cloud Native Rejekts.

-After a short talk with my boss, I there by my employer [DATEV eG](https://datev.de) alongside two of my coworkers.
+After a short talk with my boss, I got sent there by my employer [DATEV eG](https://datev.de) alongside two of my coworkers.

## And how was it?

I'd say that attending CNS Munich 2025 was worth it. The event is pretty close to my place of employment (2 hrs by car or train) and relatively small in size (400 attendees). The talks varied a bit - the first day had a bunch of interesting talks, but the second day indulged in AI-related talks (and they were not quite my cup of tea). This might be fine for others, but I've heard enough about AI use cases for the coming years at the last events I attended (or just Reddit).

Maybe distributing the AI talks over the two days - while always providing an interesting alternative - might be the right move for next time.

## And how does this website get its content
content/day1/07_devex.md (new file, 78 lines)
@ -0,0 +1,78 @@
---
title: What going cloud native taught us about developer experience
weight: 7
tags:
- devex
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## History/Base on-prem

- Monolith
- High autonomy regarding releases into prod (auto gen)
- Comfort features -> It just works™

## Goals

- Microservices
- Accelerated processes
- Better dev experience

## The road

### Expectations

- Expected new work for developers: CI/CD, GitOps, monitoring, security, resilience, connecting to other services
- New for developers: Kubernetes with a bunch of surrounding tech

### Journey

1. Start of the journey: Usually a transition towards the cloud with some Kubernetes and deployment templates and less legacy stuff
2. Result:
   - Implementation: Gigantic base manifests with maybe some overlay abstraction (Kustomize or Helm)
   - Expectation: Works in all environments
3. Considerations:
   - Developers will change config (it's only a question of when, not if)
   - The migration from an env file to Kubernetes-compliant YAML can be a hard one
4. Iteration: The developer-friendly config is our new goal

### The developer-friendly config

> e.g. in a Helm values file

- Easy to understand and configure
- Think about the dev experience (sane defaults)
- Allow templating
- Provide documentation
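A minimal sketch of what such a developer-friendly values file could look like - the service name, registry, and defaults here are hypothetical illustrations, not from the talk:

```yaml
# Hypothetical values.yaml for a service chart - sane defaults, minimal required input
image:
  repository: registry.example.com/team/my-service  # the only value a dev must set
  tag: latest

replicas: 2          # sensible default, override per environment

service:
  port: 8080         # matches the container's default listen port

# Optional extras are off by default to keep the first deploy simple
ingress:
  enabled: false
  host: ""
```

The point is the shape: one mandatory field, everything else documented and defaulted.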

## Developer centric approach to cloud native

> There are not many technical problems in cloud native, most are experience related

### Remember

- Your users won't react the way you expect them to
- The platform should serve the needs of your users, not the other way around
- Users will come to you if you build a nice environment for them

### What do your users need

- Every service is like its own area -> What connections does it need to the outside and how do I ensure its health
- Reduced cognitive load: Avoid developers being occupied with foundational work instead of delivering value
- The new env needs to be as nice or nicer than the old env

### How to help them

- Bootstrapping: Define blueprints (including dependency services like databases) and automate stuff with defined goals (my service should be deployed 5 mins after bootstrapping)
- Tooling: Backstage (yay)
- Establish training programs and communities

## Wrap up

- Interfaces are important
- Ensure roads to your service are well maintained and documented
- Build standards and contracts to ensure that others can rely on your service
- Build an example project that is not too big, but tackles real-world challenges
- Remember that developers are used to the old way of working, which has a bunch of creature comforts

content/day1/08_auth.md (new file, 83 lines)
@ -0,0 +1,83 @@
---
title: How Google Built a Consistent, Global, Authorization System with Zanzibar and you can too
weight: 8
tags:
- auth
- security
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

Challenge: You send a mail via Gmail that has a Google Drive attachment -> Those are two separate apps, but a central auth check needs to take place to provide access to the recipient.

## Access control types

- ACL (access control list): Pretty basic
- RBAC (role based access control): The de facto standard for a long time
- ABAC (attribute based access control): Check attributes (user id, IP address, ...) at access time to make a decision
- ReBAC (relationship based access control)

## ReBAC

### Baseline

```mermaid
graph LR
document-->|Is part of|folder-->|was created by|user
```

### Relation Tuple

- `document:123#owner@user:3` -> User 3 is owner of document 123
- `group:engineering#member@group:security` -> Group security is a member of the group engineering

### Graph representation (DAG)

```mermaid
graph LR
somedocument-->reader
somedocument-->writer
reader-.->|is also available via|writer
reader-->UserA
reader-->UserB
writer-->UserC
writer-->UserD
```

To check access, look for a directed path from somedocument to UserA via writer -> No path = no access
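The tuple-and-graph model above fits in a few lines of code - this is an illustrative toy (my own naming, not how Zanzibar is actually implemented):

```python
from collections import deque

# Relation tuples as (object, relation) -> set of subjects; a subject can be a
# plain user or another (object, relation) pair, which is what makes it a graph.
tuples = {
    ("somedocument", "writer"): {"UserC", "UserD"},
    # writer is "also available via" reader: every writer is also a reader
    ("somedocument", "reader"): {"UserA", "UserB", ("somedocument", "writer")},
}

def check(obj: str, relation: str, user: str) -> bool:
    """BFS over the relation graph: is there a path from obj#relation to user?"""
    queue = deque([(obj, relation)])
    seen = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for subject in tuples.get(node, set()):
            if subject == user:
                return True       # reached the user -> access granted
            if isinstance(subject, tuple):
                queue.append(subject)  # follow the nested relation
    return False

print(check("somedocument", "reader", "UserC"))  # True - via writer
print(check("somedocument", "writer", "UserA"))  # False - UserA is reader only
```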

## Zanzibar

- Globally distributed
- ReBAC based
- Central API

### Hotspots

- Problem: Some checks need to happen often
- Solution: Distributed caching
- Cache validity: Timestamp optimization by rounding to a second or 50 ms
- Improvement: Internal use of gRPC
- Lock table: If the same query gets executed multiple times at once, calculate the query once and return the cached response to all waiting queries
- Improve cache population: Don't kill sub-checks instantly but delayed
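The lock-table idea is essentially request deduplication ("singleflight"); a minimal thread-based sketch under my own naming, not Zanzibar's actual implementation:

```python
import threading

class LockTable:
    """Deduplicate identical in-flight queries: the first caller computes,
    everyone who arrives while it's running waits and reuses the result."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # query key -> (done event, result holder)

    def do(self, key, fn):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, holder = entry
        if leader:
            holder["result"] = fn()        # compute the query exactly once
            with self._lock:
                del self._inflight[key]
            event.set()                     # wake all waiting callers
        else:
            event.wait()                    # reuse the leader's result
        return holder["result"]
```

Callers wrap each check as `table.do("check:doc1#reader@u1", run_check)`; concurrent duplicates collapse into one evaluation.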

### Zookies

- Specify a specific point in time (e.g. to bypass the cache with "give me the latest")
- Allows control over the latency vs real-time trade-off
- Solves the new enemy problem: You lose access at the same time the content gets changed -> may result in phantom access to the new version if cached data gets used
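The new-enemy protection boils down to "evaluate the check at a snapshot at least as new as the content version being read"; a toy illustration where the zookie is just a timestamp (the real encoding is opaque):

```python
# Each content version carries the zookie (snapshot timestamp) it was written at.
# ACL changes are timestamped; a check is evaluated "as of" some snapshot.
acl_changes = {("doc1", "viewer", "eve"): ("removed", 105)}  # eve lost access at t=105

def check_at(snapshot: int, doc: str, relation: str, user: str) -> bool:
    change = acl_changes.get((doc, relation, user))
    if change and change[0] == "removed" and snapshot >= change[1]:
        return False  # the removal is visible at this snapshot
    return True       # toy default: access unless an applicable removal is visible

doc1_zookie = 110  # doc1 was edited at t=110, after eve was removed

# A stale cached check (snapshot 100) would wrongly admit eve - the "new enemy":
print(check_at(100, "doc1", "viewer", "eve"))
# Enforcing snapshot >= zookie of the version being read fixes it:
print(check_at(max(100, doc1_zookie), "doc1", "viewer", "eve"))
```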

### Implementations

> Some of the popular open source implementations, just for later

- SpiceDB
- ORY
- Permify

### Pro

- Low latency with high throughput
- Global consistency
- Composable and hierarchical permission models

content/day1/09_confidential.md (new file, 58 lines)
@ -0,0 +1,58 @@
---
title: Building a Confidential AI Inference Platform on Kubernetes
weight: 9
tags:
- security
- ai
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

> Felt a bit like a showcase of their product's architecture - not bad, just nothing really to take home

Background: How do we protect the data flowing into and out of our AI models?

## Goals

- Cloud based inference API
- E2E encryption
- E2E attestation

## Encryption Mechanisms

- Idea: Combine data-at-rest with data-in-transit and data-in-use encryption (encrypted memory)
- Attestation: The CPU has a private key and issues certificates

## Confidential Containers

- Traditional: Full VM-based isolation
- Kubernetes: Advanced container isolation using virtual sockets and much more
- Implementation: Frameworks like Contrast

### Threat model

- Isolated: Container
- Shared: Kubernetes, hypervisor, cloud infra, hardware

### Architecture

```mermaid
graph LR
User
User-->|Accesses with trust|AICode
User-->|Key exchange|SecretService-->|Key exchange|AICode
Manifest-->|Configure|ContrastCoordinator
subgraph Cluster
ContrastCoordinator(Contrast Coordinator)
ContrastCoordinator-->|Verify|Worker
subgraph Worker
AICode(AI Code)
AttestationAgent
end
AICode-->|Accesses|GPU
AttestationAgent-->|Verify|GPU
SecretService
end
ContrastCoordinator-->|Attest|User
```
content/day1/10_observability.md (new file, 51 lines)
@ -0,0 +1,51 @@
---
title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
weight: 10
tags:
- monitoring
- observability
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Where do you start with monitoring

- The cloud standard solution: Prometheus
- But: What if we don't just monitor one app but a cluster, or multiple clusters?
- Problem: Prometheus isn't quite the best when it comes to scaling
- And: We want dashboards, traces, alerting, logs, auditing, ...

## Trying to build the master monitoring by just adding stuff on the side

- Add custom stuff
- More complex setups
- Less and less documentation and standardization

## But how do we regain control

- Product thinking: Let's collect the problems
- Result: No clear separation of the product, no vision (just firefighting), we want better releases and improved resource usage

### Transition

1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we do observability
2. Define the product:
   1. Long term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
   2. Target groups and journeys -> Clear responsibility cut-off between platform<->users
   3. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform
      - In their case: Focus on Mimir (instead of Prometheus) and Alloy, but keep Grafana and Loki
      - Define everything else as out of scope (for now)
      - Expand scope by improving the experience instead of just "adding tools"

## Pillars of Observability

- Data management: Ingest, query
- Dashboard management: Create, update, export
- Alert management: Rules, routing, analytics, silence

## Wrap up

- Do I need monitoring or more (both is fine)?
- Identify the target audience and their journey (not just the tools they want to use)
- Improve the experience and say no if a user requests something that would not improve it
@ -10,4 +10,5 @@ The first day started with the usual organizational topics (schedule, sponsors a

- For everyone: [IT-Grundschutz trifft Kubernetes: Praxisnahe Umsetzung sicherheitsrelevanter Anforderungen](./03_grundschutz) (it was presented in an engaging way)
- If you're interested in metal³: [Bringing Cloud-Native Agility to Bare-Metal Kubernetes with Cluster API and Metal³](./05_baremetal)
- DevEx: [What going cloud native taught us about developer experience](./07_devex) (and honestly worth the speaker's accent and city skylines metaphor)
+- If you're interested in different access control patterns: [How Google Built a Consistent, Global, Authorization System with Zanzibar and you can too](./08_auth)
content/day2/01_wasm-vm.md (new file, 48 lines)
@ -0,0 +1,48 @@
---
title: "Beyond Microservices: Running VMs, WASM, and AI Workloads on Kubernetes"
weight: 1
tags:
- wasm
- vm
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

This is more of an "overview" talk and less actual new knowledge or specialized stuff.

## Baseline

We all know

- Deployments
- StatefulSets
- Functions
- and so on

## Strange new world: VMs on Kubernetes

- Why VMs? Legacy! (and VDI and some testing envs)
- The cool thing: VMs are basically Pods with virtualization powered by KVM/QEMU/libvirt
- Demo: Kubernetes on GCP with KubeVirt installed and deployment of a VM with guest tool access
- TL;DR: KubeVirt makes the VM management UX pretty good

TODO: Steal vm vs container vs kubevirt layers illustration

## Kind of a different universe: WASM

- WASM: Low level typed intermediate machine code
- WASI: System interface for external functions (fs, network, ...)
- Pro: Secure, portable and performant
- Con: Bleeding-edge, complex, and not feature-complete

### Now on Kubernetes (with SpinKube)

- Still executed on a node within a pod, but the pod does not contain a regular container - it runs a Spin app, which contains the service as a WASM module (via the containerd-wasm shim)
- Up to 10x faster spin-up than a traditional container

## And how about AI?

- Goal: Host yourself, or at least in the EU
- Simple quickstart: Ollama
- Challenge: Cost planning
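The KubeVirt demo mentioned above boils down to applying a VirtualMachine manifest; roughly like this, written from memory of the KubeVirt quickstart (image and sizes are examples, details may differ between versions):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: true                 # start the VM immediately
  template:
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 128Mi
      volumes:
        - name: containerdisk
          containerDisk:        # boot disk shipped as an OCI image
            image: quay.io/kubevirt/cirros-container-disk-demo
```

The "VMs are basically Pods" point is visible here: the VM is just another namespaced resource you `kubectl apply`.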
content/day2/02_agent.md (new file, 45 lines)
@ -0,0 +1,45 @@
---
title: "Works on my LLM: Building your own ai code assistant that isn't completely useless"
weight: 2
tags:
- ai
- vibecoding
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

Build or improve your own AI coding agent (well, mostly improve).

## Baseline

- AI enables us to produce useless code 10x faster
- Problem: Traditional vibe coding is just a short instruction "build me a web app"
- Solution: Context engineering to support the next step with the right information
- An agent has multiple parts: LLM, context window, external context (files), MCP

## Set up the bootloader

- Rule file: Coding style, conventions, best practices -> "always do this"
- Workflows: Helpers like scripts, etc.
  - e.g.: Gather requirements -> Clarify -> Create specification
- Can be written in normal English and maybe annotated using agent-specific tags

## Load domain specific knowledge

- Useful: Add questions regarding approach/architecture to your workflows
- This is where MCP servers can come in
- Challenge: Picking the right kind and the right amount of information to provide to the agent

## Micro context strategy

- Problem: A monolithic context can fill up and even get truncated
- Idea: Split into multiple smaller contexts that get combined before sending to the AI
- Implementation: Save the context into different files and chunk the results into files
- Pro: Can be used for stateless interaction

## State Management

- Memory bank: Always keep updated documents with summaries for the implementation task
- The rabbit hole problem: Trying workaround after workaround, resulting in a full context of useless non-working workarounds
- Checkpoint restoration: Create checkpoints and recreate contexts from them instead of trying to force the AI back on track
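The micro-context strategy above can be sketched as a handful of files plus a size budget - the file naming scheme and the character-based budget heuristic are made up for illustration:

```python
from pathlib import Path

def build_context(parts_dir: str, budget_chars: int = 8000) -> str:
    """Combine small per-topic context files into one prompt context,
    in filename order (e.g. 01-rules.md, 02-spec.md), stopping before
    the combined text would exceed the budget."""
    parts = sorted(Path(parts_dir).glob("*.md"))
    combined, used = [], 0
    for part in parts:
        text = part.read_text()
        if used + len(text) > budget_chars:
            break  # better to drop a chunk than to get silently truncated
        combined.append(f"<!-- {part.name} -->\n{text}")
        used += len(text)
    return "\n\n".join(combined)
```

Because the context is rebuilt from files on every call, this also gives you the stateless interaction mentioned above for free.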
content/day2/03_k3s-gpu.md (new file, 56 lines)
@ -0,0 +1,56 @@
---
title: "Brains on the edge - running ai workloads with k3s and gpu nodes"
weight: 3
tags:
- ai
- gpu
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

I decided not to note down the usual "typical challenges on the edge" slides (about 10 mins of the talk).

## Baseline

- Edge can be split up: Near edge, far edge, device edge
- They use k3s for all edge clusters

## Prerequisites

- Software: GPU driver, container toolkit, device plugin
- Hardware: NVIDIA GPU with a supported distro
- Runtime: Not all runtimes support GPUs (containerd and CRI-O do)
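Once driver, container toolkit and device plugin are in place, scheduling onto the GPU is just a resource request; a minimal sketch (the image and the `nvidia` runtime class are examples, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia        # typically registered by the NVIDIA setup on k3s
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]     # prints the GPU if passthrough works
      resources:
        limits:
          nvidia.com/gpu: 1       # resource advertised by the device plugin
```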

## Architecture

```mermaid
graph LR
subgraph Edge
MQTT
Kafka
Analytics

MQTT-->|Publish collected sensor data|Kafka
Kafka-->|Provide data to run|Analytics
end
subgraph Azure
Storage
Monitoring
MLFlow

Storage-->|Provide long term analytics|MLFlow
end

Analytics<-->|Sync models|MLFlow
Kafka-->|Save to long term|Storage
Monitoring-.->|Observe|Storage
Monitoring-.->|Observe|MLFlow
```

## Q&A

- Did you use the NVIDIA GPU operator? Yes
- Which runtime did you use? containerd via k3s
- Why k3s over k0s? Because we were already using it
- Were you power limited? Nope, the edge was on a large ship
content/day2/04_many-cooks.md (new file, 91 lines)
@ -0,0 +1,91 @@
---
title: "Many Cooks, One Platform: Balancing Ownership and Contribution for the Perfect Broth"
weight: 4
tags:
- platform
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
{{% button href="https://docs.google.com/presentation/d/104LXd5-aPQIs4By6ftnyNWFhi4fqGLHZ4uVwpAE3RMo/mobilepresent?slide=id.g370dcb83b32_0_1" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}

Not unlike a war story talk about trying to make a product work at a gov organization in the Netherlands.

## Tech stack

- OpenShift, migrating to a new WIP platform (WIP for 2 years)
- 4 clusters (2x DEV, 1x PROD, 1x MGMT)
- Azure DevOps for application delivery

## Org

- IT service provider for the Dutch judiciary - mostly building webapps
- Value teams: Cross functional (dev, testing, ops) with a focus on Java and C#
- Bases (aka workstreams): The Container Management Platform base is the focus of this story

## Why platform?

### Where we came from

- The old days: Deployment scripts
- Evolution: Loosely coupled services like Jenkins -> Loose interactions make for fun problems and diverging standards
- The new hot stuff: A platform that solves the entire lifecycle

### The golden path

```mermaid
graph LR
subgraph IDP
DeployService-->|triggers|BuildService
BuildService<-->|Interact with code|RepositoryService
BuildService-->|pushes image to|Registry
DeployService-->|deploy to|Prod
end
Prod
```

TODO: Steal image from slides

### Bricks vs builds

- Brick: Do it yourself
- Build: Ready to use, but needs diverse "implementations"

## Vision and scope

- Problem (scope): An undefined scope results in feature-wish creep
- Problem (scope): Things being excluded that feel like they should be part of a platform -> You now have the pleasure of talking to multiple departments
  - The old platform used an internal registry -> Business decided we want Artifactory -> HA Artifactory costs as much as the rest of the platform
  - The company decided that builds now run in Azure DevOps
- Problem (vision): It's easy to call a bunch of services "a platform" without actually integrating them with each other

## DevOps is an antipattern

- Classic: Developers are separated from ops by a wall of confusion
- Modern™: Just run it yourself! How? I'm not gonna tell you
- Solution™: Add an enabling team between dev and platform
  - Problem: This usually results in creating more work for the platform team, which now has to support the enabling team as well
- Solution: The enabling team should be created out of both dev and ops people to create a deep understanding -> Just build a community

## Community building

- Cornerstones: Consistency (same time, same community), safe to ask questions (Vegas rule), acknowledge both good and bad/rants, follow up on discussions

### Real world example: SRE meetup

> Spoiler: This failed

- Every team was asked to send one SRE
- The meeting tends to get cancelled one minute before, due to "nothing to discuss"
- Feels like the SREs have ideas or grievances, and the platform team defends itself or attacks the asker of questions
- Was replaced

### Real world example: The microservice guild

- Contribution via invitation: Hey, I heard you built something cool, please tell us about it
- Agenda always shared in advance
- Focus on solutions instead of offense/defense

## Summary

- A platform is a collaborative effort
- Scope has to be communicated early and often
- Build a community
- Sometimes you need to let things go if they don't work out
content/day2/05_kcp.md (new file, 104 lines)
@ -0,0 +1,104 @@
---
title: Building a Platform Engineering API Layer with KCP
weight: 5
tags:
- kcp
- platform
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Baseline

- The platform is automated and self service
- We always have a bunch of consumers and service providers that get connected via an internal platform

```mermaid
graph TD
subgraph Consume
A
B
end
subgraph Provider
Cert
DB
end
IDP
A-->|discover available services|IDP
A-->|order db|IDP
IDP-->|Notify|DB
DB-->|fulfill|A
```

## Why KubeAPI

- We have it all: Groups, versions, optionally namespaced, ...
- It is extendable via CRDs
- Challenge: CRDs are cluster scoped -> Everyone shares them across namespaces
- Idea: Everyone gets their own cluster
  - Problem: Spinning up clusters is slow and resource intensive
- Idea: "Lightweight clusters" aka hosted control planes
  - Problem: Now we have to share CRDs across clusters

## WTF is KCP?

- Idea: What if we had separate control planes but with a shared datastore
- Goal: Horizontally scalable control planes for extendable APIs
- You don't need Kubernetes to run KCP (it's a standalone binary)
- It does not spin up a real API server but a workspace with a low memory footprint
- It does not implement all of the container related stuff (Pod, Deployment, ...)

### Access and setup

```mermaid
graph LR
User-->|Create APIServer Team A|KCP
KCP-->|Kubeconfig|User
subgraph KCP
APIA(Workspace Team A)
APIB(Workspace Team B)
Datastore
end

User-->|Kubectl get ns|APIA
APIA-->|Return NS for Workspace A|User
```

### Internal Organization

- Workspaces are organized in a tree
- Possibility of nested fun: `/clusters/root:org-a:team-a`
- Sub-workspaces can't access resources from the root workspace

```mermaid
graph TD
Root
Root-->OrgA
Root-->OrgB
OrgA-->TeamA
```

### Sharing

- KCP owns all workspaces -> It can share stuff across them
- To share: APIExport (can share multiple CRDs in one package)
- To use: APIBinding (just reference the exported API by path to workspace and name)
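From memory, an APIBinding looks roughly like this - treat the apiVersion, the field layout, and the workspace path as assumptions that may differ between kcp versions:

```yaml
# Hypothetical sketch: consume a "databases" APIExport from a provider workspace.
apiVersion: apis.kcp.io/v1alpha1
kind: APIBinding
metadata:
  name: databases
spec:
  reference:
    export:
      path: root:providers:db-team   # workspace path of the providing team
      name: databases                # name of the APIExport to bind to
```

Once bound, the exported CRDs show up in the consuming workspace as if they were installed locally.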

### Order fulfillment

- Classic Kubernetes: Controllers -> But they are isolated, aren't they?
- Virtual workspace: Provide a computed view of parts of a workspace -> Basically a URL that you hand to the controller, which can be used to watch objects across workspaces
- Part of KCP's magic -> You don't create it, it gets managed for each APIExport

## Notes from the demo

- Spin up locally is near instant
- Switching to the namespace can be achieved with a simple api command or

## But why do we even need a universal API layer

- Service providers should not be responsible for making things discoverable, the platform should
- The internal platform can be bought, customized or DIYed, but the API layer does not change -> Interchangeable backend switching
- Kubernetes is already widespread and makes it easy to use different projects
- Backed by the CNCF, flat learning curve
@ -4,3 +4,11 @@ title: Day 2
weight: 2
---

+The schedule on day 2 was pretty AI platform focused.
+Sadly, all of the AI focused talks were about building workflows and platforms with GitOps and friends, not about actually building the base (GPU scheduling and so on).
+We also had some "normal" work tasks, resulting in fewer talks visited and more "normal" work + networking.
+
+## Recommended talks
+
+- Good speaker: [Many Cooks, One Platform: Balancing Ownership and Contribution for the Perfect Broth](./04_many-cooks)
+- Good intro to kcp: [Building a Platform Engineering API Layer with KCP](./05_kcp)
@ -4,6 +4,7 @@ title: Lessons Learned
weight: 3
---

-## Mal anschauen
+## Maybe look into

- Otterize for network policies
+- SpinKube/WasmCloud for optimized WASM on Kubernetes