Compare commits

...

5 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Nicolai Ort | f8e654d6a5 | more typos | 2024-03-26 15:21:30 +01:00 |
| Nicolai Ort | 9ee562e88d | Day 4 typos | 2024-03-26 15:19:51 +01:00 |
| Nicolai Ort | daf83861af | Day 3 typos | 2024-03-26 15:09:33 +01:00 |
| Nicolai Ort | 7b1203c7a3 | Day 2 typos | 2024-03-26 15:00:48 +01:00 |
| Nicolai Ort | e2e3b2fdf3 | Day 1 typos | 2024-03-26 14:39:44 +01:00 |
47 changed files with 535 additions and 413 deletions

119
.vscode/ltex.dictionary.en-US.txt vendored Normal file

@@ -0,0 +1,119 @@
CloudNativeCon
Syntasso
OpenTelemetry
Multitannancy
Multitenancy
PDBs
Buildpacks
buildpacks
Konveyor
GenAI
Kube
Kustomize
KServe
kube
InferenceServices
Replicafailure
etcd
RBAC
CRDs
CRs
GitOps
CnPG
mTLS
WAL
AZs
DBs
kNative
Kaniko
Dupr
crossplane
DBaaS
APPaaS
CLUSTERaaS
OpsManager
multicluster
Statefulset
eBPF
Parca
KubeCon
FinOps
moondream
OLLAMA
LLVA
LLAVA
bokllava
NVLink
CUDA
Space-seperated
KAITO
Hugginface
LLMA
Alluxio
LLMs
onprem
Kube
Kubeflow
Ohly
distroless
init
Distroless
Buildkit
busybox
ECK
Kibana
Dedup
Crossplane
autoprovision
RBAC
Serviceaccount
CVEs
Podman
LinkerD
sidecarless
Kubeproxy
Daemonset
zTunnel
HBONE
Paketo
KORFI
Traefik
traefik
Vercel
Isovalent
CNIs
Ivanti
envs
CoreDNS
Istio
buildpacks
Buildpack
SBOM
Tekton
KPack
Multiarch
Tanzu
Kubebuilder
finalizer
OLM
depply
CatalogD
Rukoak
kapp
Depply
Jetstack
kube-lego
PKI-usecase
multimanager
kubebuider
kubebuilder
FluentD
FluentBit
OpenMetrics
upsert
tektone-based
ODIT.Services
Planetscale
vitess
Autupdate
KubeCon

3
.vscode/ltex.disabledRules.en-US.txt vendored Normal file

@@ -0,0 +1,3 @@
ARROWS
ARROWS
ARROWS


@@ -0,0 +1,2 @@
{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QJust create a replica cluster via WAL-files from S3 on another kube cluster (lags 5 mins behind)\nYou can also activate replication streaming\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QResulting needs\nCluster aaS (using crossplane - in this case using aws)\nDBaaS (using crossplane - again usig pq on aws)\nApp aaS\\E$"}


@@ -9,7 +9,7 @@ This current version is probably full of typos - will fix later. This is what ty
 ## How did I get there?
-I attended KubeCon + CloudNAtiveCon Europe 2024 as the one and only [ODIT.Services](https://odit.services) representative.
+I attended KubeCon + CloudNativeCon Europe 2024 as the one and only [ODIT.Services](https://odit.services) representative.
 ## Style Guide


@@ -7,4 +7,4 @@ tags:
 ---
 The first "event" of the day was - as always - the opening keynote.
-Today presented by Redhat and Syntasso.
+Today presented by Red Hat and Syntasso.


@@ -6,34 +6,33 @@ tags:
 - dx
 ---
-By VMware (of all people) - kinda funny that they chose this title with the wole Broadcom fun.
+By VMware (of all people) - kinda funny that they chose this title with the whole Broadcom fun.
 The main topic of this talk is: What interface do we choose for what capability.
 ## Personas
-* Experts: Kubernetes, DB Engee
+* Experts: Kubernetes, DB engineer
 * Users: Employees that just want to do stuff
-* Platform Engeneers: Connect Users to Services by Experts
+* Platform engineers: Connect Users to Services by Experts
 ## Goal
-* Create Interfaces
-* Interface: Connect Users to Services
-* Problem: Many diferent types of Interfaces (SaaS, GUI, CLI) with different capabilities
+* Create Interfaces: Connect Users to Services
+* Problem: Many different types of Interfaces (SaaS, GUI, CLI) with different capabilities
 ## Dimensions
 > These are the dimensions of interface design proposed in the talk
 * Autonomy: external dependency (low) <-> self-service (high)
 * low: Ticket system -> But sometimes good for getting an expert
-* high: Portal -> Nice, but somethimes we just need a human contact
+* high: Portal -> Nice, but sometimes we just need a human contact
 * Contextual distance: stay in the same tool (low) <-> switch tools (high)
 * low: IDE plugin -> High potential friction if stuff goes wrong/complex (context switch needed)
 * high: Wiki or ticketing system
 * Capability skill: anyone can do it (low) <-> Made for experts (high)
-* low: transparent sidecar (eg vuln scanner)
-* high: cli
+* low: transparent sidecar (e.g. vulnerability scanner)
+* high: CLI
 * Interface skill: anyone can do it (low) <-> needs specialized interface skills (high)
 * low: Documentation in web aka wiki-style
 * high: Code templates (a sample helm values.yaml or raw terraform provider)
@@ -42,4 +41,4 @@ The main topic of this talk is: What interface do we choose for what capability.
 * You can use multiple interfaces for one capability
 * APIs (proverbial pig) are the most important interface b/c it can provide the baseline for all other interfaces
-* The beautification (lipstick) of the API through other interfaces makes uers happy
+* The beautification (lipstick) of the API through other interfaces makes users happy


@@ -62,10 +62,10 @@ Presented by the implementers at Thoughtworks (TW).
 ### Observability
 * Tool: Honeycomb
-* Metrics: Opentelemetry
+* Metrics: OpenTelemetry
 * Operator reconcile steps are exposed as traces
 ## Q&A
-* Your teams are pretty autonomus -> What to do with more classic teams: Over a multi-year jurney every team settles on the ownership and selfservice approach
-* How to teams get access to stages: They just get temselves a stage namespace, attach to ingress and have fun (admission handles the rest)
+* Your teams are pretty autonomous -> What to do with more classic teams: Over a multi-year journey every team settles on the ownership and self-service approach
+* How teams get access to stages: They just get themselves a stage namespace, attach to ingress and have fun (admission handles the rest)


@@ -17,6 +17,6 @@ No real value
 ## What do we need
 * User documentation
-* Adoption & Patnership
+* Adoption & Partnership
 * Platform as a Product
 * Customer feedback


@@ -10,7 +10,7 @@ tags:
 - multicluster
 ---
-Part of the Multitannancy Con presented by Adobe
+Part of the Multi-tenancy Con presented by Adobe
 ## Challenges
@@ -22,24 +22,24 @@ Part of the Multitannancy Con presented by Adobe
 * Azure in Base - AWS on the edge
 * Single Tenant Clusters (Simpler Governance)
-* Responsibility is Shared between App and Platform (Monitoring, Ingress, etc)
-* Problem: Huge manual investment and overprovisioning
+* Responsibility is Shared between App and Platform (Monitoring, Ingress, etc.)
+* Problem: Huge manual investment and over provisioning
 * Result: Access Control to tenant Namespaces and Capacity Planning -> Pretty much a multi tenant cluster with one tenant per cluster
-### Second Try - Microcluster
+### Second Try - Micro Clusters
 * One Cluster per Service
-### Third Try - Multitennancy
+### Third Try - Multi-tenancy
 * Use a bunch of components deployed by platform Team (Ingress, CD/CD, Monitoring, ...)
-* Harmonized general Runtime (cloud agnostic): Codenamed Ethos -> OVer 300 Clusters
+* Harmonized general Runtime (cloud-agnostic): Code-named Ethos -> Over 300 Clusters
 * Both shared clusters (shared by namespace) and dedicated clusters
-* Cluster config is a basic json with name, capacity, teams
-* Capacity Managment get's Monitored using Prometheus
-* Cluster Changes should be non-desruptive -> K8S-Shredder
-* Cost efficiency: Use good PDBs and livelyness/readyness Probes alongside ressource requests and limits
+* Cluster config is a basic JSON with name, capacity, teams
+* Capacity Management gets Monitored using Prometheus
+* Cluster Changes should be nondestructive -> K8S-Shredder
+* Cost efficiency: Use good PDBs and liveliness/readiness Probes alongside resource requests and limits
 ## Conclusion
-* There is a balance between cost, customization, setup and security between single-tenant und multi-tenant
+* There is a balance between cost, customization, setup and security between single-tenant and multi-tenant
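The notes above only say the cluster config is "a basic JSON with name, capacity, teams". As a purely hypothetical sketch (the talk did not show Adobe's actual schema, so every field name here is a guess):

```json
{
  "name": "ethos-eu-west-1",
  "capacity": { "nodes": 12, "cpu": "96", "memory": "384Gi" },
  "teams": ["checkout", "search", "platform"]
}
```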


@@ -3,42 +3,41 @@ title: Lightning talks
 weight: 6
 ---
-The lightning talks are 10-minute talks by diferent cncf projects.
-## Building contaienrs at scale using buildpacks
-A Project lightning talk by heroku and the cncf buildpacks.
+The lightning talks are 10-minute talks by different CNCF projects.
+## Building containers at scale using buildpacks
+A Project lightning talk by Heroku and the CNCF buildpacks.
 ### How and why buildpacks?
-* What: A simple way to build reproducible contaienr images
-* Why: Scale, Reuse, Rebase
-* Rebase: Buildpacks are structured as layers
+* What: A simple way to build reproducible container images
+* Why: Scale, Reuse, Rebase: Buildpacks are structured as layers
 * Dependencies, app builds and the runtime are seperated -> Easy update
-* How: Use the PAck CLI `pack build <image>` `docker run <image>`
+* How: Use the Pack CLI `pack build <image>` `docker run <image>`
 ## Konveyor
-A Platform for migration of legacy apps to cloudnative platforms.
-* Parts: Hub, Analysis (with langugage server), Assesment
+A Platform for migration of legacy apps to cloud native platforms.
+* Parts: Hub, Analysis (with language server), assessment
 * Roadmap: Multi language support, GenAI, Asset Generation (e.g. Kube Deployments)
-## Argo'S Communuty Driven Development
-Pretty mutch a short intropduction to Argo Project
+## Argo's Community Driven Development
+Pretty much a short introduction to Argo Project
 * Project Parts: Workflows (CI), Events, CD, Rollouts
-* NPS: Net Promoter Score (How likely are you to recoomend this) -> Everyone loves argo (based on their survey)
-* Rollouts: Can be based with prometheus metrics
+* NPS: Net Promoter Score (How likely are you to recommend this) -> Everyone loves Argo (based on their survey)
+* Rollouts: Can be based with Prometheus metrics
 ## Flux
-* Components: Helm, Kustomize, Terrafrorm, ...
-* Flagger Now supports gateway api, prometheus, datadog and more
+* Components: Helm, Kustomize, Terraform, ...
+* Flagger Now supports gateway API, Prometheus, Datadog and more
 * New Releases
-## A quick logg at the TAG App-Delivery
+## A quick look at the TAG App-Delivery
 * Mission: Everything related to cloud-native application delivery
 * Bi-Weekly Meetings


@@ -8,30 +8,30 @@ tags:
 - dx
 ---
-This talks looks at bootstrapping Platforms using KSere.
-They do this in regards to AI Workflows.
-## Szenario
-* Deploy AI Workloads - Sometime consiting of different parts
+This talk looks at bootstrapping Platforms using KServe.
+They do this in regard to AI Workflows.
+## Scenario
+* Deploy AI Workloads - Sometime consisting of different parts
 * Models get stored in a model registry
 ## Baseline
 * Consistent APIs throughout the platform
-* Not the kube api directly b/c:
-* Data scientists are a bit overpowered by the kube api
-* Not only Kubernetes (also monitoring tools, feedback tools, etc)
+* Not the kube API directly b/c:
+* Data scientists are a bit overpowered by the kube API
+* Not only Kubernetes (also monitoring tools, feedback tools, etc.)
 * Better debugging experience for specific workloads
-## The debugging api
+## The debugging API
 * Specific API with enhanced statuses and consistent UX across Code and UI
-* Exampüle Endpoints: Pods, Deployments, InferenceServices
-* Provides a status summary-> Consistent health info across all related ressources
-* Example: Deployments have progress/availability, Pods have phases, Containers have readyness -> What do we interpret how?
-* Evaluation: Progressing, Available Count vs Readyness, Replicafailure, Pod Phase, Container Readyness
-* The rules themselfes may be pretty complex, but - since the user doesn't have to check them themselves - the status is simple
+* Example Endpoints: Pods, Deployments, InferenceServices
+* Provides a status summary-> Consistent health info across all related resources
+* Example: Deployments have progress/availability, Pods have phases, Containers have readiness -> What do we interpret how?
+* Evaluation: Progressing, Available Count vs Readiness, Replicafailure, Pod Phase, Container Readiness
+* The rules themselves may be pretty complex, but - since the user doesn't have to check them themselves - the status is simple
 ### Debugging Metrics
@@ -47,15 +47,15 @@ They do this in regards to AI Workflows.
 * Kine is used to replace/extend etcd with the relational dock db -> Relation namespace<->manifests is stored here and RBAC can be used
 * Launchpad: Select Namespace and check resource (fuel) availability/utilization
-### Clsuter maintainance
-* Deplyoments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
-* The excact same manifests get deployed to two clusters
-* Cluster desired state is stored externally to enable effortless upogrades, rescale, etc
+### Cluster maintenance
+* Deployments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
+* The exact same manifests get deployed to two clusters
+* Cluster desired state is stored externally to enable effortless upgrades, rescale, etc
 ### Versioning API
-* Basicly the dock DB
+* Basically the dock DB
 * CRDs are the representations of the inference manifests
 * Rollbacks, Promotion and History is managed via the CRs
 * Why not GitOps: Internal Diffs, deployment overrides, customized features
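The status roll-up idea in this file (interpret Deployment availability, Pod phases and Container readiness into one simple status) can be sketched as follows. This is an illustrative toy, not the actual debugging-API code; all names and the exact rules are assumptions:

```python
# Illustrative sketch: roll several Kubernetes-level signals up into one
# simple, user-facing status string, in the spirit of the talk's debugging API.
def summarize_status(progressing: bool, available: int, desired: int,
                     replica_failure: bool, pod_phases: list[str],
                     containers_ready: list[bool]) -> str:
    # Hard failures first: a ReplicaFailure condition or a failed pod
    if replica_failure or "Failed" in pod_phases:
        return "Failed"
    # Everything scheduled, running and ready
    if available >= desired and all(p == "Running" for p in pod_phases) \
            and all(containers_ready):
        return "Healthy"
    # Still rolling out, otherwise something is stuck
    return "Progressing" if progressing else "Degraded"
```

The point of the talk stands out here: the rules may be arbitrarily complex internally, but the user only ever sees one of a few simple words.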


@@ -7,25 +7,25 @@ tags:
 - db
 ---
-A short Talk as Part of the DOK day - presendet by the VP of CloudNative at EDB (one of the biggest PG contributors)
+A short Talk as Part of the Data on Kubernetes day - presented by the VP of Cloud Native at EDB (one of the biggest PG contributors)
 Stated target: Make the world your single point of failure
 ## Proposal
-* Get rid of Vendor-Lockin using the oss projects PG, K8S and CnPG
+* Get rid of Vendor-Lockin using the OSS projects PG, K8S and CnPG
 * PG was the DB of the year 2023 and a bunch of other times in the past
 * CnPG is a Level 5 mature operator
 ## 4 Pillars
-* Seamless KubeAPI Integration (Operator PAttern)
+* Seamless Kube API Integration (Operator Pattern)
 * Advanced observability (Prometheus Exporter, JSON logging)
 * Declarative Config (Deploy, Scale, Maintain)
-* Secure by default (Robust contaienrs, mTLS, and so on)
+* Secure by default (Robust containers, mTLS, and so on)
 ## Clusters
-* Basic Ressource that defines name, instances, snyc and storage (and other params that have same defaults)
+* Basic Resource that defines name, instances, sync and storage (and other parameters that have same defaults)
 * Implementation: Operator creates:
 * The volumes (PG_Data, WAL (Write ahead log)
 * Primary and Read-Write Service
@@ -35,15 +35,15 @@ Stated target: Make the world your single point of failure
 * Failure detected
 * Stop R/W Service
 * Promote Replica
-* Activat R/W Service
-* Kill old promary and demote to replica
+* Activate R/W Service
+* Kill old primary and demote to replica
 ## Backup/Recovery
-* Continuos Backup: Write Ahead Log Backup to object store
+* Continuous Backup: Write Ahead Log Backup to object store
 * Physical: Create from primary or standby to object store or kube volumes
-* Recovery: Copy full backup and apply WAL until target (last transactio or specific timestamp) is reached
-* Replica Cluster: Basicly recreates a new cluster to a full recovery but keeps the cluster in Read-Only Replica Mode
+* Recovery: Copy full backup and apply WAL until target (last transaction or specific timestamp) is reached
+* Replica Cluster: Basically recreates a new cluster to a full recovery but keeps the cluster in Read-Only Replica Mode
 * Planned: Backup Plugin Interface
 ## Multi-Cluster
@@ -51,21 +51,21 @@ Stated target: Make the world your single point of failure
 * Just create a replica cluster via WAL-files from S3 on another kube cluster (lags 5 mins behind)
 * You can also activate replication streaming
-## Reccomended architecutre
-* Dev Cluster: 1 Instance without PDB and with Continuos backup
-* Prod: 3 Nodes with automatic failover and continuos backups
+## Recommended architecture
+* Dev Cluster: 1 Instance without PDB and with Continuous backup
+* Prod: 3 Nodes with automatic failover and continuous backups
 * Symmetric: Two clusters
 * Primary: 3-Node Cluster
-* Secondary: WAL-Based 3-Node Cluster with a designated primary (to take over if primary cluster fails)
-* Symmetric Streaming: Same as Secondary, but you manually enable the streaming api for live replication
+* Secondary: WAL based 3-Node Cluster with a designated primary (to take over if primary cluster fails)
+* Symmetric Streaming: Same as Secondary, but you manually enable the streaming API for live replication
 * Cascading Replication: Scale Symmetric to more clusters
-* Single availability zone: Well, do your best to spread to nodes and aspire to streched kubernetes to more AZs
+* Single availability zone: Well, do your best to spread to nodes and aspire to stretched Kubernetes to more AZs
 ## Roadmap
 * Replica Cluster (Symmetric) Switchover
 * Synchronous Symmetric
-* 3rd PArty Plugins
+* 3rd Party Plugins
 * Manage DBs via the Operator
 * Storage Autoscaling
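The "basic Resource that defines name, instances, sync and storage" described in this file corresponds to CloudNativePG's `Cluster` custom resource. A minimal sketch (values are illustrative, everything else falls back to defaults as the notes say):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example
spec:
  instances: 3        # one primary, two replicas
  storage:
    size: 10Gi        # the operator creates the PG_DATA/WAL volumes
```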


@@ -4,14 +4,14 @@ weight: 9
 ---
 > When I say serverless I don't mean lambda - I mean serverless
-> That is thousands of lines of yaml - but I don't want to depress you
+> That is thousands of lines of YAML - but I don't want to depress you
 > It will be eventually done
 > Imagine this error is not happening
 > Just imagine how I did this last night
 ## Goal
-* Take my sourcecode and run it, scale it - jsut don't ask me
+* Take my source code and run it, scale it - just don't ask me
 ## Baseline
@@ -20,9 +20,9 @@ weight: 9
 * Use Kaniko/Shipwright for building
 * Use Dupr for inter-service Communication
-## Openfunction
-> The glue between different tools to achive serverless
+## Open function
+> The glue between different tools to achieve serverless
 * CRD that describes:
 * Build this image and push it to the registry
@@ -35,8 +35,8 @@ weight: 9
 * Open Questions
 * Where are the serverless servers -> Cluster, dependencies, secrets
-* How do I create DBs, etc
+* How do I create DBs, etc.
 * Resulting needs
-* Cluster aaS (using crossplane - in this case using aws)
-* DBaaS (using crossplane - again usig pq on aws)
-* App aaS
+* CLUSTERaaS (using crossplane - in this case using AWS)
+* DBaaS (using crossplane - again using pg on AWS)
+* APPaaS


@@ -14,21 +14,21 @@ Another talk as part of the Data On Kubernetes Day.
 * Managed: Atlas
 * Semi: Cloud manager
-* Selfhosted: Enterprise and community operator
-### Mongo on K8s
+* Self-hosted: Enterprise and community operator
+### MongoDB on K8s
 * Cluster Architecture
 * Control Plane: Operator
-* Data Plane: MongoDB Server + Agen (Sidecar Proxy)
+* Data Plane: MongoDB Server + Agent (Sidecar Proxy)
 * Enterprise Operator
-* Opsmanager CR: Deploys 3-node operator DB and OpsManager
-* MongoDB CR: The MongoDB cLusters (Compromised of agents)
-* Advanced Usecase: Data Platform with mongodb on demand
-* Control Plane on one cluster (or on VMs/Hardmetal), data plane in tennant clusters
+* OpsManager CR: Deploys 3-node operator DB and OpsManager
+* MongoDB CR: The MongoDB clusters (Compromised of agents)
+* Advanced use case: Data Platform with MongoDB on demand
+* Control Plane on one cluster (or on VMs/Bare-metal), data plane in tenant clusters
 * Result: MongoDB CR can not relate to OpsManager CR directly
 ## Pitfalls
-* Storage: Agnostic, Topology aware, configureable and resizeable (can't be done with statefulset)
+* Storage: Agnostic, Topology aware, configurable and resizable (can't be done with Statefulset)
 * Networking: Cluster-internal (Pod to Pod/Service), External (Split horizon over multicluster)


@@ -9,8 +9,8 @@ tags:
 ## CNCF Platform maturity model
-* Was donated to the cncf by syntasso
-* Constantly evolving since 1.0 in november 2023
+* Was donated to the CNCF by Syntasso
+* Constantly evolving since 1.0 in November 2023
 ### Overview
@@ -25,7 +25,7 @@ tags:
 * Investment: How are funds/staff allocated to platform capabilities
 * Adoption: How and why do users discover this platform
 * Interfaces: How do users interact with and consume platform capabilities
-* Operations: How are platforms and capabilities planned, prioritzed, developed and maintained
+* Operations: How are platforms and capabilities planned, prioritized, developed and maintained
 * Measurement: What is the process for gathering and incorporating feedback/learning?
 ## Goals
@@ -34,24 +34,24 @@ tags:
 * Outcomes & Practices
 * Where are you at
 * Limits & Opportunities
-* Behaviours and outcome
+* Behaviors and outcome
 * Balance People and processes
 ## Typical Journeys
-### Steps of the jurney
+### Steps of the journey
 1. What are your goals and limitations
 2. What is my current landscape
-3. Plan babysteps & iterate
-### Szenarios
+3. Plan baby steps & iterate
+### Scenarios
 * Bad: I want to improve my k8s platform
 * Good: Scaling an enterprise COE (Center Of Excellence)
 * What: Onboard 20 Teams within 20 Months and enforce 8 security regulations
 * Where: We have a dedicated team of centrally funded people
-* Lay the foundation: More funding for more larger teams -> Switch from Project to platform mindset
+* Lay the foundation: More funding for more, larger teams -> Switch from Project to platform mindset
 * Do your technical Due diligence in parallel
 ## Key Lessons
@@ -60,8 +60,8 @@ tags:
 * Know your landscape
 * Plan in baby steps and iterate
 * Lay the foundation for building the right thing and not just anything
-* Dont forget to do your technical dd in parallel
+* Don't forget to do your technical dd in parallel
 ## Conclusion
-* Majurity model is a helpful part but not the entire plan
+* Maturity model is a helpful part but not the entire plan


@@ -6,43 +6,43 @@ tags:
 - network
 ---
-Held by Cilium regarding ebpf and hubble
+Held by Cilium regarding eBPF and Hubble
 ## eBPF
 > Extend the capabilities of the kernel without requiring to change the kernel source code or load modules
 * Benefits: Reduce performance overhead, gain deep visibility while being widely available
-* Example Tools: Parca (Profiling), Cilium (Networking), Hubble (Opservability), Tetragon (Security)
+* Example Tools: Parca (Profiling), Cilium (Networking), Hubble (Observability), Tetragon (Security)
 ## Cilium
-> Opensource Solution for network connectivity between workloads
+> Open source Solution for network connectivity between workloads
 ## Hubble
 > Observability-Layer for cilium
-### Featureset
+### Feature set
 * CLI: TCP-Dump on steroids + API Client
 * UI: Graphical dependency and connectivity map
-* Prometheus + Grafana + Opentelemetry compatible
+* Prometheus + Grafana + OpenTelemetry compatible
 * Metrics up to L7
 ### Where can it be used
 * Service dependency with frequency
-* Kinds of http calls
+* Kinds of HTTP calls
 * Network Problems between L4 and L7 (including DNS)
 * Application Monitoring through status codes and latency
 * Security-Related Network Blocks
-* Services accessed from outside the cluser
+* Services accessed from outside the cluster
 ### Architecture
-* Cilium Agent: Runs as the CNI für all Pods
-* Server: Runs on each node and retrieves the ebpf from cilium
+* Cilium Agent: Runs as the CNI for all Pods
+* Server: Runs on each node and retrieves the eBPF from cilium
 * Relay: Provide visibility throughout all nodes
 ## TL;DR


@@ -7,10 +7,10 @@ weight: 1
 Day one is the Day for co-located events aka CloudNativeCon.
 I spent most of the day attending the Platform Engineering Day - as one might have guessed it's all about platform engineering.
-Everything started with badge pickup - a very smooth experence (but that may be related to me showing up an hour or so too early).
-## Talk reccomandations
+Everything started with badge pickup - a very smooth experience (but that may be related to me showing up an hour or so too early).
+## Talk recommendations
 * Beyond Platform Thinking...
-* Hitchhikers Guide to ...
+* Hitchhiker's Guide to ...
 * To K8S and beyond...


@@ -6,18 +6,18 @@ tags:
 - opening
 ---
-The opening keynote started - as is the tradition with keynotes - with an "motivational" opening video.
+The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
 The keynote itself was presented by the CEO of the CNCF.
 ## The numbers
-* Over 2000 attendees
+* Over 12000 attendees
 * 10 Years of Kubernetes
 * 60% of large organizations expect rapid cost increases due to AI/ML (FinOps Survey)
 ## The highlights
-* Everyone uses cloudnative
+* Everyone uses cloud native
 * AI uses Kubernetes b/c the UX is way better than classic tools
 * Especially when transferring from dev to prod
 * We need standardization
@@ -26,10 +26,10 @@ The keynote itself was presented by the CEO of the CNCF.
 ## Live demo
 * KIND cluster on desktop
-* Protptype Stack (develop on client)
+* Prototype Stack (develop on client)
 * Kubernetes with the LLM
-* Host with LLVA (image describe model), moondream and OLLAMA (the model manager/registry()
+* Host with LLAVA (image describe model), moondream and OLLAMA (the model manager/registry()
 * Prod Stack (All in kube)
 * Kubernetes with LLM, LLVA, OLLAMA, moondream
-* Available Models: llava, mistral bokllava (llava*mistral)
-* Host takes picture, ai describes what is pictures (in our case the conference audience)
+* Available Models: LLAVA, mistral bokllava (LLAVA*mistral)
+* Host takes picture, AI describes what is pictures (in our case the conference audience)


@ -7,7 +7,7 @@ tags:
- panel
---
A podium discussion (somewhat scripted) led by Priyanka
## Guests
@ -17,24 +17,24 @@ A podium discussion (somewhat scripted) lead by Pryanka
## Discussion
* What do you use as the base of dev for OLLAMA
* Jeff: The concepts from docker, git, Kubernetes
* How is the balance between AI engineering and AI ops
* Jeff: The classic dev vs ops divide, many ML engineers don't think about it
* Paige: Yessir
* How does infra keep up with the fast research
* Paige: Well, they don't - but they do their best and cloud native is cool
* Jeff: Well, we're not Google, but Kubernetes is the savior
* What are scaling constraints
* Jeff: Currently sizing of models is still in its infancy
* Jeff: There will be more specific hardware and someone will have to support it
* Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
* Paige: Optimization of smaller models
* What technologies need to be open source licensed
* Jeff: The model b/c access and trust
* Tim: The models and base execution environment -> Vendor agnosticism
* Paige: Yes, and remixes are really important for development
* Anything else
* Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world
* Paige: Currently many people just use paid APIs to abstract the infra, but we need this stuff self-hostable
* Tim: I don't want to know about the hardware, the whole infra side should be done by the cloud native teams to let ML engineers just be ML engineers


@ -9,7 +9,7 @@ tags:
Kevin and Sanjay from NVIDIA
## Enabling GPUs in Kubernetes today
* Host level components: Toolkit, drivers
* Kubernetes components: Device plugin, feature discovery, node selector
@ -18,24 +18,24 @@ Kevin and Sanjay from NVIDIA
## GPU sharing
* Time slicing: Switch around by time
* Multi Process Service: Always run on the GPU but share (space-wise)
* Multi Instance GPU: Space-separated sharing on the hardware
* Virtual GPU: Virtualizes time slicing or MIG
* CUDA Streams: Run multiple kernels in a single app
## Dynamic resource allocation
* A new alpha feature since Kube 1.26 for dynamic resource requesting
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
## GPU scale-out challenges
* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is the training workload split into batches
* Challenge: Schedule multiple training jobs by different users that are prioritized
### Topology aware placements
* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize related jobs based on GPU node distance and NUMA placement
@ -44,11 +44,11 @@ Kevin and Sanjay from NVIDIA
* Stuff can break, resulting in slowdowns or errors
* Challenge: Detect faults and handle them
* Observability both in-band and out of band that exposes node conditions in Kubernetes
* Needed: Automated fault-tolerant scheduling
### Multidimensional optimization
* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: What to choose (the multidimensional decision problem)
* Needed: A scheduler that can balance the dimensions


@ -15,11 +15,11 @@ Jorge Palma from Microsoft with a quick introduction.
* Containerized models
* GPUs in the cluster (install, management)
## Kubernetes AI Toolchain (KAITO)
* Kubernetes operator that interacts with
* Node provisioner
* Deployment
* Simple CRD that describes a model and infra - and have fun
* Creates inference endpoint
* Currently 10 models (Huggingface, LLaMA, etc.)


@ -6,14 +6,14 @@ tags:
- panel
---
A panel discussion with moderation by Google and participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted with prepared (sponsor-specific) slides for each question answered.
## Takeaways
* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> Better resource sharing and scheduling
* Running smaller LLMs on CPU only is pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models to run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (onprem, burst to cloud) workloads


@ -5,41 +5,41 @@ tags:
- keynote
---
Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.
### The "news"
* New working group: AI
* More tools are including AI features
* New updated CNCF for children feat. AI
* One decade of Kubernetes
* DRA is in alpha
### DRA
* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific stuff is abstracted by a vendor driver controller
* The kube scheduler can interact with the vendor parameters for scheduling and autoscaling
### Cloud native AI ecosystem
* Kube is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLM on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis
* Topics of discussion
### The working group AI
* It was formed in October 2023
* They worked on the white paper (cloud native and AI), which was published on 19.03.2024
* The landscape "cloud native and AI" is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)
### LFAI and CNCF
* The director of the AI foundation talks about AI and cloud native
* They are looking forward to more collaboration


@ -14,7 +14,7 @@ The entire talk was very short, but it was a nice demo of init containers
* Security is hard - distroless sounds like a nice helper
* Basic Challenge: Usability-Security Dilemma -> But more usability doesn't mean less secure, just more updating
* Distro: Kernel + Software Packages + Package manager (optional) -> In containers, just without the kernel
* Distroless: No package manager, no shell, no web client (curl/wget) - only minimal software bundles
## Tools for distroless image creation ## Tools for distroless image creation
@ -29,13 +29,13 @@ The entire talk was very short, but it was a nice demo of init containers
## Demo
* A (rough) distroless Postgres with an alpine build step and a scratch final step
* A basic pg:alpine container used for init with a shared data volume
* The init uses the pg admin user to initialize the pg server (you don't need the admin credentials after this)
### Kube
* K apply failed b/c no internet, but was fixed by connecting to Wi-Fi
* Without the init container the pod just crashes, with the init container the correct config gets created
### Docker compose


@ -13,63 +13,63 @@ A talk by elastic.
## About elastic
* Elastic cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers
### Elastic and Kube
* They offer elastic observability
* They offer the ECK operator for simplified deployments
## The baseline
* Goal: A large scale (1M+ containers) resilient platform on k8s
* Architecture
* Global Control: The control plane (API) for users with controllers
* Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live
## Scalability
* Challenge: How large can our cluster be, how many clusters do we need
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
* Throw away without data loss
* Single source of truth is not cluster etcd but external -> No etcd backups needed
* Everything can be recreated any time
## Controllers
{{% notice style="note" %}}
I won't copy the explanations of operators/controllers in these notes
{{% /notice %}}
* Many controllers, including (but not limited to)
* Cluster controller: Register cluster to controller
* Project controller: Schedule user's project to cluster
* Product controllers (Elasticsearch, Kibana, etc.)
* Ingress/Cert manager
* Sometimes controllers depend on controllers -> potential complexity
* Pro:
* Resilient (self-healing)
* Level triggered (desired state vs procedure triggered)
* Simple reasoning when comparing desired state vs state machine
* Official controller runtime lib
* Workqueue: Automatic dedup, retry backoff and so on
## Global Controllers
* Basic operation
* Uses project config from Elastic cloud as the desired state
* The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd
* Solution: External data store (Postgres)
* Challenge: How do we sync the db sources to Kubernetes
* Potential solution: Replace etcd with the external db
* Chosen solution:
* The controllers don't use CRDs for storage, but they expose a web API
* Reconciliation now interacts with the external db and Go channels (queue) instead
* Then the CRs for the operators get created by the global controller
### Large scale
@ -82,10 +82,10 @@ I won't copy the explanations of operators/controllers in this notes
### Reconcile
* User-driven events are processed asap
* Reconcile of everything should happen, but with low priority, slowly in the background
* Solution: Status: LastReconciledRevision (timestamp) gets compared to the revision; if larger -> user change
* Prioritization: Just a custom event handler with the normal queue and a low priority
* Low-prio queue: Just a queue that adds items to the normal work-queue with a rate limit
```mermaid
flowchart LR


@ -6,39 +6,39 @@ tags:
- security
---
A talk by Google and Microsoft with the premise of better auth in k8s.
## Baselines
* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
* Result: CVEs
* Example: Just use ingress, nginx, put some Lua code in the config and voilà: Service account token
* Fix: No more fun
## Basic solutions
* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress
## Current state of cross-namespace stuff
* Why: Reference the TLS cert for the gateway API in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: Annotations in contour that reference a namespace that contains all certs (rewrites secret to certs/secret)
* Gateway solution:
* Gateway TLS secret ref includes a namespace
* ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret)
* Limits:
* Has to be implemented via controllers
* The controllers still have read-all - they just check if they are supposed to do this
## Goals
### Global
* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross-namespace references
* Make it easy for API devs to adopt it
### Personas
@ -50,20 +50,20 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
* Alex: Define relationships via ReferencePatterns
* Kai: Specify controller identity (ServiceAccount), define relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)
## Result of the paper
### Architecture
* ReferencePattern: Where do I find the references? -> Example: GatewayClass in the gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references
### POC
* Minimum access: You only get access if the grant is there AND the reference actually exists
* Their basic implementation works with the kube API
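The "minimum access" rule above boils down to a conjunction, which a few lines of Go can make concrete. This is a hypothetical sketch with my own types (not the paper's actual implementation): a reference is only followed if a matching grant exists AND the referencing object actually declares that reference.

```go
package main

import "fmt"

// Ref identifies a cross-namespace reference (all names are illustrative).
type Ref struct{ FromNS, ToNS, Name string }

// Store holds the two sources the decision needs.
type Store struct {
	grants map[Ref]bool // ReferenceGrant-like allow list maintained by the target namespace
	refs   map[Ref]bool // references that objects actually declare
}

// Allowed implements minimum access: grant present AND reference exists.
func (s Store) Allowed(r Ref) bool {
	return s.grants[r] && s.refs[r]
}

func main() {
	s := Store{
		grants: map[Ref]bool{{"gateways", "certs", "tls-cert"}: true},
		refs:   map[Ref]bool{{"gateways", "certs", "tls-cert"}: true},
	}
	fmt.Println(s.Allowed(Ref{"gateways", "certs", "tls-cert"})) // true: grant + live reference
	fmt.Println(s.Allowed(Ref{"gateways", "certs", "other"}))    // false: neither grant nor reference
}
```

A grant without a live reference (or vice versa) denies access, which is exactly why the controllers no longer need blanket read-all rights.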
### Open questions
@ -74,9 +74,9 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
## Alternative
* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
* Requires changes to Kubernetes core auth
* Everything but list and watch is a pain
* How do you handle AND vs OR selection
* Field selectors: They exist
@ -84,5 +84,5 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
## Meanwhile
* Prefer tools that support isolation between controller and data plane
* Disable all non-needed features -> Especially scripting


@ -6,32 +6,32 @@ tags:
- dx
---
A talk by UX and software people at Red Hat (Podman team).
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).
## Research
* User research study including 11 devs and platform engineers over three months
* Focus was on a new Podman desktop feature
* Experience range: 2-3 years average (low: no experience, high: old-school kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping
## Findings
* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard b/c there are many layers, and problems occurring between CNI and infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualization
* YAML validation -> Just use validation in dev and GitOps
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs
* Inadequate security analysis (too verbose, non-issues are warnings) -> Real-time insights (and during dev)
* Crash loop -> Identify stuck containers, simple debug containers
* CLI vs GUI -> Enable an experience-level oriented GUI, enhance in-time troubleshooting
## General issues
* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on kube, which sounds like complexity
* Where do I begin my troubleshooting
* Interoperability/fragility with updates


@ -6,11 +6,11 @@ tags:
- network
---
Global field CTO at Solo.io with a hint of service mesh background.
## History
* LinkerD 1.X was the first modern service mesh and basically an opt-in service proxy
* Challenges: JVM (size), latencies, ...
### Why not node-proxy?
@ -23,8 +23,8 @@ Global field CTO at Solo.io with a hint of servicemesh background.
### Why sidecar?
* Transparent (ish)
* Part of app lifecycle (up/down)
* Single tenant
* No noisy neighbor
### Sidecar drawbacks
@ -46,7 +46,7 @@ Global field CTO at Solo.io with a hint of servicemesh background.
* Full transparency
* Optimized networking
* Lower resource allocation
* No race conditions
* No manual pod injection
* No credentials in the app
@ -68,12 +68,12 @@ Global field CTO at Solo.io with a hint of servicemesh background.
* Kubeproxy replacement
* Ingress (via Gateway API)
* Mutual Authentication
* Specialized CiliumNetworkPolicy
* Configure Envoy through Cilium
### Control Plane
* Cilium agent on each node that reacts to scheduled workloads by programming the local data plane
* API via Gateway API and CiliumNetworkPolicy
```mermaid
@ -98,29 +98,29 @@ flowchart TD
### Data plane
* Configured by control plane
* Does all the eBPF things in L4
* Does all the envoy things in L7
* In-kernel WireGuard for optional transparent encryption
### mTLS
* Network policies get applied at the eBPF layer (check if ID A can talk to ID B)
* When mTLS is enabled there is an auth check in advance -> If it fails, proceed with agents
* Agents talk to each other for mTLS auth and save the result to a cache -> Now eBPF can say yes
* Problems: The caches can lead to ID confusion
## Istio
### Basics
* L4/L7 service mesh without its own CNI
* Based on envoy
* mTLS
* Classically via sidecar, nowadays also ambient
### Ambient mode
* Separate L4 and L7 -> Can run on cilium
* mTLS
* Gateway API
@ -143,14 +143,14 @@ flowchart TD
```
* Central xDS Control Plane
* Per-node data plane that reads updates from the control plane
### Data Plane
* L4 runs via the zTunnel DaemonSet that handles mTLS
* The zTunnel traffic gets handed over to the CNI
* The L7 proxy lives somewhere™ and traffic gets routed through it as an "extra hop" aka waypoint
### mTLS
* The zTunnel creates an HBONE (HTTP overlay network) tunnel with mTLS


@ -8,17 +8,17 @@ Who have I talked to today, are there any follow-ups or learnings?
## Operator Framework
* We talked about the operator lifecycle manager
* They shared the roadmap, and the new release 1.0 will bring support for Operator Bundle loading from any OCI source (no more public-registry enforcement)
## Flux
* We talked about automatic helm release updates [lessons learned from flux](/lessons_learned/02_flux)
## Cloud Foundry/Paketo
* We mostly had some small talk
* There will be a Cloud Foundry day in Karlsruhe in October; they'd be happy to have us there
* The whole KORFI (Cloud Foundry on Kubernetes) project is still going strong, but there is no release candidate yet (or in the near future)
## Traefik ## Traefik
@ -31,7 +31,7 @@ They will follow up
## Postman
* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info materials on why Postman SaaS is not a big security risk
## Mattermost
@ -39,9 +39,9 @@ They will follow up
I should follow up
{{% /notice %}}
* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
* They're currently migrating the Mattermost cloud offering to ARM - therefore ARM support will be coming in the next months
* The Mattermost guy had exactly the same problems with notifications and read/unread using Element
## Vercel
@ -53,7 +53,7 @@ I should follow up
* The paid Renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification
### Cert manager
* The best swag (judged by coolness points)
@ -63,11 +63,11 @@ I should follow up
They will follow up with a quick demo
{{% /notice %}}
* A Kubernetes security/runtime security solution with pretty nice-looking urgency filters
* Includes eBPF to see what code actually runs
* I'll witness a demo in early/mid April
### Isovalent
* Dinner (very tasty)
* Cilium still sounds like the way to go in regard to CNIs

View File

@ -5,7 +5,7 @@ weight: 2
---
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).
This is where all the people joined (over 12,000).
The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.

View File

@ -11,8 +11,8 @@ A talk by Google and Ivanti.
## Background
* RBAC is there to limit information access and control
* RBAC can be used to avoid interference in shared envs
* DNS is not really applicable when it comes to RBAC
### DNS in Kubernetes
@ -26,11 +26,11 @@ A talk by Google and Ivanti.
* Especially for smaller, high-growth companies with infinite VC money
* Just give everyone their own cluster -> Problem solved
* Smaller companies (<1000) typically use many small clusters
### Shared Clusters
* Becomes important when cost is a question and engineers don't have any platform knowledge
* A dedicated kube team can optimize both hardware and deliver updates fast -> Increased productivity by utilizing specialists
* Problem: Noisy neighbors via leaky DNS
@ -45,14 +45,14 @@ A talk by Google and Ivanti.
### Leak mechanics
* Leaks are based on the `<service>.<namespace>.svc.cluster.local` pattern
* You can also just reverse-lookup the entire service CIDR
* SRV records get created for each service, including the service ports
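The leak mechanics above can be sketched in a few lines (the service name, namespace and CIDR are made-up examples; `10.96.0.0/…` is merely a common default service CIDR):

```python
import ipaddress

# Cluster-internal service FQDNs follow a predictable pattern,
# so any pod can probe for services in other namespaces by name.
def service_fqdn(service: str, namespace: str) -> str:
    return f"{service}.{namespace}.svc.cluster.local"

# Reverse lookups don't even need guessed names: walking the
# service CIDR yields one PTR query name per possible ClusterIP.
def ptr_names(service_cidr: str) -> list[str]:
    return [ip.reverse_pointer for ip in ipaddress.ip_network(service_cidr)]

print(service_fqdn("postgres", "tenant-b"))  # postgres.tenant-b.svc.cluster.local
candidates = ptr_names("10.96.0.0/30")       # tiny CIDR for illustration
print(candidates[1])                         # 1.0.96.10.in-addr.arpa
```

Sending those PTR queries against the cluster resolver enumerates every service, no RBAC involved.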
## Fix the leak
### CoreDNS Firewall Plugin
* External plugin provided by the CoreDNS team
* Expression engine built in, with support for external policy engines
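A same-namespace-only policy would look roughly like the Corefile below. This is a sketch from memory, not the presented config - the bracketed-metadata expression syntax and the exact label names exposed by the kubernetes plugin should be checked against the firewall plugin docs:

```
cluster.local {
    kubernetes {
        pods verified   # needed so client pod metadata is available
    }
    metadata
    firewall query {
        allow [kubernetes/client-namespace] == [kubernetes/namespace]
        allow [kubernetes/namespace] == 'kube-system'
        allow [kubernetes/namespace] == 'default'
        refuse true
    }
}
```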
```mermaid
@ -67,19 +67,19 @@ flowchart LR
### Demo
* Firewall rule that only allows queries from the same namespace, `kube-system` or `default`
* Every other cross-namespace request gets blocked
* Same SVC requests from before now return `NXDOMAIN`
### Why is this a plugin and not default?
* Requires `pods verified` mode -> Puts a watch on pods and only returns a query result if the pod actually exists
* Puts a watch on all pods -> higher API load and CoreDNS memory usage
* Potential race conditions with initial lookups in larger clusters -> Alternative is to fail open (not really secure)
### Per-tenant DNS
* Just run a CoreDNS instance for each tenant
* Use a mutating webhook to inject the right DNS into each pod
* Pro: No more `pods verified` -> aka no more constant watch
* Limitation: Platform services still need a central CoreDNS
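What such a webhook would inject is essentially a per-pod DNS override (the resolver IP and tenant name here are illustrative):

```yaml
# Patch a mutating webhook could apply to tenant pods (illustrative values)
spec:
  dnsPolicy: "None"          # ignore the cluster-default resolver
  dnsConfig:
    nameservers:
      - 10.96.100.53         # ClusterIP of the tenant's own CoreDNS
    searches:
      - tenant-a.svc.cluster.local
      - svc.cluster.local
```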

View File

@ -6,7 +6,7 @@ tags:
- dx
---
Mitch from Aviatrix - a former software engineer who has now switched over to product management.
## Opening Thesis
@ -14,19 +14,19 @@ Opening with the Atari 2600 E.T. game as very bad fit sample.
Thesis: Missing user empathy
* A very hard game aimed at children without the patience for trial and error
* Other aspect: Some developers were pulled together from throughout the company -> No passion needed
### Another sample
* Idea: SCADA system with sensors that can be moved, and the current location gets tracked via iPad.
* Result: Nobody used the iPad app - only the desktop web app
* Problem: Sensor gets moved, location not updated, the measurements for the wrong location get reported until the update
* Source: Moving a sensor is a pretty involved process including high pressure, aka no priority for the iPad
* Empathy loss: Different working environments result in a drastic work experience mismatch
## The source
* Idea: A software engineer writes software that someone else has to use, not themselves
* Problem: Distance between user and dev is high and their perspectives differ heavily
## User empathy
@ -37,43 +37,43 @@ Thesis: Missing user empathy
## Stories from Istio
* Classic implementation: Sidecar Proxy
* Question: Can the same value be provided without a sidecar anywhere
* Answer: Ambient mode -> split into L4 (proxy per node) and L7 (no sharing)
* Problem: After the alpha release there was a lack of excitement and feedback
* Result: Twitter Space event for feedback
### Ideas and feedback
* Idea: Sidecar is somewhat magical
* Feedback: Sidecars are a pain, but after integrating Istio they can be automated -> a problem gets solved that already had a solution
* Result: Highly overvalued the pain of sidecars
* Idea: Building Istio into a platform sounds easy
* Feedback: The platform has to be changed for the new ambient mode -> High time investment while engineers are hard to come by
* Result: The cost of platform changes was highly undervalued
* Idea: Sidecar compute sounds expensive and networking itself pretty cheap
* Feedback: Many users have multi-region clusters -> Egress is very expensive
* Result: The relation between compute and egress cost was pretty much swapped
### What now?
* Ambient is the right solution for new users (fresh service meshes)
* Existing users probably won't upgrade
* Result: They will move forward with ambient mode
## So what did we learn
### Basic questions
* Who are my intended users?
* What excites/worries them?
* What do they find easy/hard?
* What is their biggest expense and what is inexpensive?
### How to get better empathy
1. Shared perspective comes from proximity
   1. Where they are
   2. What they do -> Dogfood everything related to the platform (not just your own products)
2. Never stop listening
   1. Even if you love your product
   2. Especially if you love your product
@ -81,4 +81,4 @@ Thesis: Missing user empathy
### Takeaways
* Don't ship a puzzle box (landscape) but a picture (this integrates with this and that)

View File

@ -6,25 +6,25 @@ tags:
- business
---
Bob, a Program Manager at Google and Kubernetes steering committee member with a bunch of contributor and maintainer experience.
The value should be rated even higher than the pure business value.
## Baseline
* A öarge chunk of CNCF contrinbutors and maintainers (95%) are company affiliated * A large chunk of CNCF contributors and maintainers (95%) are company affiliated
* Most (50%) of the people contributed in professional an personal time )(and 30 only on work time) * Most (50%) of the people contributed in professional personal time (and 30 only on work time)
* Explaining business value can be very complex * Explaining business value can be very complex
* Base question: What does this contribute to the business * Base question: What does this contribute to the business
## Data enablement ## Data enablement
* Problem: Insufficient data (data collection is often an afterthought) * Problem: Insufficient data (data collection is often an afterthought)
* Example used: Random CNCF slection * Example used: Random CNCF selection
* 50% of issues are labed consistentöy * 50% of issues are labeled consistently
* 17% of projects label PRs * 17% of projects label PRs
* 58% of projects use milestones * 58% of projects use milestones
* Labels provide: Context, Prioritization, Scope, State * Labels provide: Context, Prioritization, Scope, State
* Milestones enable: Filtering outside of daterange * Milestones enable: Filtering outside date range
* Sample queries: * Sample queries:
* How many features have been in milestone XY? * How many features have been in milestone XY?
* How many bugs have been fixed in this version? * How many bugs have been fixed in this version?
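With consistent labels and milestones, such queries become one-liners - e.g. in GitHub's issue search syntax (the milestone and label names here are made up):

```
# GitHub issue-search examples (hypothetical label/milestone names)
is:issue milestone:"v1.30" label:kind/feature          # features in a milestone
is:issue is:closed milestone:"v1.30" label:kind/bug    # bugs fixed in a release
```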
@ -37,36 +37,36 @@ The value should be rated even higher than the pure business value.
* Thought of as overhead
* Project is too small
* Tools:
  * Actions/Pipelines for auto-labeling, copying and syncing labels
  * Prow: The label system for Kubernetes projects
* People with high project but low code knowledge can triage -> Make them feel recognized
### Conclusions
* Consistent labels & milestones are critical for state analysis
* Data is the evidence needed in messaging for leadership
* Recruiting triage-specific people and using automations streamlines the process
## Communication
### Personas
* OSS enthusiast: Knows the ecosystem and project, with a knack for discussions and deep dives
* Maintainer: An enthusiast that is tired, under pressure and most of the time a one-man show that would prefer doing technical stuff
* CXO: Focus on resources, health, ROI
* Product manager: Get the best project, user-friendly
* Leads: Employees should meet KPIs, with slightly better tech understanding
* End user: How can tools/features help me
### Growth limits
* Main questions:
  * What is this project/feature?
  * Where is the roadmap?
  * What parts of the project are at risk?
* Problem: Wording
### Ways of surfacing information
* Regular project reports/blog posts
* Roadmap on the website
@ -76,8 +76,8 @@ The value should be rated even higher than the pure business value.
* What are we getting out? (How fast are bugs getting fixed?)
* What is the criticality of the project?
* How much time is spent on maintenance?
## Conclusion
* There is significant unrealized value in open source

View File

@ -10,7 +10,7 @@ A talk about the backstage documentation audit and what makes a good documentati
## Opening
* 2012: the year of the Maya calendar and the mainstream success of memes
* The classic meme RTFM -> Classic manuals were pretty long
* 2024: Manuals have become documentation (hopefully with better contents)
@ -18,9 +18,9 @@ A talk about the backstage documentation audit and what makes a good documentati
### What is documentation
* Docs (the raw descriptions, quick-start and how-to)
* Website (the first impression - what does this do, why would I need it)
* README (the GitHub way of website + docs)
* CONTRIBUTING (Is this a one-man show?)
* Issues
* Meta docs (how do we orchestrate things)
@ -30,10 +30,10 @@ A talk about the backstage documentation audit and what makes a good documentati
* Who needs this documentation?
  * New users -> Optimize for minimum context
  * Experienced users
  * User roles (admins, end users, ...) -> Separate into different pages (get started based on your role)
* What do we need to enable with this documentation?
  * Prove value fast -> Why this project?
  * Educate on fundamental aspects
  * Showcase features/use cases
  * Hands-on enablement -> Tutorials, guides, step-by-step
@ -43,24 +43,24 @@ A talk about the backstage documentation audit and what makes a good documentati
* Documented scheduled contributor meetings
* Getting started guides for new contributors
* Project governance
  * Who is going to own it?
  * What will happen to my PR?
  * Who maintains features?
### Website
* Single source for all pages (one repo that includes landing, docs, API and so on) -> Easier to contribute
* Usability (especially on mobile)
* Social proof and case studies -> Develop trust
* SEO (actually get found) and analytics (detect how documentation is used and where people leave)
* Plan website maintenance
### What is great documentation
* Project docs help users according to their needs -> Low question-to-answer latency
* Contributor docs enable contributions in a predictable manner -> Don't leave "when will this be reviewed/merged" questions open
* The website proves why anyone should invest time in this project
* All documentation is connected and up to date
## General best practices
@ -72,11 +72,11 @@ A talk about the backstage documentation audit and what makes a good documentati
## Examples
* OpenTelemetry: Split by role (dev, ops)
* Prometheus:
  * New user content in intro (concept) and getting started (practice)
  * Hierarchy includes concepts, key features and guides/tutorials
## Q&A
* Every last Wednesday of the month there is a CNCF technical writers meeting (CNCF Slack -> `#techdocs`)

View File

@ -9,11 +9,11 @@ tags:
A talk by Broadcom and Bloomberg (both related to buildpacks.io).
And a very full talk at that.
## Baseline
* CN Buildpacks provides the spec for buildpacks with a couple of different implementations
* Pack CLI with builder (a collection of buildpacks - for example Paketo or Heroku)
* Output images follow OCI -> Just run them on Docker/Podman/Kubernetes
* Built images are `production application images` (small attack surface, SBOM, non-root, reproducible)
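For reference, a typical build with the Pack CLI looks roughly like this (the app name is illustrative; the builder shown is one of the public Paketo builders):

```shell
# Build an OCI image from source - no Dockerfile needed
pack build my-app --builder paketobuildpacks/builder-jammy-base --path .

# The result is a regular OCI image
docker run --rm my-app
```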
## Scaling
@ -47,7 +47,7 @@ flowchart LR
* Goal: Just a simple docker pull that auto-detects the right architecture
* Needed: Pack, Lifecycle, Buildpacks, build images, builders, registry
* Current state: There is an RFC to handle image index creation with changes to buildpack creation
  * New folder structure for binaries
  * Update config files to include targets
* The user impact is minimal, because the builder abstracts everything away
@ -56,5 +56,5 @@ flowchart LR
* kpack is slsa.dev v3 compliant (party hard)
* 5 years of production
* Scaling up to Tanzu/Heroku/GCP levels
* Multiarch is being worked on

View File

@ -4,4 +4,4 @@ title: Day 3
weight: 3
---
Spent most of the early day with a headache, therefore talk notes only start at noon.

View File

@ -9,11 +9,11 @@ tags:
## Problems
* Dockerfiles are hard and not 100% reproducible
* Buildpacks are reproducible but result in large single-arch images
* Nix has multiple ways of doing things
## Solutions
* Dagger as a CI solution
* Multistage docker images with distroless -> Small image, small attack surface
* Language-specific solutions (`ko`, `jib`)
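The multistage + distroless pattern mentioned above boils down to a Dockerfile like this (a minimal sketch for a Go app; image tags are illustrative):

```dockerfile
# Build stage: full toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Runtime stage: distroless -> small image, small attack surface, no shell
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot
ENTRYPOINT ["/app"]
```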

View File

@ -5,12 +5,12 @@ tags:
- ebpf
---
A talk by Isovalent with a full room (one of the large ones).
## Baseline
* eBPF lets you run custom code in the kernel -> close to hardware
* Typical use cases: Networking, Observability, Tracing/Profiling, Security
* Question: Is eBPF Turing-complete and can it be used for more complex scenarios (TLS, L7)?
## eBPF verifier
@ -19,9 +19,9 @@ A talk by isovalent with a full room (one of the large ones).
* Principles:
  * Read memory only with correct permissions
  * All writes to valid and safe memory
  * Valid in-bounds and well-formed control flow
  * On-CPU execution time is bounded: sleep, scheduled callbacks, iterations, program actually completes
  * Acquire/release and reference count semantics
## Demo: Game of life
@ -34,7 +34,7 @@ A talk by isovalent with a full room (one of the large ones).
* Instruction limit to let the verifier actually verify the program in reasonable time
* Limit is based on: instruction limit and verifier step limit
* Nowadays the limit is 4096 instructions for unprivileged and 1 million for privileged programs
* Only jump forward -> No loops
  * Is a basic limitation to ensure no infinite loops can ruin the day
  * Limitation: Only finite iterations can be performed
@ -43,14 +43,14 @@ A talk by isovalent with a full room (one of the large ones).
* Solution: subprograms (aka functions), and the limit applies to each function -> `x*subprograms = x*limit`
  * Limit: Needs real skill
* Programs have to terminate
  * Well, eBPF really only wants to release the CPU - the program doesn't have to end per se
  * Iterators: walk arbitrary lists of objects
  * Sleep on page fault or other memory operations
  * Timer callbacks (including timer 0 for "run me ASAP")
* Memory allocation
  * Maps are used as the memory management system
## Result
* You can execute arbitrary tasks via eBPF
* It can be used for HTTP or TLS - it's just not implemented yet™

View File

@ -7,20 +7,20 @@ tags:
- scaling
---
By the nice Operator Framework guys at IBM and Red Hat.
I'll skip the baseline introduction of what an operator is.
## Operator SDK
> Build the operator
* Kubebuilder with v4 plugins -> Supports the latest Kubernetes
* Java Operator SDK is not a part of Operator SDK, and they released 5.0.0
  * Now with server-side apply in the background
  * Better status updates and finalizer handling
  * Dependent resource handling (alongside optional dependent resources)
## Operator Lifecycle Manager
> Manage the operator -> An operator for installing operators
@ -28,16 +28,16 @@ I'll skip the baseline introduction of what an operator is.
* New API set -> The old CRDs were overwhelming
* More GitOps-friendly with per-tenant support
* Prescribes update paths (maybe upgrade)
* Support for operator bundles as k8s manifests/Helm chart
### OLM v1 Components
* Cluster Extension (user-facing API)
  * Defines the app you want to install
  * Resolves requirements through CatalogD/Deppy
* CatalogD (catalog server/operator)
* Deppy (dependency/constraint solver)
* Applier (RukPak/kapp compatible)
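Installing through the v1 user-facing API then looks roughly like this (a sketch from memory; the API group/version and field names may differ in the actual release):

```yaml
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  name: my-operator          # illustrative name
spec:
  packageName: my-operator   # resolved against the catalog
  channel: stable
  version: "1.0.0"
```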
```mermaid

View File

@ -7,20 +7,20 @@ tags:
- security
---
A talk by the cert manager maintainers that also staffed the cert manager booth.
Humor is present, but the main focus is still the technical integration.
## Baseline
* Cert manager is the best™ way of getting certificates
* Poster features: Auto-renewal, ACME, PKI, HC Vault
* Numbers: 20M downloads, 427 contributors, 11.3k GitHub stars
* Currently on the graduation path
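As a reminder of what that looks like in practice - a minimal self-signed issuer plus certificate (resource names and DNS entries are illustrative):

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned           # illustrative
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demo-cert            # illustrative
spec:
  secretName: demo-cert-tls  # the keypair gets written here
  dnsNames:
    - demo.example.com
  issuerRef:
    name: selfsigned
```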
## History
* 2016: Jetstack created kube-lego -> An operator that generated LE certificates for ingresses based on annotations
* 2017: Cert manager launch -> Cert resources and issuer resources
* 2020: v1.0.0 and joined CNCF sandbox
* 2022: CNCF incubating
* 2024: Passed the CNCF security audit and on the way to graduation
@ -30,17 +30,17 @@ Humor is present, but the main focus is still thetechnical integration
### How it came to be
* The idea: Mix the digital certificate with the classical seal
* Started as the stamping idea to celebrate v1 and send contributors a thank you with candles
* Problems: Candles are not allowed -> Therefore glue gun
### How it works
* Components
  * Raspberry Pi with k3s
  * Printer
  * Cert manager
  * A Go-based Web-UI
* QR-Code: Contains link to certificate with private key
```mermaid
flowchart LR
```
@ -53,14 +53,14 @@ flowchart LR
### What is new this year
* Idea: Certs should be usable for TLS
* Solution: The QR-Code links to a zip-download with the cert and private key
* New: ECDSA for everything
* New: A stable root CA with intermediate for every conference
* New: Guestbook that can only be signed with a booth-issued certificate -> Available via script
## Learnings
* This demo is just a private CA with cert manager -> Can be applied to any PKI use case
* The certificate can be created via the CR, CSI driver (create secret and mount in container), ingress annotations, ...
* You can use multiple different Issuers (CA Issuer aka PKI, Let's Encrypt, Vault, AWS, ...)
@ -74,4 +74,4 @@ flowchart LR
## Conclusion
* This is not just a demo -> Just apply it for machines
* They have regular meetings (daily stand-ups and bi-weekly)

View File

@ -7,14 +7,14 @@ tags:
- scaling
---
A talk by TikTok/ByteDance (duh) focused on using central controllers instead of on the edge.
## Background
> Global means non-China
* Edge platform team for CDN, livestreaming, uploads, real-time communication, etc.
* Around 250 clusters with 10-600 nodes each - mostly non-cloud aka bare-metal
* Architecture: Control plane clusters (platform services) - data plane clusters (workload by other teams)
* Platform includes logs, metrics, configs, secrets, ...
@ -24,28 +24,28 @@ A talk by TikTok/ByteDace (duh) focussed on using central controllers instead of
* Operators are essential for platform features
* As the feature requests increase, more operators are needed
* The deployment of operators throughout many clusters is complex (namespaces, deployments, policies, ...)
### Edge
* Limited resources
* Cost implication of platform features
* Real-time processing demands by platform features
* Balancing act between resources used by workload vs platform features (20-25%)
### The classic flow
1. New feature gets requested
2. Use kubebuilder with the SDK to create the operator
3. Create namespaces and configs in all clusters
4. Deploy operator to all clusters
## Possible Solution
### Centralized Control Plane
* Problem: The controller implementation is limited to a cluster boundary
* Idea: Why not create a single operator that can manage multiple edge clusters?
* Implementation: Just modify kubebuilder to accept multiple clients (and caches)
* Result: It works -> Simpler deployment and troubleshooting
* Concerns: High code complexity -> Long familiarization
@ -54,14 +54,14 @@ A talk by TikTok/ByteDace (duh) focussed on using central controllers instead of
### Attempt it a bit more like kubebuilder
* Each cluster has its own manager
* There is a central multimanager that starts all the cluster-specific managers
* Controller registration to the manager now handles cluster names
* The reconciler knows which cluster it is working on
* The multi-cluster management basically just reads all the cluster secrets and creates a manager+controller for each cluster secret
* Challenges: Network connectivity
* Solutions:
  * Dynamic add/remove of clusters with Go channels to prevent pod restarts
  * Connectivity health checks -> On connection loss a manager recreation gets triggered
```mermaid
flowchart TD
```
@ -80,7 +80,7 @@ flowchart LR
## Conclusion
* Acknowledge resource constraints on edge
* Embrace open source adoption instead of building your own
* Simplify deployment
* Recognize your own opinionated approach and its use cases

View File

@ -15,22 +15,22 @@ Notes may be a bit unstructured due to tired note taker.
## Basics
* FluentBit is compatible with
  * Prometheus (It can replace the Prometheus scraper and node exporter)
  * OpenMetrics
  * OpenTelemetry (HTTPS input/output)
* FluentBit can export to Prometheus, Splunk, InfluxDB or others
* So pretty much it can be used to collect data from a bunch of sources and pipe it out to different backend destinations
* Fluent ecosystem: No vendor lock-in to observability
### Architectures
* The fluent agent collects data and can send it to one or multiple locations
* FluentBit can be used for aggregation from other sources
### In the Kubernetes logging ecosystem
* Pod logs to console -> Streamed stdout/err gets piped to file
* The logs in the file get encoded as JSON with metadata (date, channel)
* Labels and annotations only live in the control plane -> You have to collect them additionally -> Expensive
@ -56,8 +56,8 @@ flowchart LR
### Solution
* Solution: Processor - a separate thread segmented by telemetry type
* Plugins can be written in your favorite language (C, Rust, Go, ...)
```mermaid
flowchart LR
```
@ -74,7 +74,7 @@ flowchart LR
### General new features in v3
* Native HTTP/2 support in core
* Content modifier with multiple operations (insert, upsert, delete, rename, hash, extract, convert)
* Metrics selector (include or exclude metrics) with matcher (name, prefix, substring, regex)
* SQL processor -> Use SQL expressions for selections (instead of filters)
* Better OpenTelemetry output

View File

@ -15,15 +15,15 @@ Who have I talked to today, are there any follow-ups or learnings?
They will follow up with a quick demo
{{% /notice %}}
* An interesting Tekton-based CI/CD solution that also integrates with other platforms
* May be interesting for either ODIT.Services or some of our customers
## Docker
* Talked to one salesperson just about the general conference
* Talked to one technical guy about Docker build time optimization
## Rancher/SUSE
* I just got some swag, a friend of mine got a demo focusing on runtime security

View File

@ -4,7 +4,7 @@ title: Operators
## Observability
* Export reconcile loop steps as OpenTelemetry traces
## Work queue

View File

@ -3,11 +3,11 @@ title: Flux
weight: 2
---
Some lessons learned from Flux talks and from talking to the Flux team.
## Helm Auto-update
* Currently, you can just use the normal image auto-update mechanism
* Requirement: The Helm chart is stored as an OCI artifact
* How: Just create the usual CRs and annotations
* They are also working on generalizing the auto-update process to fit all OCI artifacts (coming soon)

View File

@ -2,7 +2,7 @@
title: Check this out
---
Just a loose list of stuff that sounded interesting
* Dapr
* etcd backups

View File

@ -4,4 +4,4 @@ title: Lessons Learned
weight: 99
---
Interesting lessons learned + tips/tricks.