---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---

---
title: Is your image really distroless?
weight: 7
---

Laurent Goderre from Docker.

---
title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
weight: 8
---

> Interchangeable wording in this talk: controller == operator

A talk by Elastic.

## About Elastic

* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers

### Elastic and Kube

* They offer Elastic observability
* They offer the ECK operator for simplified deployments

## The baseline

* Goal: A large-scale (1M+ containers) resilient platform on k8s
* Architecture
  * Global Control: The control plane (API) for users, implemented with controllers
  * Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live

## Scalability

* Challenge: How large can our clusters be, and how many clusters do we need?
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
  * Throw away without data loss
  * Single source of truth is not the cluster's etcd but external -> No etcd backups needed
  * Everything can be recreated at any time

## Controllers

{{% notice style="note" %}}
I won't copy the explanations of operators/controllers into these notes
{{% /notice %}}

* Many different controllers, including (but not limited to)
  * Cluster controller: Registers a cluster with the control plane
  * Project controller: Schedules a user's project to a cluster
  * Product controllers (Elasticsearch, Kibana, etc.)
  * Ingress/cert-manager
* Sometimes controllers depend on other controllers -> potential complexity
* Pros:
  * Resilient (self-healing)
  * Level-triggered (desired state vs. procedure-triggered)
  * Simple reasoning when comparing desired state vs. a state machine
  * Official controller-runtime lib
  * Workqueue: Automatic dedup, retry backoff and so on

## Global Controllers

* Basic operation
  * Uses the project config from Elastic Cloud as the desired state
  * The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd?
* Solution: External datastore (Postgres)
* Challenge: How do we sync the db sources to Kubernetes?
* Potential solution: Replace etcd with the external db
* Chosen solution (a sketch follows this list):
  * The controllers don't use CRDs for storage, but they expose a web API
  * Reconciliation now interacts with the external db and Go channels (queue) instead
  * The CRs for the operators then get created by the global controller
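
A minimal controller-runtime sketch of that chosen solution, assuming a hypothetical `DesiredStateStore` backed by Postgres; the names are illustrative, not Elastic's actual code:

```go
// Sketch: the global controller reads desired state from Postgres
// (the source of truth) and materializes CRs for downstream operators.
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DesiredStateStore is a hypothetical wrapper around the external db.
type DesiredStateStore interface {
	Load(ctx context.Context, projectID string) (client.Object, error)
}

type ProjectReconciler struct {
	client.Client
	Store DesiredStateStore
}

func (r *ProjectReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Desired state comes from Postgres, not from a CRD in etcd.
	desired, err := r.Store.Load(ctx, req.Name)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Create/update the CR that the product operator (e.g. ECK) consumes,
	// via server-side apply, so the cluster itself stays disposable.
	err = r.Patch(ctx, desired, client.Apply,
		client.FieldOwner("global-controller"), client.ForceOwnership)
	return ctrl.Result{}, err
}
```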

### Large scale

* Problem: Reconcile gets triggered for all objects on restart -> Make sure nothing gets missed and everything is handled by the latest controller version
* Idea: Just create more workers for 100K+ objects
  * Problem: CPU goes brrr and the db gets overloaded
  * Problem: If you create an item during a restart, it suddenly sits at the end of a 100K-item work-queue

### Reconcile

* User-driven events are processed asap
* Reconciliation of everything should still happen, but with low prio, slowly in the background
* Solution: The status field LastReconciledRevision (timestamp) gets compared to the revision; if the revision is larger -> user change
* Prioritization: Just a custom event handler with the normal queue and a low prio
* Low-prio queue: Just a queue that adds items to the normal work-queue with a rate limit (see the Go sketch after the diagram)

```mermaid
flowchart LR
low-->rl(ratelimit)
rl-->wq(work queue)
wq-->controller
high-->wq
```
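
A minimal Go sketch of that low-prio queue, assuming client-go's `workqueue` package; the wiring and the rate are made up for illustration:

```go
// Sketch: background reconcile items drain into the normal work-queue
// through a rate limiter; user-driven events are added to the normal
// queue directly and therefore always jump ahead.
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func feedLowPrio(ctx context.Context, low, normal workqueue.Interface, limiter *rate.Limiter) {
	for {
		item, shutdown := low.Get()
		if shutdown {
			return
		}
		// Block until the limiter allows another background item through.
		if err := limiter.Wait(ctx); err != nil {
			low.Done(item)
			return
		}
		normal.Add(item) // hand over to the controller's normal queue
		low.Done(item)
	}
}

func main() {
	normal := workqueue.New() // "high": user-driven events go here directly
	low := workqueue.New()    // "low": the periodic full reconcile goes here
	go feedLowPrio(context.Background(), low, normal, rate.NewLimiter(rate.Limit(10), 1))

	low.Add("project-42")     // e.g. enqueued by the background reconcile
	item, _ := normal.Get()   // arrives once the limiter lets it through
	fmt.Println("reconciling", item)
}
```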

## Related

* Argo for CI/CD
* Crossplane for cluster autoprovisioning

---
title: "Safety or usability: Why not both? Towards referential auth in k8s"
weight: 9
---

A talk by Google and Microsoft with the premise of better auth in k8s.

## Baselines

* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
* Result: CVEs
* Example: Just use ingress-nginx, put some Lua code into the config and voila: service account token
* Fix: No more fun

## Basic solutions

* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress

## Current state of cross-namespace stuff

* Why: Reference a TLS cert for the Gateway API in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: Annotations in Contour that reference a namespace containing all the certs (rewrites secret to certs/secret)
* Gateway solution:
  * The Gateway TLS secret ref includes a namespace
  * ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret) - see the sketch after this list
* Limits:
  * Has to be implemented via controllers
  * The controllers still have read-all - they just check whether they are supposed to do this
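
A sketch of such a grant, built with the upstream `sigs.k8s.io/gateway-api` Go types; the names and namespaces are made-up examples:

```go
// Sketch: a ReferenceGrant in the certs namespace that allows Gateways
// from the gateway-team namespace to reference Secrets next to the grant.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gwv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"
)

func main() {
	grant := gwv1beta1.ReferenceGrant{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "allow-gateway-to-certs", // hypothetical name
			Namespace: "certs",                  // the referenced Secrets live here
		},
		Spec: gwv1beta1.ReferenceGrantSpec{
			From: []gwv1beta1.ReferenceGrantFrom{{
				Group:     "gateway.networking.k8s.io",
				Kind:      "Gateway",      // X: who may reference
				Namespace: "gateway-team", // where X lives
			}},
			To: []gwv1beta1.ReferenceGrantTo{{
				Group: "",       // core API group
				Kind:  "Secret", // Y: what may be referenced
			}},
		},
	}
	fmt.Printf("grant %s: %s/%s -> Secret\n",
		grant.Name, grant.Spec.From[0].Namespace, grant.Spec.From[0].Kind)
}
```

Note that the grant alone is not enough: the Gateway's TLS secretRef still has to point at the certs namespace, and (per the limits above) it is the implementing controller that actually enforces the check.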

## Goals

### Global

* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross-namespace references
* Make it easy for API devs to adopt it

### Personas

* Alex, API author
* Kai, controller author
* Rohan, resource owner

### What our stakeholders want

* Alex: Define relationships via ReferencePatterns
* Kai: Specify controller identity (ServiceAccount), define the relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)

## Result of the paper

### Architecture

* ReferencePattern: Where do I find the references? -> Example: GatewayClass in the Gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references

### POC

* Minimum access: You only get access if the grant exists AND the reference actually exists
* Their basic implementation works with the kube API

### Open questions

* Naming
* Getting people to adopt this
* What about namespace-scoped ReferenceConsumers?
* Is there a need for RBAC verb support (not only read access)?

## Alternative

* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
  * Requires changes to Kubernetes core auth
  * Everything but list and watch is a pain
  * How do you handle AND vs. OR selection?
* Field selectors: they already exist
* Benefit: Simple controller implementation

## Meanwhile

* Prefer tools that support isolation between controller and data plane
* Disable all non-needed features -> especially scripting

---
title: Developers Demand UX for K8s!
weight: 10
---

A talk by UX and software people at Red Hat (Podman team).
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).

## Research

* User research study including 11 devs and platform engineers over three months
* Focus was on a new Podman Desktop feature
* Experience range: 2-3 years average (low: no experience, high: oldschool kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping

## Findings

* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard b/c of the many layers, and problems occurring between CNI and infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualisation
* YAML validation -> Just use validation in dev and gitops
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs (sketch after this list)
* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (and during dev)
* Crash loops -> Identify stuck containers, simple debug containers
* CLI vs. GUI -> Enable experience-level-oriented GUIs, enhance in-time troubleshooting
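
A minimal sketch of such a normalization pass, assuming `sigs.k8s.io/yaml`; round-tripping resolves anchors and (via encoding/json) emits keys in sorted order with fixed indentation:

```go
// Sketch: normalize YAML for stable diffs by round-tripping it.
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

func normalize(in []byte) ([]byte, error) {
	var doc interface{}
	if err := yaml.Unmarshal(in, &doc); err != nil { // anchors resolved here
		return nil, err
	}
	return yaml.Marshal(doc) // keys come back sorted, indentation fixed
}

func main() {
	messy := []byte("b: 2\na: &x 1\nc: *x\n")
	clean, err := normalize(messy)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(clean)) // a: 1, b: 2, c: 1 - anchor expanded
}
```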

## General issues

* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on kube, which sounds like complexity
* Where do I begin my troubleshooting?
* Interoperability/fragility with updates

---
title: Comparing sidecarless service mesh from Cilium and Istio
weight: 11
---

A talk by the global field CTO at Solo.io, with a hint of service mesh background.

## History

* Linkerd 1.x was the first modern service mesh and basically an opt-in service proxy
* Challenges: JVM (size), latencies, ...

### Why not node-proxy?

* Per-node resource consumption is unpredictable
* Per-node proxy must ensure fairness
* Blast radius is always the entire node
* Per-node proxy is a fresh attack vector

### Why sidecar?

* Transparent (ish)
* Part of the app lifecycle (up/down)
* Single tenant
* No noisy neighbor

### Sidecar drawbacks

* Race conditions
* Security of certs/keys
* Difficult sizing
* Apps need to be proxy-aware
* Can be circumvented
* Challenging upgrades (infra and app live side by side)

## Our lord and savior

* Potential solution: eBPF
* Problem: Not quite the perfect solution
* Result: We still need an L7 proxy (but some L4 stuff can be implemented in the kernel)

### Why sidecarless?

* Full transparency
* Optimized networking
* Lower resource allocation
* No race conditions
* No manual pod injection
* No credentials in the app

## Architecture

* Control Plane
* Data Plane
* mTLS
* Observability
* Traffic Control

## Cilium

### Basics

* CNI with eBPF on L3/4
* A lot of nice observability
* Kube-proxy replacement
* Ingress (via Gateway API)
* Mutual authentication
* Specialized CiliumNetworkPolicy
* Configure Envoy through Cilium

### Control Plane

* A Cilium agent on each node reacts to scheduled workloads by programming the local dataplane
* API via Gateway API and CiliumNetworkPolicy

```mermaid
flowchart TD
subgraph kubeserver
kubeapi
end
subgraph node1
kubeapi<-->control1
control1-->data1
end
subgraph node2
kubeapi<-->control2
control2-->data2
end
subgraph node3
kubeapi<-->control3
control3-->data3
end
```

### Data plane

* Configured by the control plane
* Does all of the eBPF things on L4
* Does all of the Envoy things on L7
* In-kernel WireGuard for optional transparent encryption

### mTLS

* Network policies get applied at the eBPF layer (check whether id a can talk to id b)
* When mTLS is enabled, there is an auth check in advance -> If it fails, proceed with the agents
* The agents talk to each other for mTLS auth and save the result to a cache -> Now eBPF can say yes
* Problem: The caches can lead to id confusion
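
A toy Go model of that flow, purely illustrative (Cilium's real implementation lives in eBPF and the agent):

```go
// Toy model of the described check, not Cilium's actual code.
package main

import "fmt"

type pair struct{ src, dst uint32 } // numeric security identities

// authCache is what the agents fill after a successful mTLS handshake.
var authCache = map[pair]bool{}

// agentHandshake stands in for the node agents doing mTLS and caching it.
func agentHandshake(src, dst uint32) {
	authCache[pair{src, dst}] = true
}

// datapathAllowed is the fast-path answer the eBPF layer would give.
func datapathAllowed(src, dst uint32) bool {
	return authCache[pair{src, dst}]
}

func main() {
	fmt.Println(datapathAllowed(1001, 1002)) // false: triggers the agents
	agentHandshake(1001, 1002)
	fmt.Println(datapathAllowed(1001, 1002)) // true: cached result
	// The talk's caveat: stale cache entries can mis-attribute identities.
}
```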

## Istio

### Basics

* L4/L7 service mesh without its own CNI
* Based on Envoy
* mTLS
* Classically via sidecar, nowadays also sidecarless (see ambient mode below)

### Ambient mode

* Separates L4 and L7 -> Can run on Cilium
* mTLS
* Gateway API

### Control plane

```mermaid
flowchart TD
kubeapi-->xDS

xDS-->dataplane1
xDS-->dataplane2

subgraph node1
dataplane1
end

subgraph node2
dataplane2
end
```

* Central xDS control plane
* Per-node dataplane that reads updates from the control plane

### Data Plane

* L4 runs via the ztunnel DaemonSet, which handles mTLS
* The ztunnel traffic gets handed over to the CNI
* The L7 proxy lives somewhere™ and traffic gets routed through it as an "extra hop" aka a waypoint

### mTLS

* The ztunnel creates an HBONE (HTTP overlay network) tunnel with mTLS

{{% notice style="note" %}}
They will follow up
{{% /notice %}}

* We mostly talked about Traefik Hub as an API portal

## Postman

* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info material on why Postman SaaS is not a big security risk

## Mattermost

{{% notice style="note" %}}
I should follow up
{{% /notice %}}

* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
* They're currently migrating the Mattermost cloud offering to ARM - therefore ARM support will be coming in the next months
* The Mattermost guy had exactly the same problems with notifications and read/unread using Element

## Vercel

* Nice guys, talked a bit about convincing customers to switch to the edge
* Also talked about policy validation

## Renovate

* The paid Renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification

---
archetype: chapter
title: Day 2
weight: 2
---

Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).

Just a loose list of stuff that sounded interesting

* Dapr
* etcd backups