day2 the next episode

This commit is contained in:
Nicolai Ort 2024-03-20 16:58:50 +01:00
parent 78b3826cbb
commit 33f615aaf0
Signed by: niggl
GPG Key ID: 13AFA55AF62F269F
8 changed files with 247 additions and 3 deletions

View File

@ -1,5 +1,5 @@
---
title: Sponsored: Build an open source platform for ai/ml
title: "Sponsored: Build an open source platform for ai/ml"
weight: 4
---

View File

@ -1,6 +1,6 @@
---
title: Is your image really distroless?
weight:7
weight: 7
---
Laurent Goderre from Docker.

View File

@ -0,0 +1,98 @@
---
title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
weight: 8
---
> Interchangeable wording in this talk: controller == operator
A talk by Elastic.
## About Elastic
* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers
### Elastic and Kube
* They offer Elastic Observability
* They offer the ECK operator for simplified deployments
## The baseline
* Goal: A large-scale (1M+ containers), resilient platform on k8s
* Architecture
  * Global Control: The control plane (API) for users, with controllers
  * Regional Apps: The "shitload" of kubernetes clusters where the actual customer services live
## Scalability
* Challenge: How large can a cluster be, and how many clusters do we need?
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
  * Throw away without data loss
  * Single source of truth is not the cluster's etcd but external -> No etcd backups needed
  * Everything can be recreated at any time
## Controllers
{{% notice style="note" %}}
I won't copy the explanations of operators/controllers into these notes
{{% /notice %}}
* Many different controllers, including (but not limited to):
  * Cluster controller: Registers a cluster with the control plane
  * Project controller: Schedules a user's project to a cluster
  * Product controllers (Elasticsearch, Kibana, etc.)
  * Ingress/cert-manager
* Sometimes controllers depend on other controllers -> potential complexity
* Pros:
  * Resilient (self-healing)
  * Level-triggered (desired state vs. procedure-triggered)
  * Simple reasoning when comparing desired state vs. a state machine
  * Official controller-runtime lib
    * Workqueue: automatic dedup, retry backoff and so on
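The workqueue semantics mentioned above (dedup plus retry with backoff) can be sketched as a toy model; this is only an illustration of the idea, not controller-runtime's actual API:

```python
from collections import deque

class WorkQueue:
    """Toy model of a controller work queue: de-duplicates keys and
    computes exponential retry backoff for failed items."""

    def __init__(self, base_delay=1.0, max_delay=8.0):
        self._queue = deque()
        self._pending = set()   # keys currently queued (for dedup)
        self._failures = {}     # key -> consecutive failure count
        self._base = base_delay
        self._max = max_delay

    def add(self, key):
        if key in self._pending:  # dedup: each key is queued at most once
            return
        self._pending.add(key)
        self._queue.append(key)

    def get(self):
        key = self._queue.popleft()
        self._pending.discard(key)
        return key

    def retry_delay(self, key):
        """Exponential backoff: base * 2^failures, capped at max_delay."""
        n = self._failures.get(key, 0)
        self._failures[key] = n + 1
        return min(self._base * (2 ** n), self._max)

    def forget(self, key):
        """Reset the failure count after a successful reconcile."""
        self._failures.pop(key, None)

    def __len__(self):
        return len(self._queue)
```

Adding the same key twice while it is still queued is a no-op, which is what keeps a burst of events for one object from causing repeated reconciles.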
## Global Controllers
* Basic operation
  * Uses the project config from Elastic Cloud as the desired state
  * The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd?
  * Solution: External datastore (postgres)
* Challenge: How do we sync the db contents to kubernetes?
  * Potential solution: Replace etcd with the external db
  * Chosen solution:
    * The controllers don't use CRDs for storage, but expose a web API
    * Reconciliation interacts with the external db and go channels (a queue) instead
    * The CRs for the operators then get created by the global controller
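The chosen flow above (external DB as source of truth, CRs as output for the downstream operators) can be modeled in a few lines; all names and shapes here are invented for illustration, not Elastic's code:

```python
# Toy model: desired state comes from an external datastore (a dict
# standing in for postgres), and reconcile emits CRs into the cluster
# (a dict standing in for the k8s API) in a level-triggered way.

def build_cr(project):
    """Turn a project row into a CR for the product operators."""
    return {
        "apiVersion": "example.dev/v1",   # hypothetical group/version
        "kind": "Elasticsearch",
        "metadata": {"name": project["name"]},
        "spec": {"version": project["version"]},
    }

def reconcile(db, cluster):
    """Compare desired (db) vs. actual (cluster) and create any
    missing CRs; returns the ids that were created this pass."""
    created = []
    for pid, project in db.items():
        if pid not in cluster:
            cluster[pid] = build_cr(project)
            created.append(pid)
    return created
```

Because the loop only compares states, running it twice in a row is harmless: the second pass finds nothing missing and creates nothing.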
### Large scale
* Problem: Reconcile gets triggered for all objects on restart -> makes sure nothing gets missed and everything has been handled by the latest controller version
* Idea: Just create more workers for 100K+ objects
  * Problem: CPU goes brrr and the db gets overloaded
  * Problem: If you create an item during a restart, it suddenly sits at the end of a 100K-item work queue
### Reconcile
* User-driven events are processed ASAP
* Reconciling everything should still happen, but slowly in the background with low priority
* Solution: Status.LastReconciledRevision (timestamp) gets compared to the revision; if the revision is newer -> user change
* Prioritization: Just a custom event handler with the normal queue and a low-prio queue
* Low-prio queue: Just a queue that adds items to the normal work queue with a rate limit
```mermaid
flowchart LR
low-->rl(ratelimit)
rl-->wq(work queue)
wq-->controller
high-->wq
```
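The two tricks above (detect user changes via revision comparison, drain background work through a rate limit) could look roughly like this; a toy sketch with invented names, not the talk's actual code:

```python
def is_user_change(obj):
    """If the object's revision is newer than the last reconciled one,
    a user touched it -> high priority; otherwise background work."""
    return obj["revision"] > obj["status"]["lastReconciledRevision"]

class RateLimitedFeeder:
    """The low -> ratelimit -> work queue path from the diagram:
    a backlog that trickles at most `per_tick` items per tick into
    the shared work queue, so background reconciles can't flood it."""

    def __init__(self, work_queue, per_tick=2):
        self.backlog = []
        self.work_queue = work_queue
        self.per_tick = per_tick

    def add(self, key):
        self.backlog.append(key)

    def tick(self):
        batch = self.backlog[: self.per_tick]
        self.backlog = self.backlog[self.per_tick :]
        self.work_queue.extend(batch)
```

High-priority items go straight onto the work queue; only the background sweep passes through the feeder.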
## Related
* Argo for CI/CD
* Crossplane for cluster autoprovisioning

View File

@ -0,0 +1,85 @@
---
title: "Safety or usability: Why not both? Towards referential auth in k8s"
weight: 9
---
A talk by Google and Microsoft with the premise of better auth in k8s.
## Baselines
* Most access controllers have read access to all secrets -> they are not really designed for keeping these secrets safe
* Result: CVEs
  * Example: Just use ingress-nginx, put some lua code into the config and voilà: service account token
  * Fix: No more fun
## Basic solutions
* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress
## Current state of cross namespace stuff
* Why: Reference a TLS cert for the Gateway API that lives in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: An annotation in contour that references a namespace containing all certs (rewrites secret to certs/secret)
* Gateway solution:
  * The Gateway TLS secret ref includes a namespace
  * A ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret)
* Limits:
  * Has to be implemented via controllers
  * The controllers still have read-all - they just check whether they are supposed to act
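For reference, this is what a ReferenceGrant looks like in the Gateway API today: it lives in the *target* namespace and allows Gateways from another namespace to reference Secrets there (the namespaces and name below are made-up examples):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateways-to-certs   # example name
  namespace: certs                # namespace of the referenced Secrets
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: edge             # namespace of the referencing Gateway
  to:
    - group: ""                   # core API group
      kind: Secret
```

Note the limitation from the talk: this grant only tells the (still all-reading) controller that the reference is allowed; it does not itself restrict the controller's RBAC.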
## Goals
### Global
* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross namespace references
* Make it easy for api devs to adopt it
### Personas
* Alex API author
* Kai controller author
* Rohan Resource owner
### What our stakeholders want
* Alex: Define relationships via ReferencePatterns
* Kai: Specify the controller identity (ServiceAccount), define the relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)
## Result of the paper
### Architecture
* ReferencePattern: Where do I find the references? -> Example: GatewayClass in the Gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references
### POC
* Minimum access: You only get access if the grant exists AND the reference actually exists
* Their basic implementation works with the kube api
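The "minimum access" rule (grant present AND reference resolvable) is simple to state in code; an illustrative sketch, not the POC's implementation:

```python
def may_access(grants, resources, ref):
    """Access requires both a grant covering the reference and the
    referenced resource actually existing.

    ref is a (from_ns, to_ns, kind, name) tuple; grants is a set of
    the same tuples; resources is a set of (ns, kind, name) tuples.
    All shapes here are invented for illustration."""
    from_ns, to_ns, kind, name = ref
    return ref in grants and (to_ns, kind, name) in resources
```

The AND is the point: a dangling grant alone grants nothing, and an existing resource without a grant stays inaccessible.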
### Open questions
* Naming
* Getting people to adopt this
* What about namespace-scoped ReferenceConsumers?
* Is there a need for RBAC verb support (not only read access)?
## Alternative
* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
  * Requires changes to kubernetes core auth
  * Everything but list and watch is a pain
  * How do you handle AND vs. OR selection?
* Field selectors: They exist
* Benefits: Simple controller implementation
## Meanwhile
* Prefer tools that support isolation between controller and data plane
* Disable all non-needed features -> especially scripting

content/day2/10_dev_ux.md Normal file
View File

@ -0,0 +1,34 @@
---
title: Developers Demand UX for K8s!
weight: 10
---
A talk by UX and software people at RedHat (Podman team).
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).
## Research
* User research study including 11 devs and platform engineers over three months
* Focus was on a new podman desktop feature
* Experience range: 2-3 years on average (low: no experience, high: old-school kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping
## Findings
* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard because of the many layers, and problems occurring between the CNI and the infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualisation
* YAML validation -> Just use validation in dev and gitops
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs
* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (also during dev)
* Crash loops -> Identify stuck containers, provide simple debug containers
* CLI vs. GUI -> Enable an experience-level-oriented GUI, enhance in-time troubleshooting
## General issues
* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on kube, which sounds like complexity
* Where do I begin my troubleshooting?
* Interoperability/fragility with updates

View File

@ -26,4 +26,29 @@ Who have I talked to today, are there any follow-ups or learnings?
They will follow up
{{% /notice %}}
* We mostly talked about traefik hub as an API-portal
## Postman
* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info material on why postman SaaS is not a big security risk
## Mattermost
{{% notice style="note" %}}
I should follow up
{{% /notice %}}
* I talked about our problems with the mattermost operator and was asked to get back to them with the errors
* They're currently migrating the mattermost cloud offering to arm - therefore arm support will be coming in the next months
* The mattermost guy had exactly the same problems with notifications and read/unread using element
## Vercel
* Nice guys, talked a bit about convincing customers to switch to the edge
* Also talked about policy validation
## Renovate
* The paid renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification

View File

@ -1,6 +1,7 @@
---
archetype: chapter
title: Day 2
weight: 2
---
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).

View File

@ -5,3 +5,4 @@ title: Check this out
Just a loose list of stuff that sounded interesting
* Dapr
* etcd backups