day2 the next episode

Nicolai Ort 2024-03-20 16:58:50 +01:00
parent 78b3826cbb
commit 33f615aaf0
Signed by: niggl
GPG Key ID: 13AFA55AF62F269F
8 changed files with 247 additions and 3 deletions


@ -1,5 +1,5 @@
---
title: "Sponsored: Build an open source platform for ai/ml"
weight: 4
---


@ -1,6 +1,6 @@
---
title: Is your image really distroless?
weight: 7
---
Laurent Goderre from Docker. Laurent Goderre from Docker.


@ -0,0 +1,98 @@
---
title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
weight: 8
---
> Interchangeable wording in this talk: controller == operator
A talk by Elastic.
## About Elastic
* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers
### Elastic and Kube
* They offer Elastic Observability
* They offer the ECK operator for simplified deployments
## The baseline
* Goal: A large-scale (1M+ containers), resilient platform on k8s
* Architecture
* Global Control: The control plane (API) for users, with controllers
* Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live
## Scalability
* Challenge: How large can our clusters be, and how many clusters do we need?
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
* Throw away without data loss
* Single source of truth is not the cluster's etcd but an external store -> No etcd backups needed
* Everything can be recreated at any time
## Controllers
{{% notice style="note" %}}
I won't copy the explanations of operators/controllers into these notes
{{% /notice %}}
* Many different controllers, including (but not limited to)
* Cluster controller: Register a cluster to the controller
* Project controller: Schedule a user's project to a cluster
* Product controllers (Elasticsearch, Kibana, etc.)
* Ingress/cert-manager
* Sometimes controllers depend on other controllers -> potential complexity
* Pro:
* Resilient (self-healing)
* Level-triggered (desired state vs. procedure-triggered)
* Simple reasoning when comparing desired state vs. a state machine
* Official controller-runtime lib (see the sketch below)
* Work queue: automatic dedup, retry with backoff and so on
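To make the level-triggered idea concrete, here is a minimal controller-runtime reconciler sketch (my own illustration, not code from the talk; the `ProjectReconciler` name and the ConfigMap stand-in for a real CRD are assumptions):

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ProjectReconciler is a hypothetical, minimal reconciler. It is level-triggered:
// no matter which event woke it up, it only looks at the current object and
// converges it towards the desired state.
type ProjectReconciler struct {
	client.Client
}

func (r *ProjectReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the current state (a ConfigMap stands in for a real Project CRD here).
	var obj corev1.ConfigMap
	if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
		// Object is gone -> nothing to converge; deletions arrive as ordinary events.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Compare desired vs. actual state and converge here. The work queue behind
	// this call already deduplicates events and retries with backoff on errors.
	return ctrl.Result{}, nil
}

func (r *ProjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}).
		Complete(r)
}
```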
## Global Controllers
* Basic operation
* Uses the project config from Elastic Cloud as the desired state
* The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd?
* Solution: External datastore (Postgres)
* Challenge: How do we sync the db contents to Kubernetes?
* Potential solution: Replace etcd with the external db
* Chosen solution:
* The controllers don't use CRDs for storage, but expose a web API instead
* Reconciliation now interacts with the external db and Go channels (as the queue) instead
* The CRs for the operators are then created by the global controller (rough sketch below)
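A very rough sketch of that flow, with entirely hypothetical names (the `ProjectStore` interface, the `example.elastic.co/v1 Project` GVK and the `projects` namespace are my assumptions, not Elastic's real API): desired state comes from the external database, and the global controller materialises it as a CR for the regional operators to pick up.

```go
package controllers

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ProjectStore is a stand-in for the external datastore (e.g. Postgres) that
// acts as the single source of truth instead of etcd.
type ProjectStore interface {
	GetProject(ctx context.Context, id string) (ProjectSpec, error)
}

type ProjectSpec struct {
	ID      string
	Region  string
	Version string
}

// reconcileProject reads the desired state from the external store and creates
// the corresponding CR in the regional cluster, where the product operators
// (Elasticsearch, Kibana, ...) take over.
func reconcileProject(ctx context.Context, store ProjectStore, kube client.Client, id string) error {
	desired, err := store.GetProject(ctx, id)
	if err != nil {
		return err
	}

	cr := &unstructured.Unstructured{}
	cr.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "example.elastic.co", Version: "v1", Kind: "Project", // hypothetical GVK
	})
	cr.SetNamespace("projects")
	cr.SetName(desired.ID)
	cr.Object["spec"] = map[string]interface{}{
		"region":  desired.Region,
		"version": desired.Version,
	}

	// Create-or-tolerate; a real controller would diff against the existing CR and patch.
	if err := kube.Create(ctx, cr); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```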
### Large scale
* Problem: Reconcile gets triggered for all objects on restart -> Make sure nothing gets missed and everything is handled by the latest controller version
* Idea: Just create more workers for 100K+ objects
* Problem: CPU goes brrr and the db gets overloaded
* Problem: If you create an item during a restart, it suddenly sits at the end of a 100K+ item work queue
### Reconcile
* User-driven events are processed ASAP
* Reconciliation of everything should still happen, but with low priority, slowly in the background
* Solution: Status.LastReconciledRevision (a timestamp) gets compared to the revision; if the revision is larger -> user change
* Prioritization: Just a custom event handler with the normal queue and a low-prio queue
* Low-prio queue: Just a queue that adds items to the normal work queue with a rate limit (see the sketch below the diagram)
```mermaid
flowchart LR
low-->rl(ratelimit)
rl-->wq(work queue)
wq-->controller
high-->wq
```
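A minimal sketch of the low-priority path in the diagram above (my construction, not the speaker's code): background requests sit in their own channel and are drained into the shared work queue only as fast as a rate limiter allows, so user-driven requests that are added to the queue directly stay fast.

```go
package controllers

import (
	"context"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// feedLowPriority drains background reconcile requests into the shared work
// queue, but only at the pace the rate limiter allows. High-priority (user
// driven) requests are added to wq directly and therefore bypass the limiter.
func feedLowPriority(ctx context.Context, low <-chan reconcile.Request, wq workqueue.RateLimitingInterface, limiter *rate.Limiter) {
	for {
		select {
		case <-ctx.Done():
			return
		case req := <-low:
			if err := limiter.Wait(ctx); err != nil {
				return // context cancelled while waiting
			}
			wq.Add(req) // the work queue dedups if the item is already queued
		}
	}
}
```

Something like `go feedLowPriority(ctx, lowCh, queue, rate.NewLimiter(rate.Limit(10), 1))` would cap background reconciles at roughly ten per second without delaying user-driven items.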
## Related
* Argo for CI/CD
* Crossplane for cluster autoprovision


@ -0,0 +1,85 @@
---
title: "Safety or usability: Why not both? Towards referential auth in k8s"
weight: 9
---
A talk by Google and Microsoft with the premise of better auth in k8s.
## Baselines
* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets safe
* Result: CVEs
* Example: Just use ingress-nginx, put some Lua code into the config and voila: service account token
* Fix: No more fun
## Basic solutions
* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress
## Current state of cross namespace stuff
* Why: Reference a TLS cert for the Gateway API in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: An annotation in Contour that references a namespace containing all certs (rewrites secret to certs/secret)
* Gateway solution:
* The Gateway TLS secret ref includes a namespace
* A ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret) - see the sketch after this list
* Limits:
* Has to be implemented via controllers
* The controllers still have read-all - they just check whether they are supposed to act
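For illustration, this is roughly what such a grant looks like when built with the Gateway API Go types (the namespaces and the Secret name are made up): the grant lives in the namespace that owns the Secret and allows Gateways from another namespace to reference it.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gwv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"
)

// exampleReferenceGrant allows Gateways in the "edge" namespace to reference
// the "wildcard-tls" Secret in the "certs" namespace (all names illustrative).
// Note: this only grants the *reference*; the gateway controller still needs
// RBAC to read the Secret, which is exactly the limit called out above.
func exampleReferenceGrant() *gwv1beta1.ReferenceGrant {
	secretName := gwv1beta1.ObjectName("wildcard-tls")
	return &gwv1beta1.ReferenceGrant{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "allow-gateway-tls",
			Namespace: "certs", // lives next to the referenced Secret
		},
		Spec: gwv1beta1.ReferenceGrantSpec{
			From: []gwv1beta1.ReferenceGrantFrom{{
				Group:     "gateway.networking.k8s.io",
				Kind:      "Gateway",
				Namespace: "edge",
			}},
			To: []gwv1beta1.ReferenceGrantTo{{
				Group: "", // core API group
				Kind:  "Secret",
				Name:  &secretName, // optional: restrict to a single Secret
			}},
		},
	}
}
```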
## Goals
### Global
* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross-namespace references
* Make it easy for API devs to adopt it
### Personas
* Alex API author
* Kai controller author
* Rohan Resource owner
### What our stakeholders want
* Alex: Define relationships via ReferencePatterns
* Kai: Specify the controller identity (service account), define the relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)
## Result of the paper
### Architecture
* ReferencePattern: Where do I find the references? -> Example: GatewayClass in the Gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references
### POC
* Minimum access: You only get access if the grant is there AND the reference actually exists
* Their basic implementation works against the kube API
### Open questions
* Naming
* Make people adopt this
* What about namespace-scoped ReferenceConsumers?
* Is there a need for RBAC verb support (not only read access)?
## Alternative
* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
* Requires changes to Kubernetes core auth
* Everything but list and watch is a pain
* How do you handle AND vs OR selection?
* Field selectors: They exist
* Benefits: Simple controller implementation
## Meanwhile
* Prefer tools that support isolation between the controller and the data plane
* Disable all non-needed features -> especially scripting


@ -0,0 +1,34 @@
---
title: Developers Demand UX for K8s!
weight: 10
---
A talk by UX and software people at Red Hat (the Podman team).
The talk mainly followed the academic study process (aka "this is the survey I did for my bachelor's/master's thesis").
## Research
* User research study including 11 devs and platform engineers over three months
* Focus was on a new Podman Desktop feature
* Experience range: 2-3 years average (low end: no experience, high end: old-school Kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping
## Findings
* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard because of the many layers, and problems occurring between CNI and infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualisation
* YAML validation -> Just use validation in dev and GitOps
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs (see the sketch after this list)
* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (also during dev)
* Crash loops -> Identify stuck containers, simple debug containers
* CLI vs GUI -> Enable an experience-level-oriented GUI, enhance in-time troubleshooting
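As a tiny illustration of the "normalize YAML for easy diff" point above (my own sketch, not a tool from the talk): round-tripping a manifest through an untyped map with `sigs.k8s.io/yaml` rewrites it with a stable key order and uniform indentation, so two otherwise equivalent files diff cleanly.

```go
package example

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// normalizeYAML round-trips a manifest through an untyped map and back.
// The marshalling step emits keys in a stable, sorted order with uniform
// indentation, which keeps diffs between equivalent files small.
// (Caveat: comments and anchors are lost in the round trip.)
func normalizeYAML(in []byte) ([]byte, error) {
	var doc map[string]interface{}
	if err := yaml.Unmarshal(in, &doc); err != nil {
		return nil, err
	}
	return yaml.Marshal(doc)
}

func demo() {
	manifest := []byte("metadata:\n    name: demo\napiVersion: v1\nkind: ConfigMap\n")
	out, _ := normalizeYAML(manifest)
	fmt.Print(string(out)) // keys now appear in a stable order with 2-space indentation
}
```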
## General issues
* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on Kube, which sounds like complexity
* Where do I begin my troubleshooting?
* Interoperability/fragility with updates


@ -26,4 +26,29 @@ Who have I talked to today, are there any follow-ups or learnings?
They will follow up
{{% /notice %}}
* We mostly talked about traefik hub as an API-portal
## Postman
* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info material on why Postman SaaS is not a big security risk
## Mattermost
{{% notice style="note" %}}
I should follow up
{{% /notice %}}
* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
* They're currently migrating the Mattermost cloud offering to ARM - therefore ARM support will be coming in the next months
* The Mattermost guy had exactly the same problems with notifications and read/unread state when using Element
## Vercel
* Nice guys, talked a bit about convincing customers to switch to the edge
* Also talked about policy validation
## Renovate
* The paid Renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines for MR verification


@ -1,6 +1,7 @@
---
archetype: chapter
title: Day 2
weight: 2
---
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).


@ -5,3 +5,4 @@ title: Check this out
Just a loose list of stuff that sounded interesting
* Dapr
* etcd backups