---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---

---
title: Is your image really distroless?
weight: 7
---

Laurent Goderre from Docker.

---
title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
weight: 8
---

> Interchangeable wording in this talk: controller == operator

A talk by Elastic.

## About Elastic

* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers

### Elastic and Kube

* They offer Elastic observability
* They offer the ECK operator for simplified deployments

## The baseline

* Goal: A large-scale (1M+ containers) resilient platform on k8s
* Architecture
  * Global Control: The control plane (API) for users, implemented with controllers
  * Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live

## Scalability

* Challenge: How large can our clusters be, and how many clusters do we need?
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
  * Throw away without data loss
  * Single source of truth is not the cluster's etcd but external -> No etcd backups needed
  * Everything can be recreated at any time

## Controllers

{{% notice style="note" %}}
I won't copy the explanations of operators/controllers into these notes
{{% /notice %}}

* Many different controllers, including (but not limited to)
  * Cluster controller: Registers a cluster with the control plane
  * Project controller: Schedules a user's project to a cluster
  * Product controllers (Elasticsearch, Kibana, etc.)
  * Ingress/cert-manager
* Sometimes controllers depend on other controllers -> potential complexity
* Pros:
  * Resilient (self-healing)
  * Level-triggered (desired state vs. procedure-triggered)
  * Simple reasoning when comparing desired state vs. a state machine
  * Official controller-runtime lib
  * Workqueue: Automatic dedup, retry backoff and so on

## Global Controllers

* Basic operation
  * Uses the project config from Elastic Cloud as the desired state
  * The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd?
* Solution: External datastore (Postgres)
* Challenge: How do we sync the db sources to Kubernetes?
* Potential solution: Replace etcd with the external db
* Chosen solution (a sketch follows this list):
  * The controllers don't use CRDs for storage, but they expose a web API
  * Reconciliation now interacts with the external db and Go channels (queue) instead
  * The CRs for the operators then get created by the global controller
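
A minimal controller-runtime sketch of that chosen solution, assuming a hypothetical `DesiredStateStore` backed by Postgres; the names are illustrative, not Elastic's actual code:

```go
// Sketch: the global controller reads desired state from Postgres
// (the source of truth) and materializes CRs for downstream operators.
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DesiredStateStore is a hypothetical wrapper around the external db.
type DesiredStateStore interface {
	Load(ctx context.Context, projectID string) (client.Object, error)
}

type ProjectReconciler struct {
	client.Client
	Store DesiredStateStore
}

func (r *ProjectReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Desired state comes from Postgres, not from a CRD in etcd.
	desired, err := r.Store.Load(ctx, req.Name)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Create/update the CR that the product operator (e.g. ECK) consumes,
	// via server-side apply, so the cluster itself stays disposable.
	err = r.Patch(ctx, desired, client.Apply,
		client.FieldOwner("global-controller"), client.ForceOwnership)
	return ctrl.Result{}, err
}
```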

### Large scale

* Problem: Reconcile gets triggered for all objects on restart -> Make sure nothing gets missed and everything is handled by the latest controller version
* Idea: Just create more workers for 100K+ objects
  * Problem: CPU goes brrr and the db gets overloaded
  * Problem: If you create an item during a restart, it suddenly sits at the end of a 100K-item work-queue

### Reconcile

* User-driven events are processed asap
* Reconciliation of everything should still happen, but with low prio, slowly in the background
* Solution: The status field LastReconciledRevision (timestamp) gets compared to the revision; if the revision is larger -> user change
* Prioritization: Just a custom event handler with the normal queue and a low prio
* Low-prio queue: Just a queue that adds items to the normal work-queue with a rate limit (see the Go sketch after the diagram)

```mermaid
flowchart LR
low-->rl(ratelimit)
rl-->wq(work queue)
wq-->controller
high-->wq
```
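
A minimal Go sketch of that low-prio queue, assuming client-go's `workqueue` package; the wiring and the rate are made up for illustration:

```go
// Sketch: background reconcile items drain into the normal work-queue
// through a rate limiter; user-driven events are added to the normal
// queue directly and therefore always jump ahead.
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func feedLowPrio(ctx context.Context, low, normal workqueue.Interface, limiter *rate.Limiter) {
	for {
		item, shutdown := low.Get()
		if shutdown {
			return
		}
		// Block until the limiter allows another background item through.
		if err := limiter.Wait(ctx); err != nil {
			low.Done(item)
			return
		}
		normal.Add(item) // hand over to the controller's normal queue
		low.Done(item)
	}
}

func main() {
	normal := workqueue.New() // "high": user-driven events go here directly
	low := workqueue.New()    // "low": the periodic full reconcile goes here
	go feedLowPrio(context.Background(), low, normal, rate.NewLimiter(rate.Limit(10), 1))

	low.Add("project-42")     // e.g. enqueued by the background reconcile
	item, _ := normal.Get()   // arrives once the limiter lets it through
	fmt.Println("reconciling", item)
}
```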

## Related

* Argo for CI/CD
* Crossplane for cluster autoprovisioning

---
title: "Safety or usability: Why not both? Towards referential auth in k8s"
weight: 9
---

A talk by Google and Microsoft with the premise of better auth in k8s.

## Baselines

* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
* Result: CVEs
* Example: Just use ingress-nginx, put some Lua code into the config and voila: service account token
* Fix: No more fun

## Basic solutions

* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress

## Current state of cross-namespace stuff

* Why: Reference a TLS cert for the Gateway API in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: Annotations in Contour that reference a namespace containing all the certs (rewrites secret to certs/secret)
* Gateway solution:
  * The Gateway TLS secret ref includes a namespace
  * ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret) - see the sketch after this list
* Limits:
  * Has to be implemented via controllers
  * The controllers still have read-all - they just check whether they are supposed to do this
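
A sketch of such a grant, built with the upstream `sigs.k8s.io/gateway-api` Go types; the names and namespaces are made-up examples:

```go
// Sketch: a ReferenceGrant in the certs namespace that allows Gateways
// from the gateway-team namespace to reference Secrets next to the grant.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gwv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"
)

func main() {
	grant := gwv1beta1.ReferenceGrant{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "allow-gateway-to-certs", // hypothetical name
			Namespace: "certs",                  // the referenced Secrets live here
		},
		Spec: gwv1beta1.ReferenceGrantSpec{
			From: []gwv1beta1.ReferenceGrantFrom{{
				Group:     "gateway.networking.k8s.io",
				Kind:      "Gateway",      // X: who may reference
				Namespace: "gateway-team", // where X lives
			}},
			To: []gwv1beta1.ReferenceGrantTo{{
				Group: "",       // core API group
				Kind:  "Secret", // Y: what may be referenced
			}},
		},
	}
	fmt.Printf("grant %s: %s/%s -> Secret\n",
		grant.Name, grant.Spec.From[0].Namespace, grant.Spec.From[0].Kind)
}
```

Note that the grant alone is not enough: the Gateway's TLS secretRef still has to point at the certs namespace, and (per the limits above) it is the implementing controller that actually enforces the check.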

## Goals

### Global

* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross-namespace references
* Make it easy for API devs to adopt it

### Personas

* Alex, API author
* Kai, controller author
* Rohan, resource owner

### What our stakeholders want

* Alex: Define relationships via ReferencePatterns
* Kai: Specify controller identity (ServiceAccount), define the relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)

## Result of the paper

### Architecture

* ReferencePattern: Where do I find the references? -> Example: GatewayClass in the Gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references

### POC

* Minimum access: You only get access if the grant exists AND the reference actually exists
* Their basic implementation works with the kube API

### Open questions

* Naming
* Getting people to adopt this
* What about namespace-scoped ReferenceConsumers?
* Is there a need for RBAC verb support (not only read access)?

## Alternative

* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
  * Requires changes to Kubernetes core auth
  * Everything but list and watch is a pain
  * How do you handle AND vs. OR selection?
* Field selectors: they already exist
* Benefit: Simple controller implementation

## Meanwhile

* Prefer tools that support isolation between controller and data plane
* Disable all non-needed features -> especially scripting

---
title: Developers Demand UX for K8s!
weight: 10
---

A talk by UX and software people at Red Hat (Podman team).
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).

## Research

* User research study including 11 devs and platform engineers over three months
* Focus was on a new Podman Desktop feature
* Experience range: 2-3 years average (low: no experience, high: oldschool kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping

## Findings

* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard b/c of the many layers, and problems occurring between CNI and infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualisation
* YAML validation -> Just use validation in dev and gitops
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs (sketch after this list)
* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (and during dev)
* Crash loops -> Identify stuck containers, simple debug containers
* CLI vs. GUI -> Enable experience-level-oriented GUIs, enhance in-time troubleshooting
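
A minimal sketch of such a normalization pass, assuming `sigs.k8s.io/yaml`; round-tripping resolves anchors and (via encoding/json) emits keys in sorted order with fixed indentation:

```go
// Sketch: normalize YAML for stable diffs by round-tripping it.
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

func normalize(in []byte) ([]byte, error) {
	var doc interface{}
	if err := yaml.Unmarshal(in, &doc); err != nil { // anchors resolved here
		return nil, err
	}
	return yaml.Marshal(doc) // keys come back sorted, indentation fixed
}

func main() {
	messy := []byte("b: 2\na: &x 1\nc: *x\n")
	clean, err := normalize(messy)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(clean)) // a: 1, b: 2, c: 1 - anchor expanded
}
```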

## General issues

* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on kube, which sounds like complexity
* Where do I begin my troubleshooting?
* Interoperability/fragility with updates

---
title: Comparing sidecarless service mesh from Cilium and Istio
weight: 11
---

A talk by the global field CTO at Solo.io, with a hint of service mesh background.

## History

* Linkerd 1.x was the first modern service mesh and basically an opt-in service proxy
* Challenges: JVM (size), latencies, ...

### Why not node-proxy?

* Per-node resource consumption is unpredictable
* Per-node proxy must ensure fairness
* Blast radius is always the entire node
* Per-node proxy is a fresh attack vector

### Why sidecar?

* Transparent (ish)
* Part of the app lifecycle (up/down)
* Single tenant
* No noisy neighbor

### Sidecar drawbacks

* Race conditions
* Security of certs/keys
* Difficult sizing
* Apps need to be proxy-aware
* Can be circumvented
* Challenging upgrades (infra and app live side by side)

## Our lord and savior

* Potential solution: eBPF
* Problem: Not quite the perfect solution
* Result: We still need an L7 proxy (but some L4 stuff can be implemented in the kernel)

### Why sidecarless?

* Full transparency
* Optimized networking
* Lower resource allocation
* No race conditions
* No manual pod injection
* No credentials in the app

## Architecture

* Control Plane
* Data Plane
* mTLS
* Observability
* Traffic Control

## Cilium

### Basics

* CNI with eBPF on L3/4
* A lot of nice observability
* Kube-proxy replacement
* Ingress (via Gateway API)
* Mutual authentication
* Specialized CiliumNetworkPolicy
* Configure Envoy through Cilium

### Control Plane

* A Cilium agent on each node reacts to scheduled workloads by programming the local dataplane
* API via Gateway API and CiliumNetworkPolicy

```mermaid
flowchart TD
subgraph kubeserver
kubeapi
end
subgraph node1
kubeapi<-->control1
control1-->data1
end
subgraph node2
kubeapi<-->control2
control2-->data2
end
subgraph node3
kubeapi<-->control3
control3-->data3
end
```

### Data plane

* Configured by the control plane
* Does all of the eBPF things on L4
* Does all of the Envoy things on L7
* In-kernel WireGuard for optional transparent encryption

### mTLS

* Network policies get applied at the eBPF layer (check whether id a can talk to id b)
* When mTLS is enabled, there is an auth check in advance -> If it fails, proceed with the agents
* The agents talk to each other for mTLS auth and save the result to a cache -> Now eBPF can say yes
* Problem: The caches can lead to id confusion
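
A toy Go model of that flow, purely illustrative (Cilium's real implementation lives in eBPF and the agent):

```go
// Toy model of the described check, not Cilium's actual code.
package main

import "fmt"

type pair struct{ src, dst uint32 } // numeric security identities

// authCache is what the agents fill after a successful mTLS handshake.
var authCache = map[pair]bool{}

// agentHandshake stands in for the node agents doing mTLS and caching it.
func agentHandshake(src, dst uint32) {
	authCache[pair{src, dst}] = true
}

// datapathAllowed is the fast-path answer the eBPF layer would give.
func datapathAllowed(src, dst uint32) bool {
	return authCache[pair{src, dst}]
}

func main() {
	fmt.Println(datapathAllowed(1001, 1002)) // false: triggers the agents
	agentHandshake(1001, 1002)
	fmt.Println(datapathAllowed(1001, 1002)) // true: cached result
	// The talk's caveat: stale cache entries can mis-attribute identities.
}
```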

## Istio

### Basics

* L4/L7 service mesh without its own CNI
* Based on Envoy
* mTLS
* Classically via sidecar, nowadays also sidecarless (see ambient mode below)

### Ambient mode

* Separates L4 and L7 -> Can run on Cilium
* mTLS
* Gateway API

### Control plane

```mermaid
flowchart TD
kubeapi-->xDS

xDS-->dataplane1
xDS-->dataplane2

subgraph node1
dataplane1
end

subgraph node2
dataplane2
end
```

* Central xDS control plane
* Per-node dataplane that reads updates from the control plane

### Data Plane

* L4 runs via the ztunnel DaemonSet, which handles mTLS
* The ztunnel traffic gets handed over to the CNI
* The L7 proxy lives somewhere™ and traffic gets routed through it as an "extra hop" aka a waypoint

### mTLS

* The ztunnel creates an HBONE (HTTP overlay network) tunnel with mTLS

{{% notice style="note" %}}
They will follow up
{{% /notice %}}

* We mostly talked about Traefik Hub as an API portal

## Postman

* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info material on why Postman SaaS is not a big security risk

## Mattermost

{{% notice style="note" %}}
I should follow up
{{% /notice %}}

* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
* They're currently migrating the Mattermost cloud offering to ARM - therefore ARM support will be coming in the next months
* The Mattermost guy had exactly the same problems with notifications and read/unread using Element

## Vercel

* Nice guys, talked a bit about convincing customers to switch to the edge
* Also talked about policy validation

## Renovate

* The paid Renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification

---
archetype: chapter
title: Day 2
weight: 2
---

Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).

Just a loose list of stuff that sounded interesting

* Dapr
* etcd backups