Day 2 typos

parent e2e3b2fdf3
commit 7b1203c7a3

.vscode/ltex.dictionary.en-US.txt (vendored): 46 lines added
@@ -36,3 +36,49 @@ multicluster
 Statefulset
 eBPF
 Parca
+KubeCon
+FinOps
+moondream
+OLLAMA
+LLVA
+LLAVA
+bokllava
+NVLink
+CUDA
+Space-seperated
+KAITO
+Hugginface
+LLMA
+Alluxio
+LLMs
+onprem
+Kube
+Kubeflow
+Ohly
+distroless
+init
+Distroless
+Buildkit
+busybox
+ECK
+Kibana
+Dedup
+Crossplane
+autoprovision
+RBAC
+Serviceaccount
+CVEs
+Podman
+LinkerD
+sidecarless
+Kubeproxy
+Daemonset
+zTunnel
+HBONE
+Paketo
+KORFI
+Traefik
+traefik
+Vercel
+Isovalent
+CNIs

@@ -6,7 +6,7 @@ tags:
 - opening
 ---
 
-The opening keynote started - as is the tradition with keynotes - with an "motivational" opening video.
+The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
 The keynote itself was presented by the CEO of the CNCF.
 
 ## The numbers
@@ -17,7 +17,7 @@ The keynote itself was presented by the CEO of the CNCF.
 
 ## The highlights
 
-* Everyone uses cloudnative
+* Everyone uses cloud native
 * AI uses Kubernetes b/c the UX is way better than classic tools
 * Especially when transferring from dev to prod
 * We need standardization
@@ -26,10 +26,10 @@ The keynote itself was presented by the CEO of the CNCF.
 ## Live demo
 
 * KIND cluster on desktop
-* Protptype Stack (develop on client)
+* Prototype Stack (develop on client)
 * Kubernetes with the LLM
-* Host with LLVA (image describe model), moondream and OLLAMA (the model manager/registry()
+* Host with LLAVA (image describe model), moondream and OLLAMA (the model manager/registry()
 * Prod Stack (All in kube)
 * Kubernetes with LLM, LLVA, OLLAMA, moondream
-* Available Models: llava, mistral bokllava (llava*mistral)
-* Host takes picture, ai describes what is pictures (in our case the conference audience)
+* Available Models: LLAVA, mistral bokllava (LLAVA*mistral)
+* Host takes picture, AI describes what is pictures (in our case the conference audience)

@@ -7,7 +7,7 @@ tags:
 - panel
 ---
 
-A podium discussion (somewhat scripted) lead by Pryanka
+A podium discussion (somewhat scripted) lead by Priyanka
 
 ## Guests
 
@@ -17,24 +17,24 @@ A podium discussion (somewhat scripted) lead by Pryanka
 
 ## Discussion
 
-* What do you use as the base of dev for ollama
-* Jeff: The concepts from docker, git, kubernetes
-* How is the balance between ai engi and ai ops
-* Jeff: The classic dev vs ops devide, many ML-Engi don't think about
+* What do you use as the base of dev for OLLAMA
+* Jeff: The concepts from docker, git, Kubernetes
+* How is the balance between AI engineer and AI ops
+* Jeff: The classic dev vs ops divide, many ML-Engineer don't think about
 * Paige: Yessir
 * How does infra keep up with the fast research
-* Paige: Well, they don't - but they do their best and Cloudnative is cool
-* Jeff: Well we're not google, but kubernetes is the saviour
+* Paige: Well, they don't - but they do their best and Cloud native is cool
+* Jeff: Well we're not google, but Kubernetes is the savior
 * What are scaling constraints
-* Jeff: Currently sizing of models is still in it's infancy
+* Jeff: Currently sizing of models is still in its infancy
 * Jeff: There will be more specific hardware and someone will have to support it
 * Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
 * Paige: Optimization of smaller models
 * What technologies need to be open source licensed
 * Jeff: The model b/c access and trust
-* Tim: The models and base execution environemtn -> Vendor agnosticism
-* Paige: Yes and remixes are really imporant for development
+* Tim: The models and base execution environment -> Vendor agnosticism
+* Paige: Yes and remixes are really important for development
 * Anything else
 * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world
-* Paige: Currently many people just use paid apis to abstract the infra, but we need this stuff selfhostable
-* Tim: I don'T want to know about the hardware, the whole infra side should be done by the cloudnative teams to let ML-Engi to just be ML-Engine
+* Paige: Currently many people just use paid APIs to abstract the infra, but we need this stuff self-hostable
+* Tim: I don't want to know about the hardware, the whole infra side should be done by the cloud native teams to let ML-Engineer to just be ML-Engine

@@ -9,7 +9,7 @@ tags:
 
 Kevin and Sanjay from NVIDIA
 
-## Enabeling GPUs in Kubernetes today
+## Enabling GPUs in Kubernetes today
 
 * Host level components: Toolkit, drivers
 * Kubernetes components: Device plugin, feature discovery, node selector
@@ -18,24 +18,24 @@ Kevin and Sanjay from NVIDIA
 ## GPU sharing
 
 * Time slicing: Switch around by time
-* Multi Process Service: Run allways on the GPU but share (space-)
+* Multi Process Service: Always run on the GPU but share (space-)
 * Multi Instance GPU: Space-seperated sharing on the hardware
-* Virtual GPU: Virtualices Time slicing or MIG
+* Virtual GPU: Virtualizes Time slicing or MIG
 * CUDA Streams: Run multiple kernels in a single app
 
 ## Dynamic resource allocation
 
-* A new alpha feature since Kube 1.26 for dynamic ressource requesting
-* You just request a ressource via the API and have fun
+* A new alpha feature since Kube 1.26 for dynamic resource requesting
+* You just request a resource via the API and have fun
 * The sharing itself is an implementation detail
 
-## GPU scale out challenges
+## GPU scale-out challenges
 
 * NVIDIA Picasso is a foundry for model creation powered by Kubernetes
 * The workload is the training workload split into batches
 * Challenge: Schedule multiple training jobs by different users that are prioritized
 
-### Topology aware placments
+### Topology aware placements
 
 * You need thousands of GPUs, a typical Node has 8 GPUs with fast NVLink communication - beyond that switching
 * Target: optimize related jobs based on GPU node distance and NUMA placement
@@ -44,11 +44,11 @@ Kevin and Sanjay from NVIDIA
 
 * Stuff can break, resulting in slowdowns or errors
 * Challenge: Detect faults and handle them
-* Observability both in-band and out ouf band that expose node conditions in kubernetes
+* Observability both in-band and out of band that expose node conditions in Kubernetes
 * Needed: Automated fault-tolerant scheduling
 
-### Multi-dimensional optimization
+### Multidimensional optimization
 
-* There are different KPIs: starvation, prioprity, occupanccy, fainrness
-* Challenge: What to choose (the multi-dimensional decision problemn)
+* There are different KPIs: starvation, priority, occupancy, fairness
+* Challenge: What to choose (the multidimensional decision problem)
 * Needed: A scheduler that can balance the dimensions

@@ -15,11 +15,11 @@ Jorge Palma from Microsoft with a quick introduction.
 * Containerized models
 * GPUs in the cluster (install, management)
 
-## Kubernetes AI Toolchain (KAITO)
+## Kubernetes AI Tool chain (KAITO)
 
 * Kubernetes operator that interacts with
 * Node provisioner
 * Deployment
-* Simple CRD that decribes a model, infra and have fun
-* Creates inferance endpoint
-* Models are currently 10 (Hugginface, LLMA, etc)
+* Simple CRD that describes a model, infra and have fun
+* Creates inference endpoint
+* Models are currently 10 (Hugginface, LLMA, etc.)

@@ -6,14 +6,14 @@ tags:
 - panel
 ---
 
-A panel discussion with moderation by Google and participants from Google, Alluxio, Apmpere and CERN.
+A panel discussion with moderation by Google and participants from Google, Alluxio, Ampere and CERN.
 It was pretty scripted with prepared (sponsor specific) slides for each question answered.
 
 ## Takeaways
 
-* Deploying a ML should become the new deploy a web app
-* The hardware should be fully utilized -> Better ressource sharing and scheduling
-* Smaller LLMs on cpu only is preyy cost efficient
-* Better scheduling by splitting into storage + cpu (prepare) and gpu (run) nodes to create a just-in-time flow
+* Deploying an ML should become the new deployment a web app
+* The hardware should be fully utilized -> Better resource sharing and scheduling
+* Smaller LLMs on CPU only is pretty cost-efficient
+* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
 * Software acceleration is cool, but we should use more specialized hardware and models to run on CPUs
-* We should be flexible regarding hardware, multi-cluster workloads and hybrig (onprem, burst to cloud) workloads
+* We should be flexible regarding hardware, multi-cluster workloads and hybrid (onprem, burst to cloud) workloads

@@ -5,41 +5,41 @@ tags:
 - keynote
 ---
 
-Nikhita presented projects that merge CloudNative and AI.
-PAtrick Ohly Joined for DRA
+Nikhita presented projects that merge cloud native and AI.
+Patrick Ohly Joined for DRA
 
 ### The "news"
 
 * New work group AI
-* More tools are including ai features
-* New updated cncf for children feat AI
+* More tools are including AI features
+* New updated CNCF for children feat AI
 * One decade of Kubernetes
 * DRA is in alpha
 
 ### DRA
 
 * A new API for resources (node-local and node-attached)
-* Sharing of ressources between cods and containers
+* Sharing of resources between cods and containers
 * Vendor specific stuff are abstracted by a vendor driver controller
 * The kube scheduler can interact with the vendor parameters for scheduling and autoscaling
 
-### Cloudnative AI ecosystem
+### Cloud native AI ecosystem
 
 * Kube is the seed for the AI infra plant
 * Kubeflow users wanted AI registries
 * LLM on the edge
-* Opentelemetry bring semandtics
+* OpenTelemetry bring semantics
 * All of these tools form a symbiosis between
 * Topics of discussions
 
 ### The working group AI
 
-* It was formed in october 2023
-* They are working on the whitepaper (cloudnative and ai) wich was opublished on 19.03.2024
-* The landscape "cloudnative and ai" is WIP and will be merged into the main CNCF landscape
+* It was formed in October 2023
+* They are working on the white paper (cloud native and AI) which was published on 19.03.2024
+* The landscape "cloud native and AI" is WIP and will be merged into the main CNCF landscape
 * The future focus will be on security and cost efficiency (with a hint of sustainability)
 
 ### LFAI and CNCF
 
-* The direcor of the AI foundation talks abouzt ai and cloudnative
-* They are looking forward to more colaboraion
+* The director of the AI foundation talks about AI and cloud native
+* They are looking forward to more collaboration

@@ -14,7 +14,7 @@ The entire talk was very short, but it was a nice demo of init containers
 * Security is hard - distroless sounds like a nice helper
 * Basic Challenge: Usability-Security Dilemma -> But more usability doesn't mean less secure, but more updating
 * Distro: Kernel + Software Packages + Package manager (optional) -> In Containers just without the kernel
-* Distroless: No package manager, no shell, no webcluent (curl/wget) - only minimal sofware bundels
+* Distroless: No package manager, no shell, no web client (curl/wget) - only minimal software bundles
 
 ## Tools for distroless image creation
 
@@ -29,13 +29,13 @@ The entire talk was very short, but it was a nice demo of init containers
 
 ## Demo
 
-* A (rough) distroless postgres with alpine build step and scratch final step
+* A (rough) distroless Postgres with alpine build step and scratch final step
 * A basic pg:alpine container used for init with a shared data volume
-* The init uses the pg admin user to initialize the pg server (you don't need the admin creds after this)
+* The init uses the pg admin user to initialize the pg server (you don't need the admin credentials after this)
 
 ### Kube
 
-* K apply failed b/c no internet, but was fixed by connecting to wifi
+* K apply failed b/c no internet, but was fixed by connecting to Wi-Fi
 * Without the init container the pod just crashes, with the init container the correct config gets created
 
 ### Docker compose

@@ -13,63 +13,63 @@ A talk by elastic.
 
 ## About elastic
 
-* Elestic cloud as a managed service
+* Elastic cloud as a managed service
 * Deployed across AWS/GCP/Azure in over 50 regions
-* 600.000+ Containers
+* 600000+ Containers
 
 ### Elastic and Kube
 
-* They offer elastic obervability
+* They offer elastic observability
 * They offer the ECK operator for simplified deployments
 
 ## The baseline
 
-* Goal: A large scale (1M+ containers resilient platform on k8s
+* Goal: A large scale (1M+ containers) resilient platform on k8s
 * Architecture
-* Global Control: The control plane (api) for users with controllers
-* Regional Apps: The "shitload" of kubernetes clusters where the actual customer services live
+* Global Control: The control plane (API) for users with controllers
+* Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live
 
 ## Scalability
 
 * Challenge: How large can our cluster be, how many clusters do we need
 * Problem: Only basic guidelines exist for that
-* Decision: Horizontaly scale the number of clusters (5ßß-1K nodes each)
+* Decision: Horizontally scale the number of clusters (5ßß-1K nodes each)
 * Decision: Disposable clusters
 * Throw away without data loss
-* Single source of throuth is not cluster etcd but external -> No etcd backups needed
+* Single source of truth is not cluster etcd but external -> No etcd backups needed
 * Everything can be recreated any time
 
 ## Controllers
 
 {{% notice style="note" %}}
-I won't copy the explanations of operators/controllers in this notes
+I won't copy the explanations of operators/controllers in these notes
 {{% /notice %}}
 
-* Many different controllers, including (but not limited to)
-* cluster controler: Register cluster to controller
+* Many controllers, including (but not limited to)
+* cluster controller: Register cluster to controller
 * Project controller: Schedule user's project to cluster
 * Product controllers (Elasticsearch, Kibana, etc.)
-* Ingress/Certmanager
+* Ingress/Cert manager
 * Sometimes controllers depend on controllers -> potential complexity
 * Pro:
-* Resilient (Selfhealing)
+* Resilient (Self-healing)
 * Level triggered (desired state vs procedure triggered)
 * Simple reasoning when comparing desired state vs state machine
 * Official controller runtime lib
-* Workque: Automatic Dedup, Retry backoff and so on
+* Workqueue: Automatic Dedup, Retry back off and so on
 
 ## Global Controllers
 
 * Basic operation
 * Uses project config from Elastic cloud as the desired state
-* The actual state is a k9s ressource in another cluster
-* Challenge: Where is the source of thruth if the data is not stored in etc
-* Solution: External datastore (postgres)
-* Challenge: How do we sync the db sources to kubernetes
+* The actual state is a k9s resource in another cluster
+* Challenge: Where is the source of truth if the data is not stored in etcd
+* Solution: External data store (Postgres)
+* Challenge: How do we sync the db sources to Kubernetes
 * Potential solutions: Replace etcd with the external db
 * Chosen solution:
-* The controllers don't use CRDs for storage, but they expose a webapi
-* Reconciliation still now interacts with the external db and go channels (que) instead
+* The controllers don't use CRDs for storage, but they expose a web-API
+* Reconciliation still now interacts with the external db and go channels (queue) instead
 * Then the CRs for the operators get created by the global controller
 
 ### Large scale
@@ -82,10 +82,10 @@ I won't copy the explanations of operators/controllers in this notes
 ### Reconcile
 
 * User-driven events are processed asap
-* reconcole of everything should happen, bus with low prio slowly in the background
-* Solution: Status: LastReconciledRevision (timestamp) get's compare to revision, if larger -> User change
-* Prioritization: Just a custom event handler with the normal queue and a low prio
-* Low Prio Queue: Just a queue that adds items to the normal work-queue with a rate limit
+* reconcile of everything should happen, bus with low priority slowly in the background
+* Solution: Status: LastReconciledRevision (timestamp) gets compare to revision, if larger -> User change
+* Prioritization: Just a custom event handler with the normal queue and a low priority
+* Queue: Just a queue that adds items to the normal work-queue with a rate limit
 
 ```mermaid
 flowchart LR

@@ -6,39 +6,39 @@ tags:
 - security
 ---
 
-A talk by Google and Microsoft with the premise of bether auth in k8s.
+A talk by Google and Microsoft with the premise of better auth in k8s.
 
 ## Baselines
 
 * Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
 * Result: CVEs
-* Example: Just use ingress, nginx, put in some lua code in the config and voila: Service account token
+* Example: Just use ingress, nginx, put in some Lua code in the config and e voilà: Service account token
 * Fix: No more fun
 
 ## Basic solutions
 
-* Seperate Control (the controller) from data (the ingress)
+* Separate Control (the controller) from data (the ingress)
 * Namespace limited ingress
 
 ## Current state of cross namespace stuff
 
-* Why: Reference tls cert for gateway api in the cert team'snamespace
+* Why: Reference TLS cert for gateway API in the cert team's namespace
 * Why: Move all ingress configs to one namespace
 * Classic Solution: Annotations in contour that references a namespace that contains all certs (rewrites secret to certs/secret)
 * Gateway Solution:
 * Gateway TLS secret ref includes a namespace
-* ReferenceGrant pretty mutch allows referencing from X (Gatway) to Y (Secret)
+* ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret)
 * Limits:
 * Has to be implemented via controllers
-* The controllers still have readall - they just check if they are supposed to do this
+* The controllers still have read all - they just check if they are supposed to do this
 
 ## Goals
 
 ### Global
 
-* Grant access to controller to only ressources relevant for them (using references and maybe class segmentation)
+* Grant access to controller to only resources relevant for them (using references and maybe class segmentation)
 * Allow for safe cross namespace references
-* Make it easy for api devs to adopt it
+* Make it easy for API devs to adopt it
 
 ### Personas
 
@@ -50,20 +50,20 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
 
 * Alex: Define relationships via ReferencePatterns
 * Kai: Specify controller identity (Serviceaccount), define relationship API
-* Rohan: Define cross namespace references (aka ressource grants that allow access to their ressources)
+* Rohan: Define cross namespace references (aka resource grants that allow access to their resources)
 
 ## Result of the paper
 
 ### Architecture
 
 * ReferencePattern: Where do i find the references -> example: GatewayClass in the gateway API
-* ReferenceConsumer: Who (IOdentity) has access under which conditions?
+* ReferenceConsumer: Who (Identity) has access under which conditions?
 * ReferenceGrant: Allow specific references
 
 ### POC
 
 * Minimum access: You only get access if the grant is there AND the reference actually exists
-* Their basic implementation works with the kube api
+* Their basic implementation works with the kube API
 
 ### Open questions
 
@@ -74,9 +74,9 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
 
 ## Alternative
 
-* Idea: Just extend RBAC Roles with a selector (match labels, etc)
+* Idea: Just extend RBAC Roles with a selector (match labels, etc.)
 * Problems:
-* Requires changes to kubernetes core auth
+* Requires changes to Kubernetes core auth
 * Everything bus list and watch is a pain
 * How do you handle AND vs OR selection
 * Field selectors: They exist
@@ -84,5 +84,5 @@ A talk by Google and Microsoft with the premise of bether auth in k8s.
 
 ## Meanwhile
 
-* Prefer tools that support isolatiobn between controller and dataplane
+* Prefer tools that support isolation between controller and data-plane
 * Disable all non-needed features -> Especially scripting
|
@ -6,32 +6,32 @@ tags:
|
|||||||
- dx
|
- dx
|
||||||
---
|
---
|
||||||
|
|
||||||
A talk by UX and software people at RedHat (Podman team).
|
A talk by UX and software people at Red Hat (Podman team).
|
||||||
The talk mainly followed the academic study process (aka this is the survey I did for my bachelors/masters thesis).
|
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).
|
||||||
|
|
||||||
## Research
|
## Research
|
||||||
|
|
||||||
* User research Study including 11 devs and platform engineers over three months
|
* User research Study including 11 devs and platform engineers over three months
|
||||||
* Focus was on an new podman desktop feature
|
* Focus was on a new Podman desktop feature
|
||||||
* Experence range 2-3 years experience average (low no experience, high oldschool kube)
|
* Experience range 2-3 years experience average (low no experience, high old school kube)
|
||||||
* 16 questions regarding environment, workflow, debugging and pain points
|
* 16 questions regarding environment, workflow, debugging and pain points
|
||||||
* Analysis: Affinity mapping
|
* Analysis: Affinity mapping
|
||||||
|
|
||||||
## Findings
|
## Findings
|
||||||
|
|
||||||
* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
|
* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
|
||||||
* Network debugging is hard b/c many layers and problems occuring in between cni and infra are really hard -> Network topology issues are rare but hard
|
* Network debugging is hard b/c many layers and problems occurring in between CNI and infra are really hard -> Network topology issues are rare but hard
|
||||||
* YAML identation -> Tool support is needed for visualisation
|
* YAML indentation -> Tool support is needed for visualization
|
||||||
* YAML validation -> Just use validation in dev and gitops
|
* YAML validation -> Just use validation in dev and GitOps
|
||||||
* YAML Cleanup -> Normalize YAML (order, anchors, etc) for easy diff
|
* YAML Cleanup -> Normalize YAML (order, anchors, etc.) for easy diff
|
||||||
* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (and during dev)
|
* Inadequate security analysis (too verbose, non-issues are warnings) -> Real-time insights (and during dev)
|
||||||
* Crash Loop -> Identify stuck containers, simple debug containers
|
* Crash Loop -> Identify stuck containers, simple debug containers
|
||||||
* CLI vs GUI -> Enable eperience level oriented gui, Enhance intime troubleshooting
|
* CLI vs GUI -> Enable an experience-level-oriented GUI, enhance in-time troubleshooting
|
||||||
|
|
||||||
## General issues
|
## General issues
|
||||||
|
|
||||||
* No direct fs access
|
* No direct fs access
|
||||||
* Multiple kubeconfigs
|
* Multiple kubeconfigs
|
||||||
* SaaS is sometimes only provided on kube, which sounds like complexity
|
* SaaS is sometimes only provided on kube, which sounds like complexity
|
||||||
* Where do i begin my troubleshooting
|
* Where do I begin my troubleshooting
|
||||||
* Interoperability/Fragility with updates
|
* Interoperability/Fragility with updates
|
||||||
|
@ -6,11 +6,11 @@ tags:
|
|||||||
- network
|
- network
|
||||||
---
|
---
|
||||||
|
|
||||||
Global field CTO at Solo.io with a hint of servicemesh background.
|
Global field CTO at Solo.io with a hint of service mesh background.
|
||||||
|
|
||||||
## History
|
## History
|
||||||
|
|
||||||
* LinkerD 1.X was the first moder servicemesh and basicly a opt-in serviceproxy
|
* LinkerD 1.X was the first modern service mesh and basically an opt-in service proxy
|
||||||
* Challenges: JVM (size), latencies, ...
|
* Challenges: JVM (size), latencies, ...
|
||||||
|
|
||||||
### Why not node-proxy?
|
### Why not node-proxy?
|
||||||
@ -23,8 +23,8 @@ Global field CTO at Solo.io with a hint of servicemesh background.
|
|||||||
### Why sidecar?
|
### Why sidecar?
|
||||||
|
|
||||||
* Transparent (ish)
|
* Transparent (ish)
|
||||||
* PArt of app lifecycle (up/down)
|
* Part of app lifecycle (up/down)
|
||||||
* Single tennant
|
* Single tenant
|
||||||
* No noisy neighbor
|
* No noisy neighbor
|
||||||
|
|
||||||
### Sidecar drawbacks
|
### Sidecar drawbacks
|
||||||
@ -46,7 +46,7 @@ Global field CTO at Solo.io with a hint of servicemesh background.
|
|||||||
|
|
||||||
* Full transparency
|
* Full transparency
|
||||||
* Optimized networking
|
* Optimized networking
|
||||||
* Lower ressource allocation
|
* Lower resource allocation
|
||||||
* No race conditions
|
* No race conditions
|
||||||
* No manual pod injection
|
* No manual pod injection
|
||||||
* No credentials in the app
|
* No credentials in the app
|
||||||
@ -68,12 +68,12 @@ Global field CTO at Solo.io with a hint of servicemesh background.
|
|||||||
* Kubeproxy replacement
|
* Kubeproxy replacement
|
||||||
* Ingress (via Gateway API)
|
* Ingress (via Gateway API)
|
||||||
* Mutual Authentication
|
* Mutual Authentication
|
||||||
* Specialiced CiliumNetworkPolicy
|
* Specialized CiliumNetworkPolicy
|
||||||
* Configure Envoy throgh Cilium
|
* Configure Envoy through Cilium
|
||||||
|
|
||||||
### Control Plane
|
### Control Plane
|
||||||
|
|
||||||
* Cilium-Agent on each node that reacts to scheduled workloads by programming the local dataplane
|
* Cilium-Agent on each node that reacts to scheduled workloads by programming the local data-plane
|
||||||
* API via Gateway API and CiliumNetworkPolicy
|
* API via Gateway API and CiliumNetworkPolicy
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
@ -98,29 +98,29 @@ flowchart TD
|
|||||||
### Data plane
|
### Data plane
|
||||||
|
|
||||||
* Configured by control plane
|
* Configured by control plane
|
||||||
* Does all of the eBPF things in L4
|
* Does all the eBPF things in L4
|
||||||
* Does all of the envoy things in L7
|
* Does all the Envoy things in L7
|
||||||
* In-Kernel Wireguard for optional transparent encryption
|
* In-Kernel WireGuard for optional transparent encryption
|
||||||
|
|
||||||
### mTLS
|
### mTLS
|
||||||
|
|
||||||
* Network Policies get applied at the eBPF layer (check if id a can talk to id 2)
|
* Network Policies get applied at the eBPF layer (check if ID 1 can talk to ID 2)
|
||||||
* When mTLS is enabled there is a auth check in advance -> It it fails, proceed with agents
|
* When mTLS is enabled there is an auth check in advance -> If it fails, proceed with agents
|
||||||
* Agents talk to each other for mTLS Auth and save the result to a cache -> Now ebpf can say yes
|
* Agents talk to each other for mTLS auth and save the result to a cache -> Now eBPF can say yes
|
||||||
* Problems: The caches can lead to id confusion
|
* Problems: The caches can lead to ID confusion
|
||||||
|
|
||||||
## Istio
|
## Istio
|
||||||
|
|
||||||
### Basiscs
|
### Basics
|
||||||
|
|
||||||
* L4/7 Service mesh without it's own CNI
|
* L4/7 Service mesh without its own CNI
|
||||||
* Based on envoy
|
* Based on Envoy
|
||||||
* mTLS
|
* mTLS
|
||||||
* Classicly via sidecar, nowadays
|
* Classically via sidecar, nowadays
|
||||||
|
|
||||||
### Ambient mode
|
### Ambient mode
|
||||||
|
|
||||||
* Seperate L4 and L7 -> Can run on cilium
|
* Separate L4 and L7 -> Can run on Cilium
|
||||||
* mTLS
|
* mTLS
|
||||||
* Gateway API
|
* Gateway API
|
||||||
|
|
||||||
@ -143,14 +143,14 @@ flowchart TD
|
|||||||
```
|
```
|
||||||
|
|
||||||
* Central xDS Control Plane
|
* Central xDS Control Plane
|
||||||
* Per-Node Dataplane that reads updates from Control Plane
|
* Per-Node Data-plane that reads updates from Control Plane
|
||||||
|
|
||||||
### Data Plane
|
### Data Plane
|
||||||
|
|
||||||
* L4 runs via zTunnel Daemonset that handels mTLS
|
* L4 runs via zTunnel Daemonset that handles mTLS
|
||||||
* The zTunnel traffic get's handed over to the CNI
|
* The zTunnel traffic gets handed over to the CNI
|
||||||
* L7 Proxy lives somewhere™ and traffic get's routed through it as an "extra hop" aka waypoint
|
* L7 Proxy lives somewhere™ and traffic gets routed through it as an "extra hop" aka waypoint
|
||||||
|
|
||||||
### mTLS
|
### mTLS
|
||||||
|
|
||||||
* The zTunnel creates a HBONE (http overlay network) tunnel with mTLS
|
* The zTunnel creates a HBONE (HTTP overlay network) tunnel with mTLS
|
||||||
|
@ -8,17 +8,17 @@ Who have I talked to today, are there any follow-ups or learnings?
|
|||||||
## Operator Framework
|
## Operator Framework
|
||||||
|
|
||||||
* We talked about the operator lifecycle manager
|
* We talked about the Operator Lifecycle Manager
|
||||||
* They shared the roadmap and the new release 1.0 will bring support for Operator Bundle loading from any oci source (no more public-registry enforcement)
|
* They shared the roadmap and the new release 1.0 will bring support for Operator Bundle loading from any OCI source (no more public-registry enforcement)
|
||||||
|
|
||||||
## Flux
|
## Flux
|
||||||
|
|
||||||
* We talked about automatic helm release updates [lessons learned from flux](/lessons_learned/02_flux)
|
* We talked about automatic Helm release updates [lessons learned from flux](/lessons_learned/02_flux)
|
||||||
|
|
||||||
## Cloudfoundry/Paketo
|
## Cloud Foundry/Paketo
|
||||||
|
|
||||||
* We mostly had some smalltalk
|
* We mostly had some small talk
|
||||||
* There will be a cloudfoundry day in Karlsruhe in October, they'd be happy to have us ther
|
* There will be a Cloud Foundry Day in Karlsruhe in October, they'd be happy to have us there
|
||||||
* The whole KORFI (Cloudfoundry on Kubernetes) Project is still going strong, but no release canidate yet (or in the near future)
|
* The whole KORFI (Cloud Foundry on Kubernetes) project is still going strong, but there is no release candidate yet (nor one in the near future)
|
||||||
|
|
||||||
## Traefik
|
## Traefik
|
||||||
|
|
||||||
@ -31,7 +31,7 @@ They will follow up
|
|||||||
## Postman
|
## Postman
|
||||||
|
|
||||||
* I asked them about their new cloud-only stuff: They will keep their direction
|
* I asked them about their new cloud-only stuff: They will keep their direction
|
||||||
* The are also planning to work on info materials on why postman SaaS is not a big security risk
|
* They are also planning to work on info materials on why Postman SaaS is not a big security risk
|
||||||
|
|
||||||
## Mattermost
|
## Mattermost
|
||||||
|
|
||||||
@ -39,9 +39,9 @@ They will follow up
|
|||||||
I should follow up
|
I should follow up
|
||||||
{{% /notice %}}
|
{{% /notice %}}
|
||||||
|
|
||||||
* I talked about our problems with the mattermost operator and was asked to get back to them with the errors
|
* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
|
||||||
* They're currently migrating the mattermost cloud offering to arm - therefor arm support will be coming in the next months
|
* They're currently migrating the Mattermost cloud offering to Arm, therefore Arm support will be coming in the next few months
|
||||||
* The mattermost guy had exactly the same problems with notifications and read/unread using element
|
* The Mattermost guy had exactly the same problems with notifications and read/unread using Element
|
||||||
|
|
||||||
## Vercel
|
## Vercel
|
||||||
|
|
||||||
@ -53,7 +53,7 @@ I should follow up
|
|||||||
* The paid renovate offering now includes build failure estimation
|
* The paid Renovate offering now includes build failure estimation
|
||||||
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification
|
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification
|
||||||
|
|
||||||
### Certmanager
|
### cert-manager
|
||||||
|
|
||||||
* The best swag (judged by coolness points)
|
* The best swag (judged by coolness points)
|
||||||
|
|
||||||
@ -63,11 +63,11 @@ I should follow up
|
|||||||
They will follow up with a quick demo
|
They will follow up with a quick demo
|
||||||
{{% /notice %}}
|
{{% /notice %}}
|
||||||
|
|
||||||
* A kubernetes security/runtime security solution with pretty nice looking urgency filters
|
* A Kubernetes security/runtime security solution with pretty nice looking urgency filters
|
||||||
* Includes eBPF to see what code actually runs
|
* Includes eBPF to see what code actually runs
|
||||||
* I'll witness a demo in early/mid april
|
* I'll witness a demo in early/mid April
|
||||||
|
|
||||||
### Isovalent
|
### Isovalent
|
||||||
|
|
||||||
* Dinner (very tasty)
|
* Dinner (very tasty)
|
||||||
* Cilium still sounds like the way to go in regards to CNIs
|
* Cilium still sounds like the way to go in regard to CNIs
|
||||||
|
@ -5,7 +5,7 @@ weight: 2
|
|||||||
---
|
---
|
||||||
|
|
||||||
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).
|
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).
|
||||||
This is where all of the people joined (over 12000)
|
This is where all the people joined (over 12,000).
|
||||||
|
|
||||||
The opening keynotes were a mix of talks and panel discussions.
|
The opening keynotes were a mix of talks and panel discussions.
|
||||||
The main topic was - who could have guessed - AI and ML.
|
The main topic was - who could have guessed - AI and ML.
|
||||||
|