day2 the next episode
		@@ -1,5 +1,5 @@
 | 
				
			|||||||
---
 | 
					---
 | 
				
			||||||
title: Sponsored: Build an open source platform for ai/ml
 | 
					title: "Sponsored: Build an open source platform for ai/ml"
 | 
				
			||||||
weight: 4
 | 
					weight: 4
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 
 | 
				
			|||||||
@@ -1,6 +1,6 @@
 | 
				
			|||||||
---
 | 
					---
 | 
				
			||||||
title: Is your image really distroless?
 | 
					title: Is your image really distroless?
 | 
				
			||||||
weight:7
 | 
					weight: 7
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Laurent Goderre from Docker.
 | 
					Laurent Goderre from Docker.
 | 
				
			||||||
 
 | 
				
			|||||||
							
								
								
									
										98
									
								
								content/day2/08_multicloud_saas.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							@@ -0,0 +1,98 @@
 | 
				
			|||||||
 | 
					---
 | 
				
			||||||
 | 
					title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
 | 
				
			||||||
 | 
					weight: 8
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					> Interchangeable wording in this talk: controller == operator
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A talk by Elastic.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## About Elastic
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Elastic Cloud as a managed service
 | 
				
			||||||
 | 
					* Deployed across AWS/GCP/Azure in over 50 regions
 | 
				
			||||||
 | 
					* 600,000+ containers
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Elastic and Kube
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* They offer Elastic Observability
 | 
				
			||||||
 | 
					* They offer the ECK operator for simplified deployments
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## The baseline
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Goal: A large-scale (1M+ containers), resilient platform on k8s
 | 
				
			||||||
 | 
					* Architecture
 | 
				
			||||||
 | 
					  * Global Control: The control plane (api) for users with controllers
 | 
				
			||||||
 | 
					  * Regional Apps: The "shitload" of kubernetes clusters where the actual customer services live
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Scalability
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Challenge: How large can a cluster be, and how many clusters do we need?
 | 
				
			||||||
 | 
					* Problem: Only basic guidelines exist for that
 | 
				
			||||||
 | 
					* Decision: Horizontally scale the number of clusters (500-1K nodes each)
 | 
				
			||||||
 | 
					* Decision: Disposable clusters
 | 
				
			||||||
 | 
					  * Throw away without data loss
 | 
				
			||||||
 | 
					  * The single source of truth is not the cluster's etcd but an external store -> No etcd backups needed
 | 
				
			||||||
 | 
					  * Everything can be recreated any time
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Controllers
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					{{% notice style="note" %}}
 | 
				
			||||||
 | 
					I won't copy the explanations of operators/controllers into these notes
 | 
				
			||||||
 | 
					{{% /notice %}}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Many different controllers, including (but not limited to)
 | 
				
			||||||
 | 
					  * Cluster controller: Register cluster to controller
 | 
				
			||||||
 | 
					  * Project controller: Schedule user's project to cluster
 | 
				
			||||||
 | 
					  * Product controllers (Elasticsearch, Kibana, etc.)
 | 
				
			||||||
 | 
					  * Ingress/Certmanager
 | 
				
			||||||
 | 
					* Sometimes controllers depend on controllers -> potential complexity
 | 
				
			||||||
 | 
					* Pro:
 | 
				
			||||||
 | 
					  * Resilient (Selfhealing)
 | 
				
			||||||
 | 
					  * Level triggered (desired state vs procedure triggered)
 | 
				
			||||||
 | 
					  * Simple reasoning when comparing desired state vs state machine
 | 
				
			||||||
 | 
					  * Official controller runtime lib
 | 
				
			||||||
 | 
					* Work queue: Automatic dedup, retry backoff and so on
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Global Controllers
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Basic operation
 | 
				
			||||||
 | 
					  * Uses project config from Elastic cloud as the desired state
 | 
				
			||||||
 | 
					  * The actual state is a k8s resource in another cluster
 | 
				
			||||||
 | 
					* Challenge: Where is the source of truth if the data is not stored in etcd?
 | 
				
			||||||
 | 
					* Solution: External datastore (postgres)
 | 
				
			||||||
 | 
					* Challenge: How do we sync the db sources to Kubernetes?
 | 
				
			||||||
 | 
					* Potential solutions: Replace etcd with the external db
 | 
				
			||||||
 | 
					* Chosen solution:
 | 
				
			||||||
 | 
					  * The controllers don't use CRDs for storage, but they expose a webapi
 | 
				
			||||||
 | 
					  * Reconciliation still happens, but it now interacts with the external db and Go channels (as the queue) instead
 | 
				
			||||||
 | 
					  * Then the CRs for the operators get created by the global controller
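The desired-vs-actual comparison behind this can be sketched roughly as follows. This is a minimal illustration, not Elastic's implementation: `plan_actions` is a hypothetical name, the `desired` dict stands in for project configs read from the external db, and `actual` stands in for the CRs found in a regional cluster.

```python
def plan_actions(desired: dict, actual: dict) -> list:
    """Compare desired state (external db) with actual state (CRs in a cluster).

    Returns a list of (action, name) tuples. Level-triggered: the result only
    depends on the two states, not on which event caused the reconcile.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # CR is missing in the cluster
        elif actual[name] != spec:
            actions.append(("update", name))   # CR drifted from the db
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # CR no longer backed by the db
    return actions


desired = {"project-a": {"version": 1}, "project-b": {"version": 2}}
actual = {"project-b": {"version": 1}, "project-c": {"version": 1}}
print(plan_actions(desired, actual))
```

Because the plan is derived purely from state, a cluster that was thrown away simply shows up as an empty `actual` and everything gets recreated.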
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Large scale
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Problem: Reconcile gets triggered for all objects on restart -> This makes sure nothing gets missed and everything is processed by the latest controller version
 | 
				
			||||||
 | 
					* Idea: Just create more workers for 100K+ Objects
 | 
				
			||||||
 | 
					* Problem: CPU go brrr and db gets overloaded
 | 
				
			||||||
 | 
					* Problem: If you create an item during restart, suddenly it is at the end of a 100K item work-queue
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Reconcile
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* User-driven events are processed asap
 | 
				
			||||||
 | 
					* Reconcile of everything should happen, but with low prio, slowly in the background
 | 
				
			||||||
 | 
					* Solution: Status: LastReconciledRevision (timestamp) gets compared to the revision; if the revision is larger -> User change
 | 
				
			||||||
 | 
					* Prioritization: Just a custom event handler with the normal queue and a low prio queue
 | 
				
			||||||
 | 
					* Low Prio Queue: Just a queue that adds items to the normal work-queue with a rate limit
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					```mermaid
 | 
				
			||||||
 | 
					flowchart LR
 | 
				
			||||||
 | 
					    low-->rl(ratelimit)
 | 
				
			||||||
 | 
					    rl-->wq(work queue)
 | 
				
			||||||
 | 
					    wq-->controller
 | 
				
			||||||
 | 
					    high-->wq
 | 
				
			||||||
 | 
					```
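The flow in the diagram can be sketched in a few lines. This is a simplified model under stated assumptions, not the code from the talk: `Item`, `PrioritizedQueue` and `tick` are hypothetical names, and the rate limit is modelled as a fixed budget per `tick()` call instead of a real time-based limiter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    revision: int                  # bumped on every user change
    last_reconciled_revision: int  # stored in the status after a reconcile


def is_user_change(item: Item) -> bool:
    # A revision newer than the last reconciled one means the user changed something.
    return item.revision > item.last_reconciled_revision


class PrioritizedQueue:
    def __init__(self, low_prio_per_tick: int):
        self.work = deque()   # the normal work queue the controller reads from
        self.low = deque()    # background reconciles waiting for the rate limit
        self.queued = set()   # names currently queued (dedup)
        self.low_prio_per_tick = low_prio_per_tick

    def add(self, item: Item) -> None:
        if item.name in self.queued:
            return
        self.queued.add(item.name)
        (self.work if is_user_change(item) else self.low).append(item)

    def tick(self) -> None:
        # Rate limit: move at most low_prio_per_tick items into the work queue.
        for _ in range(min(self.low_prio_per_tick, len(self.low))):
            self.work.append(self.low.popleft())

    def get(self):
        if not self.work:
            return None
        item = self.work.popleft()
        self.queued.discard(item.name)
        return item


q = PrioritizedQueue(low_prio_per_tick=2)
for name in ("a", "b", "c"):
    q.add(Item(name, revision=1, last_reconciled_revision=1))  # restart resync
q.add(Item("u", revision=2, last_reconciled_revision=1))       # user change
first = q.get()
```

Here the user change `u` is handed to the controller first, even though three resync items were added before it; those only trickle in when `tick()` is called.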
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Related
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Argo for CI/CD
 | 
				
			||||||
 | 
					* Crossplane for cluster autoprovision
 | 
				
			||||||
							
								
								
									
										85
									
								
								content/day2/09_safety_usability_auth.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							@@ -0,0 +1,85 @@
 | 
				
			|||||||
 | 
					---
 | 
				
			||||||
 | 
					title: "Safety or usability: Why not both? Towards referential auth in k8s"
 | 
				
			||||||
 | 
					weight: 9
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A talk by Google and Microsoft with the premise of better auth in k8s.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Baselines
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
 | 
				
			||||||
 | 
					* Result: CVEs
 | 
				
			||||||
 | 
					* Example: Just use ingress, nginx, put in some lua code in the config and voila: Service account token
 | 
				
			||||||
 | 
					* Fix: No more fun
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Basic solutions
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Separate control (the controller) from data (the ingress)
 | 
				
			||||||
 | 
					* Namespace limited ingress
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Current state of cross namespace stuff
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Why: Reference a TLS cert for the Gateway API in the cert team's namespace
 | 
				
			||||||
 | 
					* Why: Move all ingress configs to one namespace
 | 
				
			||||||
 | 
					* Classic Solution: Annotations in Contour that reference a namespace that contains all certs (rewrites secret to certs/secret)
 | 
				
			||||||
 | 
					* Gateway Solution:
 | 
				
			||||||
 | 
					  * Gateway TLS secret ref includes a namespace
 | 
				
			||||||
 | 
					  * ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret)
 | 
				
			||||||
 | 
					* Limits: 
 | 
				
			||||||
 | 
					  * Has to be implemented via controllers
 | 
				
			||||||
 | 
					  * The controllers still have read access to everything - they just check if they are supposed to do this
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Goals
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Global
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Grant controllers access only to the resources relevant to them (using references and maybe class segmentation)
 | 
				
			||||||
 | 
					* Allow for safe cross namespace references
 | 
				
			||||||
 | 
					* Make it easy for api devs to adopt it
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Personas
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Alex API author
 | 
				
			||||||
 | 
					* Kai controller author
 | 
				
			||||||
 | 
					* Rohan Resource owner
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### What our stakeholders want
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Alex: Define relationships via ReferencePatterns
 | 
				
			||||||
 | 
					* Kai: Specify controller identity (Serviceaccount), define relationship API
 | 
				
			||||||
 | 
					* Rohan: Define cross namespace references (aka resource grants that allow access to their resources)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Result of the paper
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Architecture
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* ReferencePattern: Where do I find the references -> example: GatewayClass in the Gateway API
 | 
				
			||||||
 | 
					* ReferenceConsumer: Who (identity) has access under which conditions?
 | 
				
			||||||
 | 
					* ReferenceGrant: Allow specific references
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### POC
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Minimum access: You only get access if the grant is there AND the reference actually exists
 | 
				
			||||||
 | 
					* Their basic implementation works with the kube api
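The "grant AND reference" rule can be illustrated with a small check. This is only a sketch of the idea, not the POC's API: `Reference`, `Grant` and `may_read` are hypothetical names, and the data model (namespace, kind, name tuples) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    # An object in from_ns that actually points at a target object.
    from_ns: str
    from_kind: str
    to_ns: str
    to_kind: str
    to_name: str


@dataclass(frozen=True)
class Grant:
    # The resource owner allows references from (from_ns, from_kind)
    # to objects of to_kind in to_ns.
    from_ns: str
    from_kind: str
    to_ns: str
    to_kind: str


def may_read(to_ns, to_kind, to_name, references, grants) -> bool:
    """Minimum access: a matching grant alone is not enough,
    a reference to exactly this object has to exist as well."""
    for ref in references:
        if (ref.to_ns, ref.to_kind, ref.to_name) != (to_ns, to_kind, to_name):
            continue
        if Grant(ref.from_ns, ref.from_kind, ref.to_ns, ref.to_kind) in grants:
            return True
    return False


refs = {Reference("infra", "Gateway", "certs", "Secret", "tls")}
grants = {Grant("infra", "Gateway", "certs", "Secret")}
print(may_read("certs", "Secret", "tls", refs, grants))
print(may_read("certs", "Secret", "other", refs, grants))
```

With this model a controller can read `certs/tls` only while a Gateway references it and the owner's grant is in place; other secrets in the same namespace stay out of reach.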
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Open questions
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Naming
 | 
				
			||||||
 | 
					* Make people adopt this
 | 
				
			||||||
 | 
					* What about namespace-scoped ReferenceConsumer
 | 
				
			||||||
 | 
					* Is there a need for RBAC verb support (not only read access)?
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Alternative
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Idea: Just extend RBAC Roles with a selector (match labels, etc)
 | 
				
			||||||
 | 
					* Problems:
 | 
				
			||||||
 | 
					  * Requires changes to kubernetes core auth
 | 
				
			||||||
 | 
					  * Everything but list and watch is a pain
 | 
				
			||||||
 | 
					  * How do you handle AND vs OR selection
 | 
				
			||||||
 | 
					  * Field selectors: They exist
 | 
				
			||||||
 | 
					* Benefits: Simple controller implementation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Meanwhile
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Prefer tools that support isolation between controller and dataplane
 | 
				
			||||||
 | 
					* Disable all non-needed features -> Especially scripting
 | 
				
			||||||
							
								
								
									
										34
									
								
								content/day2/10_dev_ux.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							@@ -0,0 +1,34 @@
 | 
				
			|||||||
 | 
					---
 | 
				
			||||||
 | 
					title: Developers Demand UX for K8s!
 | 
				
			||||||
 | 
					weight: 10
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A talk by UX and software people at Red Hat (Podman team).
 | 
				
			||||||
 | 
					The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Research
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* User research study with 11 devs and platform engineers over three months
 | 
				
			||||||
 | 
					* Focus was on a new Podman Desktop feature
 | 
				
			||||||
 | 
					* Experience: 2-3 years on average (lowest: no experience, highest: old-school kube)
 | 
				
			||||||
 | 
					* 16 questions regarding environment, workflow, debugging and pain points
 | 
				
			||||||
 | 
					* Analysis: Affinity mapping
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Findings
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
 | 
				
			||||||
 | 
					* Network debugging is hard because there are many layers, and problems occurring between CNI and infra are really hard to track down -> Network topology issues are rare but hard
 | 
				
			||||||
 | 
					* YAML indentation -> Tool support is needed for visualisation
 | 
				
			||||||
 | 
					* YAML validation -> Just use validation in dev and gitops
 | 
				
			||||||
 | 
					* YAML Cleanup -> Normalize YAML (order, anchors, etc) for easy diff
 | 
				
			||||||
 | 
					* Inadequate security analysis (too verbose, non-issues are warnings) -> Realtime insights (and during dev)
 | 
				
			||||||
 | 
					* Crash Loop -> Identify stuck containers, simple debug containers
 | 
				
			||||||
 | 
					* CLI vs GUI -> Enable an experience-level-oriented GUI, enhance in-time troubleshooting
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## General issues
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* No direct fs access
 | 
				
			||||||
 | 
					* Multiple kubeconfigs
 | 
				
			||||||
 | 
					* SaaS is sometimes only provided on kube, which sounds like complexity
 | 
				
			||||||
 | 
					* Where do I begin my troubleshooting?
 | 
				
			||||||
 | 
					* Interoperability/Fragility with updates
 | 
				
			||||||
@@ -26,4 +26,29 @@ Who have I talked to today, are there any follow-ups or learnings?
 | 
				
			|||||||
They will follow up
 | 
					They will follow up
 | 
				
			||||||
{{% /notice %}}
 | 
					{{% /notice %}}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* We mostly talked about traefik hub as an API-portal
 | 
					* We mostly talked about traefik hub as an API-portal
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Postman
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* I asked them about their new cloud-only stuff: They will keep their direction
 | 
				
			||||||
 | 
					* They are also planning to work on info materials on why postman SaaS is not a big security risk
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Mattermost
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					{{% notice style="note" %}}
 | 
				
			||||||
 | 
					I should follow up
 | 
				
			||||||
 | 
					{{% /notice %}}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* I talked about our problems with the mattermost operator and was asked to get back to them with the errors
 | 
				
			||||||
 | 
					* They're currently migrating the mattermost cloud offering to arm - therefore arm support will be coming in the next months
 | 
				
			||||||
 | 
					* The mattermost guy had exactly the same problems with notifications and read/unread using element
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Vercel
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Nice guys, talked a bit about convincing customers to switch to the edge
 | 
				
			||||||
 | 
					* Also talked about policy validation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Renovate
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* The paid renovate offering now includes build failure estimation
 | 
				
			||||||
 | 
					* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification 
 | 
				
			||||||
 
 | 
				
			|||||||
@@ -1,6 +1,7 @@
 | 
				
			|||||||
---
 | 
					---
 | 
				
			||||||
archetype: chapter 
 | 
					archetype: chapter 
 | 
				
			||||||
title: Day 2
 | 
					title: Day 2
 | 
				
			||||||
 | 
					weight: 2
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).
 | 
					Day two is also the official day one of KubeCon (Day one was just CloudNativeCon).
 | 
				
			||||||
 
 | 
				
			|||||||
@@ -5,3 +5,4 @@ title: Check this out
 | 
				
			|||||||
Just a loose list of stuff that sounded interesting
 | 
					Just a loose list of stuff that sounded interesting
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Dapr
 | 
					* Dapr
 | 
				
			||||||
 | 
					* etcd backups
 | 
				
			||||||
 
 | 
				
			|||||||