Compare commits

5 commits: b515be2220...f8e654d6a5

| Author | SHA1 | Date |
| --- | --- | --- |
| Nicolai Ort | f8e654d6a5 | |
| Nicolai Ort | 9ee562e88d | |
| Nicolai Ort | daf83861af | |
| Nicolai Ort | 7b1203c7a3 | |
| Nicolai Ort | e2e3b2fdf3 | |
```diff
@@ -0,0 +1,119 @@
+CloudNativeCon
+Syntasso
+OpenTelemetry
+Multitannancy
+Multitenancy
+PDBs
+Buildpacks
+buildpacks
+Konveyor
+GenAI
+Kube
+Kustomize
+KServe
+kube
+InferenceServices
+Replicafailure
+etcd
+RBAC
+CRDs
+CRs
+GitOps
+CnPG
+mTLS
+WAL
+AZs
+DBs
+kNative
+Kaniko
+Dupr
+crossplane
+DBaaS
+APPaaS
+CLUSTERaaS
+OpsManager
+multicluster
+Statefulset
+eBPF
+Parca
+KubeCon
+FinOps
+moondream
+OLLAMA
+LLVA
+LLAVA
+bokllava
+NVLink
+CUDA
+Space-seperated
+KAITO
+Hugginface
+LLMA
+Alluxio
+LLMs
+onprem
+Kube
+Kubeflow
+Ohly
+distroless
+init
+Distroless
+Buildkit
+busybox
+ECK
+Kibana
+Dedup
+Crossplane
+autoprovision
+RBAC
+Serviceaccount
+CVEs
+Podman
+LinkerD
+sidecarless
+Kubeproxy
+Daemonset
+zTunnel
+HBONE
+Paketo
+KORFI
+Traefik
+traefik
+Vercel
+Isovalent
+CNIs
+Ivanti
+envs
+CoreDNS
+Istio
+buildpacks
+Buildpack
+SBOM
+Tekton
+KPack
+Multiarch
+Tanzu
+Kubebuilder
+finalizer
+OLM
+depply
+CatalogD
+Rukoak
+kapp
+Depply
+Jetstack
+kube-lego
+PKI-usecase
+multimanager
+kubebuider
+kubebuilder
+FluentD
+FluentBit
+OpenMetrics
+upsert
+tektone-based
+ODIT.Services
+Planetscale
+vitess
+Autupdate
+KubeCon
```
```diff
@@ -0,0 +1,3 @@
+ARROWS
+ARROWS
+ARROWS
```
```diff
@@ -0,0 +1,2 @@
+{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QJust create a replica cluster via WAL-files from S3 on another kube cluster (lags 5 mins behind)\nYou can also activate replication streaming\\E$"}
+{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QResulting needs\nCluster aaS (using crossplane - in this case using aws)\nDBaaS (using crossplane - again usig pq on aws)\nApp aaS\\E$"}
```
```diff
@@ -9,7 +9,7 @@ This current version is probably full of typos - will fix later. This is what ty

 ## How did I get there?

-I attended KubeCon + CloudNAtiveCon Europe 2024 as the one and only [ODIT.Services](https://odit.services) representative.
+I attended KubeCon + CloudNativeCon Europe 2024 as the one and only [ODIT.Services](https://odit.services) representative.

 ## Style Guide

```
```diff
@@ -7,4 +7,4 @@ tags:
 ---

 The first "event" of the day was - as always - the opening keynote.
-Today presented by Redhat and Syntasso.
+Today presented by Red Hat and Syntasso.
```
```diff
@@ -6,20 +6,19 @@ tags:
 - dx
 ---

-By VMware (of all people) - kinda funny that they chose this title with the wole Broadcom fun.
+By VMware (of all people) - kinda funny that they chose this title with the whole Broadcom fun.
 The main topic of this talk is: What interface do we choose for what capability.

 ## Personas

-* Experts: Kubernetes, DB Engee
+* Experts: Kubernetes, DB engineer
 * Users: Employees that just want to do stuff
-* Platform Engeneers: Connect Users to Services by Experts
+* Platform engineers: Connect Users to Services by Experts

 ## Goal

-* Create Interfaces
-* Interface: Connect Users to Services
-* Problem: Many diferent types of Interfaces (SaaS, GUI, CLI) with different capabilities
+* Create Interfaces: Connect Users to Services
+* Problem: Many different types of Interfaces (SaaS, GUI, CLI) with different capabilities

 ## Dimensions

```
```diff
@@ -27,13 +26,13 @@ The main topic of this talk is: What interface do we choose for what capability.

 * Autonomy: external dependency (low) <-> self-service (high)
   * low: Ticket system -> But sometimes good for getting an expert
-  * high: Portal -> Nice, but somethimes we just need a human contact
+  * high: Portal -> Nice, but sometimes we just need a human contact
 * Contextual distance: stay in the same tool (low) <-> switch tools (high)
   * low: IDE plugin -> High potential friction if stuff goes wrong/complex (context switch needed)
   * high: Wiki or ticketing system
 * Capability skill: anyone can do it (low) <-> Made for experts (high)
-  * low: transparent sidecar (eg vuln scanner)
-  * high: cli
+  * low: transparent sidecar (e.g. vulnerability scanner)
+  * high: CLI
 * Interface skill: anyone can do it (low) <-> needs specialized interface skills (high)
   * low: Documentation in web aka wiki-style
   * high: Code templates (a sample helm values.yaml or raw terraform provider)
```
```diff
@@ -42,4 +41,4 @@ The main topic of this talk is: What interface do we choose for what capability.

 * You can use multiple interfaces for one capability
 * APIs (proverbial pig) are the most important interface b/c it can provide the baseline for all other interfaces
-* The beautification (lipstick) of the API through other interfaces makes uers happy
+* The beautification (lipstick) of the API through other interfaces makes users happy
```
```diff
@@ -62,10 +62,10 @@ Presented by the implementers at Thoughtworks (TW).
 ### Observability

 * Tool: Honeycomb
-* Metrics: Opentelemetry
+* Metrics: OpenTelemetry
 * Operator reconcile steps are exposed as traces

 ## Q&A

-* Your teams are pretty autonomus -> What to do with more classic teams: Over a multi-year jurney every team settles on the ownership and selfservice approach
-* How to teams get access to stages: They just get temselves a stage namespace, attach to ingress and have fun (admission handles the rest)
+* Your teams are pretty autonomous -> What to do with more classic teams: Over a multi-year journey every team settles on the ownership and self-service approach
+* How teams get access to stages: They just get themselves a stage namespace, attach to ingress and have fun (admission handles the rest)
```
```diff
@@ -17,6 +17,6 @@ No real value
 ## What do we need

 * User documentation
-* Adoption & Patnership
+* Adoption & Partnership
 * Platform as a Product
 * Customer feedback
```
```diff
@@ -10,7 +10,7 @@ tags:
 - multicluster
 ---

-Part of the Multitannancy Con presented by Adobe
+Part of the Multi-tenancy Con presented by Adobe

 ## Challenges

```
```diff
@@ -22,24 +22,24 @@ Part of the Multitannancy Con presented by Adobe

 * Azure in Base - AWS on the edge
 * Single Tenant Clusters (Simpler Governance)
-* Responsibility is Shared between App and Platform (Monitoring, Ingress, etc)
-* Problem: Huge manual investment and overprovisioning
+* Responsibility is Shared between App and Platform (Monitoring, Ingress, etc.)
+* Problem: Huge manual investment and over provisioning
 * Result: Access Control to tenant Namespaces and Capacity Planning -> Pretty much a multi tenant cluster with one tenant per cluster

-### Second Try - Microcluster
+### Second Try - Micro Clusters

 * One Cluster per Service

-### Third Try - Multitennancy
+### Third Try - Multi-tenancy

 * Use a bunch of components deployed by platform Team (Ingress, CD/CD, Monitoring, ...)
-* Harmonized general Runtime (cloud agnostic): Codenamed Ethos -> OVer 300 Clusters
+* Harmonized general Runtime (cloud-agnostic): Code-named Ethos -> Over 300 Clusters
 * Both shared clusters (shared by namespace) and dedicated clusters
-* Cluster config is a basic json with name, capacity, teams
-* Capacity Managment get's Monitored using Prometheus
-* Cluster Changes should be non-desruptive -> K8S-Shredder
-* Cost efficiency: Use good PDBs and livelyness/readyness Probes alongside ressource requests and limits
+* Cluster config is a basic JSON with name, capacity, teams
+* Capacity Management gets Monitored using Prometheus
+* Cluster Changes should be nondestructive -> K8S-Shredder
+* Cost efficiency: Use good PDBs and liveliness/readiness Probes alongside resource requests and limits

 ## Conclusion

-* There is a balance between cost, customization, setup and security between single-tenant und multi-tenant
+* There is a balance between cost, customization, setup and security between single-tenant and multi-tenant
```
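Side note on the "cluster config is a basic JSON with name, capacity, teams" line in the hunk above: the idea is small enough to sketch. A minimal illustration in Python - all field names here are hypothetical, not Adobe's actual Ethos schema:

```python
import json

# Hypothetical cluster config in the spirit of "a basic JSON with
# name, capacity, teams" - the real schema is not public.
config_json = """
{
  "name": "ethos-eu-01",
  "capacity": {"cpu": 512, "memoryGi": 2048},
  "teams": ["checkout", "search"]
}
"""

def team_has_access(config: dict, team: str) -> bool:
    """Access control to tenant namespaces: only listed teams may deploy."""
    return team in config["teams"]

config = json.loads(config_json)
print(team_has_access(config, "search"))   # True
print(team_has_access(config, "billing"))  # False
```

The same config would also feed the Prometheus-based capacity management the notes mention.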
```diff
@@ -3,42 +3,41 @@ title: Lightning talks
 weight: 6
 ---

-The lightning talks are 10-minute talks by diferent cncf projects.
+The lightning talks are 10-minute talks by different CNCF projects.

-## Building contaienrs at scale using buildpacks
+## Building containers at scale using buildpacks

-A Project lightning talk by heroku and the cncf buildpacks.
+A Project lightning talk by Heroku and the CNCF buildpacks.

 ### How and why buildpacks?

-* What: A simple way to build reproducible contaienr images
-* Why: Scale, Reuse, Rebase
-  * Rebase: Buildpacks are structured as layers
+* What: A simple way to build reproducible container images
+* Why: Scale, Reuse, Rebase: Buildpacks are structured as layers
   * Dependencies, app builds and the runtime are seperated -> Easy update
-* How: Use the PAck CLI `pack build <image>` `docker run <image>`
+* How: Use the Pack CLI `pack build <image>` `docker run <image>`

 ## Konveyor

-A Platform for migration of legacy apps to cloudnative platforms.
+A Platform for migration of legacy apps to cloud native platforms.

-* Parts: Hub, Analysis (with langugage server), Assesment
+* Parts: Hub, Analysis (with language server), assessment
 * Roadmap: Multi language support, GenAI, Asset Generation (e.g. Kube Deployments)

-## Argo'S Communuty Driven Development
+## Argo's Community Driven Development

-Pretty mutch a short intropduction to Argo Project
+Pretty much a short introduction to Argo Project

 * Project Parts: Workflows (CI), Events, CD, Rollouts
-* NPS: Net Promoter Score (How likely are you to recoomend this) -> Everyone loves argo (based on their survey)
-* Rollouts: Can be based with prometheus metrics
+* NPS: Net Promoter Score (How likely are you to recommend this) -> Everyone loves Argo (based on their survey)
+* Rollouts: Can be based with Prometheus metrics

 ## Flux

-* Components: Helm, Kustomize, Terrafrorm, ...
-* Flagger Now supports gateway api, prometheus, datadog and more
+* Components: Helm, Kustomize, Terraform, ...
+* Flagger Now supports gateway API, Prometheus, Datadog and more
 * New Releases

-## A quick logg at the TAG App-Delivery
+## A quick look at the TAG App-Delivery

 * Mission: Everything related to cloud-native application delivery
 * Bi-Weekly Meetings
```
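The rebase point in the buildpacks hunk above is the key trick: because images are structured as layers, a base-image update can be swapped in without rebuilding the dependency and app layers. A toy model of that idea - plain tuples standing in for OCI layers, not the real `pack rebase` implementation:

```python
# Toy model of buildpack-style layer rebasing: an image is an ordered
# list of (layer_name, digest) tuples; rebase swaps the base layers
# and keeps dependency/app layers untouched.
Image = list[tuple[str, str]]

def rebase(image: Image, new_base: Image) -> Image:
    app_layers = [layer for layer in image if not layer[0].startswith("base/")]
    return new_base + app_layers

old = [("base/os", "sha256:aaa"), ("deps", "sha256:bbb"), ("app", "sha256:ccc")]
patched_base = [("base/os", "sha256:ddd")]
print(rebase(old, patched_base))
# [('base/os', 'sha256:ddd'), ('deps', 'sha256:bbb'), ('app', 'sha256:ccc')]
```

This is why a fleet-wide OS patch is cheap with buildpacks: only the base layer digest changes.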
```diff
@@ -8,30 +8,30 @@ tags:
 - dx
 ---

-This talks looks at bootstrapping Platforms using KSere.
-They do this in regards to AI Workflows.
+This talk looks at bootstrapping Platforms using KServe.
+They do this in regard to AI Workflows.

-## Szenario
+## Scenario

-* Deploy AI Workloads - Sometime consiting of different parts
+* Deploy AI Workloads - Sometime consisting of different parts
 * Models get stored in a model registry

 ## Baseline

 * Consistent APIs throughout the platform
-* Not the kube api directly b/c:
-  * Data scientists are a bit overpowered by the kube api
-  * Not only Kubernetes (also monitoring tools, feedback tools, etc)
+* Not the kube API directly b/c:
+  * Data scientists are a bit overpowered by the kube API
+  * Not only Kubernetes (also monitoring tools, feedback tools, etc.)
   * Better debugging experience for specific workloads

-## The debugging api
+## The debugging API

 * Specific API with enhanced statuses and consistent UX across Code and UI
-* Exampüle Endpoints: Pods, Deployments, InferenceServices
-* Provides a status summary-> Consistent health info across all related ressources
-* Example: Deployments have progress/availability, Pods have phases, Containers have readyness -> What do we interpret how?
-* Evaluation: Progressing, Available Count vs Readyness, Replicafailure, Pod Phase, Container Readyness
-* The rules themselfes may be pretty complex, but - since the user doesn't have to check them themselves - the status is simple
+* Example Endpoints: Pods, Deployments, InferenceServices
+* Provides a status summary-> Consistent health info across all related resources
+* Example: Deployments have progress/availability, Pods have phases, Containers have readiness -> What do we interpret how?
+* Evaluation: Progressing, Available Count vs Readiness, Replicafailure, Pod Phase, Container Readiness
+* The rules themselves may be pretty complex, but - since the user doesn't have to check them themselves - the status is simple

 ### Debugging Metrics

```
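The status-summary idea in the hunk above (fold deployment availability, pod phase and container readiness into one simple status) can be sketched roughly like this - the conditions are illustrative, not the talk's actual rule set:

```python
def summarize(deployment_available: bool, pod_phase: str,
              containers_ready: int, containers_total: int) -> str:
    """Fold several Kubernetes-level signals into one user-facing status."""
    if pod_phase == "Failed":
        return "Error"
    if not deployment_available or pod_phase == "Pending":
        return "Progressing"
    if containers_ready < containers_total:
        return "Degraded"
    return "Healthy"

print(summarize(True, "Running", 2, 2))   # Healthy
print(summarize(True, "Running", 1, 2))   # Degraded
print(summarize(False, "Pending", 0, 2))  # Progressing
```

The point from the notes holds: the rules can grow arbitrarily complex inside this function while the user still only sees one word.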
```diff
@@ -47,15 +47,15 @@ They do this in regards to AI Workflows.
 * Kine is used to replace/extend etcd with the relational dock db -> Relation namespace<->manifests is stored here and RBAC can be used
 * Launchpad: Select Namespace and check resource (fuel) availability/utilization

-### Clsuter maintainance
+### Cluster maintenance

-* Deplyoments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
-* The excact same manifests get deployed to two clusters
-* Cluster desired state is stored externally to enable effortless upogrades, rescale, etc
+* Deployments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
+* The exact same manifests get deployed to two clusters
+* Cluster desired state is stored externally to enable effortless upgrades, rescale, etc

 ### Versioning API

-* Basicly the dock DB
+* Basically the dock DB
 * CRDs are the representations of the inference manifests
 * Rollbacks, Promotion and History is managed via the CRs
 * Why not GitOps: Internal Diffs, deployment overrides, customized features
```
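The "exact same manifests get deployed to two clusters" approach above reduces to an idempotent apply fanned out over clusters. A bare-bones sketch (in-memory dicts standing in for cluster API servers; not the talk's actual tooling):

```python
def apply(cluster_state: dict, manifest: dict) -> None:
    """Idempotent apply: the same manifest can be pushed to any number of clusters."""
    key = (manifest["kind"], manifest["name"])
    cluster_state[key] = manifest

clusters = {"blue": {}, "green": {}}
manifest = {"kind": "InferenceService", "name": "sentiment", "replicas": 2}
for state in clusters.values():
    apply(state, manifest)

print(clusters["blue"] == clusters["green"])  # True - HA through identical clusters
```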
```diff
@@ -7,25 +7,25 @@ tags:
 - db
 ---

-A short Talk as Part of the DOK day - presendet by the VP of CloudNative at EDB (one of the biggest PG contributors)
+A short Talk as Part of the Data on Kubernetes day - presented by the VP of Cloud Native at EDB (one of the biggest PG contributors)
 Stated target: Make the world your single point of failure

 ## Proposal

-* Get rid of Vendor-Lockin using the oss projects PG, K8S and CnPG
+* Get rid of Vendor-Lockin using the OSS projects PG, K8S and CnPG
 * PG was the DB of the year 2023 and a bunch of other times in the past
 * CnPG is a Level 5 mature operator

 ## 4 Pillars

-* Seamless KubeAPI Integration (Operator PAttern)
+* Seamless Kube API Integration (Operator Pattern)
 * Advanced observability (Prometheus Exporter, JSON logging)
 * Declarative Config (Deploy, Scale, Maintain)
-* Secure by default (Robust contaienrs, mTLS, and so on)
+* Secure by default (Robust containers, mTLS, and so on)

 ## Clusters

-* Basic Ressource that defines name, instances, snyc and storage (and other params that have same defaults)
+* Basic Resource that defines name, instances, sync and storage (and other parameters that have same defaults)
 * Implementation: Operator creates:
   * The volumes (PG_Data, WAL (Write ahead log)
   * Primary and Read-Write Service
```
```diff
@@ -35,15 +35,15 @@ Stated target: Make the world your single point of failure
 * Failure detected
 * Stop R/W Service
 * Promote Replica
-* Activat R/W Service
-* Kill old promary and demote to replica
+* Activate R/W Service
+* Kill old primary and demote to replica

 ## Backup/Recovery

-* Continuos Backup: Write Ahead Log Backup to object store
+* Continuous Backup: Write Ahead Log Backup to object store
 * Physical: Create from primary or standby to object store or kube volumes
-* Recovery: Copy full backup and apply WAL until target (last transactio or specific timestamp) is reached
-* Replica Cluster: Basicly recreates a new cluster to a full recovery but keeps the cluster in Read-Only Replica Mode
+* Recovery: Copy full backup and apply WAL until target (last transaction or specific timestamp) is reached
+* Replica Cluster: Basically recreates a new cluster to a full recovery but keeps the cluster in Read-Only Replica Mode
 * Planned: Backup Plugin Interface

 ## Multi-Cluster
```
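Point-in-time recovery as described in the hunk above (copy the full backup, then apply WAL until the target transaction or timestamp) is essentially a replay loop. A heavily simplified sketch - dicts standing in for pages and log records, not CloudNativePG's actual recovery code:

```python
# Simplified point-in-time recovery: start from a base backup and
# replay write-ahead-log entries until the recovery target.
def recover(base_backup: dict, wal: list[tuple[int, str, int]],
            target_ts: int) -> dict:
    db = dict(base_backup)
    for ts, key, value in wal:
        if ts > target_ts:   # recovery target reached - stop replaying
            break
        db[key] = value      # apply the logged write
    return db

backup = {"balance": 100}
wal = [(1, "balance", 120), (2, "balance", 90), (3, "balance", 500)]
print(recover(backup, wal, target_ts=2))  # {'balance': 90}
```

The replica-cluster mode in the notes is the same loop that simply never stops replaying (and never opens for writes).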
```diff
@@ -51,21 +51,21 @@ Stated target: Make the world your single point of failure
 * Just create a replica cluster via WAL-files from S3 on another kube cluster (lags 5 mins behind)
 * You can also activate replication streaming

-## Reccomended architecutre
+## Recommended architecture

-* Dev Cluster: 1 Instance without PDB and with Continuos backup
-* Prod: 3 Nodes with automatic failover and continuos backups
+* Dev Cluster: 1 Instance without PDB and with Continuous backup
+* Prod: 3 Nodes with automatic failover and continuous backups
 * Symmetric: Two clusters
   * Primary: 3-Node Cluster
-  * Secondary: WAL-Based 3-Node Cluster with a designated primary (to take over if primary cluster fails)
-* Symmetric Streaming: Same as Secondary, but you manually enable the streaming api for live replication
+  * Secondary: WAL based 3-Node Cluster with a designated primary (to take over if primary cluster fails)
+* Symmetric Streaming: Same as Secondary, but you manually enable the streaming API for live replication
 * Cascading Replication: Scale Symmetric to more clusters
-* Single availability zone: Well, do your best to spread to nodes and aspire to streched kubernetes to more AZs
+* Single availability zone: Well, do your best to spread to nodes and aspire to stretched Kubernetes to more AZs

 ## Roadmap

 * Replica Cluster (Symmetric) Switchover
 * Synchronous Symmetric
-* 3rd PArty Plugins
+* 3rd Party Plugins
 * Manage DBs via the Operator
 * Storage Autoscaling
```
```diff
@@ -4,14 +4,14 @@ weight: 9
 ---

 > When I say serverless I don't mean lambda - I mean serverless
-> That is thousands of lines of yaml - but I don't want to depress you
+> That is thousands of lines of YAML - but I don't want to depress you
 > It will be eventually done
 > Imagine this error is not happening
 > Just imagine how I did this last night

 ## Goal

-* Take my sourcecode and run it, scale it - jsut don't ask me
+* Take my source code and run it, scale it - just don't ask me

 ## Baseline

```
```diff
@@ -20,9 +20,9 @@ weight: 9
 * Use Kaniko/Shipwright for building
 * Use Dupr for inter-service Communication

-## Openfunction
+## Open function

-> The glue between different tools to achive serverless
+> The glue between different tools to achieve serverless

 * CRD that describes:
   * Build this image and push it to the registry
```
```diff
@@ -35,8 +35,8 @@ weight: 9

 * Open Questions
   * Where are the serverless servers -> Cluster, dependencies, secrets
-  * How do I create DBs, etc
+  * How do I create DBs, etc.
 * Resulting needs
-  * Cluster aaS (using crossplane - in this case using aws)
-  * DBaaS (using crossplane - again usig pq on aws)
-  * App aaS
+  * CLUSTERaaS (using crossplane - in this case using AWS)
+  * DBaaS (using crossplane - again using pg on AWS)
+  * APPaaS
```
```diff
@@ -14,21 +14,21 @@ Another talk as part of the Data On Kubernetes Day.

 * Managed: Atlas
 * Semi: Cloud manager
-* Selfhosted: Enterprise and community operator
+* Self-hosted: Enterprise and community operator

-### Mongo on K8s
+### MongoDB on K8s

 * Cluster Architecture
   * Control Plane: Operator
-  * Data Plane: MongoDB Server + Agen (Sidecar Proxy)
+  * Data Plane: MongoDB Server + Agent (Sidecar Proxy)
 * Enterprise Operator
-  * Opsmanager CR: Deploys 3-node operator DB and OpsManager
-  * MongoDB CR: The MongoDB cLusters (Compromised of agents)
-* Advanced Usecase: Data Platform with mongodb on demand
-  * Control Plane on one cluster (or on VMs/Hardmetal), data plane in tennant clusters
+  * OpsManager CR: Deploys 3-node operator DB and OpsManager
+  * MongoDB CR: The MongoDB clusters (Compromised of agents)
+* Advanced use case: Data Platform with MongoDB on demand
+  * Control Plane on one cluster (or on VMs/Bare-metal), data plane in tenant clusters
   * Result: MongoDB CR can not relate to OpsManager CR directly

 ## Pitfalls

-* Storage: Agnostic, Topology aware, configureable and resizeable (can't be done with statefulset)
+* Storage: Agnostic, Topology aware, configurable and resizable (can't be done with Statefulset)
 * Networking: Cluster-internal (Pod to Pod/Service), External (Split horizon over multicluster)
```
```diff
@@ -9,8 +9,8 @@ tags:

 ## CNCF Platform maturity model

-* Was donated to the cncf by syntasso
-* Constantly evolving since 1.0 in november 2023
+* Was donated to the CNCF by Syntasso
+* Constantly evolving since 1.0 in November 2023

 ### Overview

```
```diff
@@ -25,7 +25,7 @@ tags:
 * Investment: How are funds/staff allocated to platform capabilities
 * Adoption: How and why do users discover this platform
 * Interfaces: How do users interact with and consume platform capabilities
-* Operations: How are platforms and capabilities planned, prioritzed, developed and maintained
+* Operations: How are platforms and capabilities planned, prioritized, developed and maintained
 * Measurement: What is the process for gathering and incorporating feedback/learning?

 ## Goals
```
```diff
@@ -34,24 +34,24 @@ tags:
 * Outcomes & Practices
   * Where are you at
   * Limits & Opportunities
-  * Behaviours and outcome
+  * Behaviors and outcome
   * Balance People and processes

 ## Typical Journeys

-### Steps of the jurney
+### Steps of the journey

 1. What are your goals and limitations
 2. What is my current landscape
-3. Plan babysteps & iterate
+3. Plan baby steps & iterate

-### Szenarios
+### Scenarios

 * Bad: I want to improve my k8s platform
 * Good: Scaling an enterprise COE (Center Of Excellence)
   * What: Onboard 20 Teams within 20 Months and enforce 8 security regulations
   * Where: We have a dedicated team of centrally funded people
-  * Lay the foundation: More funding for more larger teams -> Switch from Project to platform mindset
+  * Lay the foundation: More funding for more, larger teams -> Switch from Project to platform mindset
   * Do your technical Due diligence in parallel

 ## Key Lessons
```
```diff
@@ -60,8 +60,8 @@ tags:
 * Know your landscape
 * Plan in baby steps and iterate
 * Lay the foundation for building the right thing and not just anything
-* Dont forget to do your technical dd in parallel
+* Don't forget to do your technical dd in parallel

 ## Conclusion

-* Majurity model is a helpful part but not the entire plan
+* Maturity model is a helpful part but not the entire plan
```
```diff
@@ -6,43 +6,43 @@ tags:
 - network
 ---

-Held by Cilium regarding ebpf and hubble
+Held by Cilium regarding eBPF and Hubble

 ## eBPF

 > Extend the capabilities of the kernel without requiring to change the kernel source code or load modules

 * Benefits: Reduce performance overhead, gain deep visibility while being widely available
-* Example Tools: Parca (Profiling), Cilium (Networking), Hubble (Opservability), Tetragon (Security)
+* Example Tools: Parca (Profiling), Cilium (Networking), Hubble (Observability), Tetragon (Security)

 ## Cilium

-> Opensource Solution for network connectivity between workloads
+> Open source Solution for network connectivity between workloads

 ## Hubble

 > Observability-Layer for cilium

-### Featureset
+### Feature set

 * CLI: TCP-Dump on steroids + API Client
 * UI: Graphical dependency and connectivity map
-* Prometheus + Grafana + Opentelemetry compatible
+* Prometheus + Grafana + OpenTelemetry compatible
 * Metrics up to L7

 ### Where can it be used

 * Service dependency with frequency
-* Kinds of http calls
+* Kinds of HTTP calls
 * Network Problems between L4 and L7 (including DNS)
 * Application Monitoring through status codes and latency
 * Security-Related Network Blocks
-* Services accessed from outside the cluser
+* Services accessed from outside the cluster

 ### Architecture

-* Cilium Agent: Runs as the CNI für all Pods
-* Server: Runs on each node and retrieves the ebpf from cilium
+* Cilium Agent: Runs as the CNI for all Pods
+* Server: Runs on each node and retrieves the eBPF from cilium
 * Relay: Provide visibility throughout all nodes

 ## TL;DR

```
```diff
@@ -7,10 +7,10 @@ weight: 1
 Day one is the Day for co-located events aka CloudNativeCon.
 I spent most of the day attending the Platform Engineering Day - as one might have guessed it's all about platform engineering.

-Everything started with badge pickup - a very smooth experence (but that may be related to me showing up an hour or so too early).
+Everything started with badge pickup - a very smooth experience (but that may be related to me showing up an hour or so too early).

-## Talk reccomandations
+## Talk recommendations

 * Beyond Platform Thinking...
-* Hitchhikers Guide to ...
+* Hitchhiker's Guide to ...
 * To K8S and beyond...
```
```diff
@@ -6,18 +6,18 @@ tags:
 - opening
 ---

-The opening keynote started - as is the tradition with keynotes - with an "motivational" opening video.
+The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
 The keynote itself was presented by the CEO of the CNCF.

 ## The numbers

-* Over 2000 attendees
+* Over 12000 attendees
 * 10 Years of Kubernetes
 * 60% of large organizations expect rapid cost increases due to AI/ML (FinOps Survey)

 ## The highlights

-* Everyone uses cloudnative
+* Everyone uses cloud native
 * AI uses Kubernetes b/c the UX is way better than classic tools
   * Especially when transferring from dev to prod
 * We need standardization
```
```diff
@@ -26,10 +26,10 @@ The keynote itself was presented by the CEO of the CNCF.
 ## Live demo

 * KIND cluster on desktop
-* Protptype Stack (develop on client)
+* Prototype Stack (develop on client)
   * Kubernetes with the LLM
-  * Host with LLVA (image describe model), moondream and OLLAMA (the model manager/registry()
+  * Host with LLAVA (image describe model), moondream and OLLAMA (the model manager/registry()
 * Prod Stack (All in kube)
   * Kubernetes with LLM, LLVA, OLLAMA, moondream
-* Available Models: llava, mistral bokllava (llava*mistral)
-* Host takes picture, ai describes what is pictures (in our case the conference audience)
+* Available Models: LLAVA, mistral bokllava (LLAVA*mistral)
+* Host takes picture, AI describes what is pictures (in our case the conference audience)
```
```diff
@@ -7,7 +7,7 @@ tags:
 - panel
 ---

-A podium discussion (somewhat scripted) lead by Pryanka
+A podium discussion (somewhat scripted) lead by Priyanka

 ## Guests

```
@ -17,24 +17,24 @@ A podium discussion (somewhat scripted) lead by Pryanka
|
|||
|
||||
## Discussion
|
||||
|
||||
* What do you use as the base of dev for ollama
|
||||
* What do you use as the base of dev for OLLAMA
  * Jeff: The concepts from docker, git, Kubernetes
* How is the balance between AI engineers and AI ops
  * Jeff: The classic dev vs ops divide, many ML engineers don't think about ops
  * Paige: Yessir
* How does infra keep up with the fast research
  * Paige: Well, they don't - but they do their best, and cloud native is cool
  * Jeff: Well, we're not Google, but Kubernetes is the savior
* What are scaling constraints
  * Jeff: Currently, sizing of models is still in its infancy
  * Jeff: There will be more specific hardware, and someone will have to support it
  * Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
  * Paige: Optimization of smaller models
* What technologies need to be open source licensed
  * Jeff: The model, b/c of access and trust
  * Tim: The models and base execution environment -> Vendor agnosticism
  * Paige: Yes, and remixes are really important for development
* Anything else
  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world
  * Paige: Currently many people just use paid APIs to abstract the infra, but we need this stuff self-hostable
  * Tim: I don't want to know about the hardware; the whole infra side should be done by the cloud native teams to let ML engineers just be ML engineers

Kevin and Sanjay from NVIDIA

## Enabling GPUs in Kubernetes today

* Host-level components: Toolkit, drivers
* Kubernetes components: Device plugin, feature discovery, node selector

## GPU sharing

* Time slicing: Switch around by time
* Multi-Process Service: Always run on the GPU but share (space-)
* Multi-Instance GPU: Space-separated sharing on the hardware
* Virtual GPU: Virtualizes time slicing or MIG
* CUDA Streams: Run multiple kernels in a single app
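
To make the time-slicing option concrete: the NVIDIA device plugin exposes a sharing config that advertises one physical GPU as several schedulable replicas. A hedged sketch - the layout follows the device plugin's config format as I recall it, and the replica count is arbitrary:

```yaml
# Sketch: NVIDIA device plugin config enabling time slicing, so each
# physical GPU is advertised as 4 allocatable nvidia.com/gpu resources.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```

Pods still request plain `nvidia.com/gpu` resources; they just time-share the silicon underneath.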

## Dynamic resource allocation

* A new alpha feature since Kube 1.26 for dynamic resource requesting
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
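
A rough sketch of what "just request a resource via the API" looked like in the alpha DRA API - the group/version, class name and images are assumptions, since the details were still changing at the time:

```yaml
# Sketch of an alpha DRA request: a claim template referencing a
# vendor-provided resource class, consumed by a pod via resourceClaims.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.example.com   # hypothetical vendor class
---
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  containers:
    - name: train
      image: example/train:latest        # hypothetical image
      resources:
        claims:
          - name: gpu
  resourceClaims:
    - name: gpu
      source:
        resourceClaimTemplateName: gpu-claim-template
```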

## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is the training workload split into batches
* Challenge: Schedule multiple training jobs by different users that are prioritized

### Topology-aware placements

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize related jobs based on GPU node distance and NUMA placement

* Stuff can break, resulting in slowdowns or errors
* Challenge: Detect faults and handle them
* Observability, both in-band and out of band, that exposes node conditions in Kubernetes
* Needed: Automated fault-tolerant scheduling

### Multidimensional optimization

* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: What to choose (the multidimensional decision problem)
* Needed: A scheduler that can balance the dimensions

Jorge Palma from Microsoft with a quick introduction.

* Containerized models
* GPUs in the cluster (install, management)

## Kubernetes AI Toolchain (KAITO)

* Kubernetes operator that interacts with
  * Node provisioner
  * Deployment
* Simple CRD that describes a model and infra - and have fun
* Creates an inference endpoint
* Currently around 10 models (Hugging Face, Llama, etc.)
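
A hedged sketch of what that "simple CRD" might look like - the API group, field names and preset name below are assumptions reconstructed from the description, not copied from the talk:

```yaml
# Hypothetical KAITO-style workspace: describe the model and the infra,
# and the operator provisions GPU nodes plus an inference endpoint.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: llama-demo
resource:
  instanceType: Standard_NC12s_v3   # assumed GPU node SKU
  labelSelector:
    matchLabels:
      apps: llama-demo
inference:
  preset:
    name: llama-2-7b-chat           # assumed: one of the ~10 model presets
```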

- panel
---

A panel discussion with moderation by Google and participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted, with prepared (sponsor-specific) slides for each question answered.

## Takeaways

* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> Better resource sharing and scheduling
* Smaller LLMs on CPU only are pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models to run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (onprem, burst to cloud) workloads

- keynote
---

Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.

### The "news"

* New work group AI
* More tools are including AI features
* New updated CNCF for children, feat. AI
* One decade of Kubernetes
* DRA is in alpha

### DRA

* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific stuff is abstracted by a vendor driver controller
* The kube scheduler can interact with the vendor parameters for scheduling and autoscaling

### Cloud native AI ecosystem

* Kube is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLMs on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis
* Topics of discussion

### The working group AI

* It was formed in October 2023
* They are working on the white paper (cloud native and AI), which was published on 19.03.2024
* The landscape "cloud native and AI" is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)

### LFAI and CNCF

* The director of the AI foundation talks about AI and cloud native
* They are looking forward to more collaboration

The entire talk was very short, but it was a nice demo of init containers.

* Security is hard - distroless sounds like a nice helper
* Basic challenge: The usability-security dilemma -> But more usability doesn't mean less secure, just more updating
* Distro: Kernel + software packages + package manager (optional) -> In containers, just without the kernel
* Distroless: No package manager, no shell, no web client (curl/wget) - only minimal software bundles

## Tools for distroless image creation

## Demo

* A (rough) distroless Postgres with an alpine build step and a scratch final step
* A basic pg:alpine container used for init with a shared data volume
* The init uses the pg admin user to initialize the pg server (you don't need the admin credentials after this)

### Kube

* `k apply` failed b/c no internet, but was fixed by connecting to Wi-Fi
* Without the init container the pod just crashes; with the init container the correct config gets created
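
The demo pattern can be sketched roughly like this - image names and the init command are assumptions, the real demo used a Postgres-specific init:

```yaml
# Sketch: a full-featured init container prepares data on a shared volume
# so the shell-less distroless main container can start cleanly.
apiVersion: v1
kind: Pod
metadata:
  name: distroless-pg
spec:
  initContainers:
    - name: init-db
      image: postgres:16-alpine            # full image, used only for init
      command: ["sh", "-c", "initdb -D /data/pgdata"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: pg
      image: example/distroless-postgres   # hypothetical distroless build
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}
```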

### Docker compose

A talk by Elastic.

## About Elastic

* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers

### Elastic and Kube

* They offer Elastic observability
* They offer the ECK operator for simplified deployments

## The baseline

* Goal: A large-scale (1M+ containers) resilient platform on k8s
* Architecture
  * Global Control: The control plane (API) for users, with controllers
  * Regional Apps: The "shitload" of Kubernetes clusters where the actual customer services live

## Scalability

* Challenge: How large can our clusters be, and how many clusters do we need
* Problem: Only basic guidelines exist for that
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
  * Throw away without data loss
  * The single source of truth is not cluster etcd but external -> No etcd backups needed
  * Everything can be recreated any time

## Controllers

{{% notice style="note" %}}
I won't copy the explanations of operators/controllers in these notes
{{% /notice %}}

* Many controllers, including (but not limited to)
  * Cluster controller: Register cluster to controller
  * Project controller: Schedule user's project to a cluster
  * Product controllers (Elasticsearch, Kibana, etc.)
  * Ingress/cert-manager
* Sometimes controllers depend on controllers -> potential complexity
* Pro:
  * Resilient (self-healing)
  * Level-triggered (desired state vs procedure-triggered)
  * Simple reasoning when comparing desired state vs a state machine
  * Official controller runtime lib
  * Workqueue: Automatic dedup, retry backoff and so on

## Global Controllers

* Basic operation
  * Uses the project config from Elastic Cloud as the desired state
  * The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd
  * Solution: External data store (Postgres)
* Challenge: How do we sync the DB sources to Kubernetes
  * Potential solution: Replace etcd with the external DB
  * Chosen solution:
    * The controllers don't use CRDs for storage, but they expose a web API
    * Reconciliation now interacts with the external DB and Go channels (queue) instead
    * Then the CRs for the operators get created by the global controller

### Large scale

### Reconcile

* User-driven events are processed ASAP
* Reconciliation of everything should happen, but with low priority, slowly in the background
* Solution: Status.LastReconciledRevision (timestamp) gets compared to the revision; if larger -> user change
* Prioritization: Just a custom event handler with the normal queue and a low priority
* Low-prio queue: Just a queue that adds items to the normal work-queue with a rate limit

```mermaid
flowchart LR
```

- security
---

A talk by Google and Microsoft with the premise of better auth in k8s.

## Baselines

* Most access controllers have read access to all secrets -> They are not really designed for keeping these secrets
* Result: CVEs
* Example: Just use ingress-nginx, put some Lua code in the config and voilà: service account token
* Fix: No more fun

## Basic solutions

* Separate control (the controller) from data (the ingress)
* Namespace-limited ingress

## Current state of cross-namespace stuff

* Why: Reference a TLS cert for the gateway API in the cert team's namespace
* Why: Move all ingress configs to one namespace
* Classic solution: Annotations in Contour that reference a namespace containing all certs (rewrites secret to certs/secret)
* Gateway solution:
  * The Gateway TLS secret ref includes a namespace
  * ReferenceGrant pretty much allows referencing from X (Gateway) to Y (Secret)
* Limits:
  * Has to be implemented via controllers
  * The controllers still have read-all - they just check if they are supposed to do this
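
For reference, the Gateway API's cross-namespace allowance looks roughly like this (namespaces and names here are illustrative):

```yaml
# Illustrative ReferenceGrant: allow Gateways in "infra" to reference
# Secrets in "cert-team". It lives in the *target* namespace.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-cert-refs
  namespace: cert-team
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: infra
  to:
    - group: ""          # core API group (Secrets)
      kind: Secret
```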

## Goals

### Global

* Grant controllers access to only the resources relevant for them (using references and maybe class segmentation)
* Allow for safe cross-namespace references
* Make it easy for API devs to adopt it

### Personas

* Alex: Define relationships via ReferencePatterns
* Kai: Specify controller identity (ServiceAccount), define the relationship API
* Rohan: Define cross-namespace references (aka resource grants that allow access to their resources)

## Result of the paper

### Architecture

* ReferencePattern: Where do I find the references -> example: GatewayClass in the gateway API
* ReferenceConsumer: Who (identity) has access under which conditions?
* ReferenceGrant: Allow specific references

### POC

* Minimum access: You only get access if the grant is there AND the reference actually exists
* Their basic implementation works with the kube API

### Open questions

## Alternative

* Idea: Just extend RBAC roles with a selector (match labels, etc.)
* Problems:
  * Requires changes to Kubernetes core auth
  * Everything but list and watch is a pain
  * How do you handle AND vs OR selection
  * Field selectors: They exist
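
To visualize the rejected idea, a purely hypothetical Role with a selector - note that this field does not exist in Kubernetes RBAC today:

```yaml
# Hypothetical only: an RBAC rule narrowed by a label selector. The
# "selector" field is NOT part of the real rbac.authorization.k8s.io API.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gateway-cert-reader
  namespace: cert-team
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
    selector:              # hypothetical extension from the talk
      matchLabels:
        usage: gateway-tls
```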

## Meanwhile

* Prefer tools that support isolation between the controller and the data plane
* Disable all non-needed features -> Especially scripting

- dx
---

A talk by UX and software people at Red Hat (Podman team).
The talk mainly followed the academic study process (aka this is the survey I did for my bachelor's/master's thesis).

## Research

* User research study including 11 devs and platform engineers over three months
* Focus was on a new Podman desktop feature
* Experience range: 2-3 years on average (low: no experience, high: old-school kube)
* 16 questions regarding environment, workflow, debugging and pain points
* Analysis: Affinity mapping

## Findings

* Where do I start when things are broken? -> There may be solutions, but devs don't know about them
* Network debugging is hard b/c of many layers, and problems occurring between CNI and infra are really hard -> Network topology issues are rare but hard
* YAML indentation -> Tool support is needed for visualization
* YAML validation -> Just use validation in dev and GitOps
* YAML cleanup -> Normalize YAML (order, anchors, etc.) for easy diffs
* Inadequate security analysis (too verbose, non-issues are warnings) -> Real-time insights (and during dev)
* Crash loops -> Identify stuck containers, simple debug containers
* CLI vs GUI -> Enable an experience-level-oriented GUI, enhance in-time troubleshooting

## General issues

* No direct fs access
* Multiple kubeconfigs
* SaaS is sometimes only provided on kube, which sounds like complexity
* Where do I begin my troubleshooting
* Interoperability/fragility with updates

- network
---

Global field CTO at Solo.io with a hint of service mesh background.

## History

* LinkerD 1.X was the first modern service mesh and basically an opt-in service proxy
* Challenges: JVM (size), latencies, ...

### Why not node-proxy?

### Why sidecar?

* Transparent (ish)
* Part of the app lifecycle (up/down)
* Single tenant
* No noisy neighbor

### Sidecar drawbacks

* Full transparency
* Optimized networking
* Lower resource allocation
* No race conditions
* No manual pod injection
* No credentials in the app

* Kube-proxy replacement
* Ingress (via Gateway API)
* Mutual authentication
* Specialized CiliumNetworkPolicy
* Configure Envoy through Cilium
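
As an illustration of the "specialized" part, a CiliumNetworkPolicy can mix L3/L4 selectors with L7 HTTP rules; the labels, port and path below are made up:

```yaml
# Illustrative CiliumNetworkPolicy: frontend may only GET /healthz on the
# backend's port 8080 - an L7 rule a vanilla NetworkPolicy cannot express.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-healthz
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/healthz"
```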

### Control Plane

* A Cilium agent on each node that reacts to scheduled workloads by programming the local data plane
* API via Gateway API and CiliumNetworkPolicy

```mermaid
flowchart TD
```

### Data plane

* Configured by the control plane
* Does all the eBPF things in L4
* Does all the Envoy things in L7
* In-kernel WireGuard for optional transparent encryption

### mTLS

* Network policies get applied at the eBPF layer (check if ID A can talk to ID B)
* When mTLS is enabled, there is an auth check in advance -> If it fails, proceed with the agents
* The agents talk to each other for mTLS auth and save the result to a cache -> Now eBPF can say yes
* Problems: The caches can lead to ID confusion

## Istio

### Basics

* L4/L7 service mesh without its own CNI
* Based on Envoy
* mTLS
* Classically via sidecar, nowadays ambient

### Ambient mode

* Separates L4 and L7 -> Can run on Cilium
* mTLS
* Gateway API

```mermaid
flowchart TD
```

* Central xDS control plane
* Per-node data plane that reads updates from the control plane

### Data Plane

* L4 runs via the zTunnel DaemonSet, which handles mTLS
* The zTunnel traffic gets handed over to the CNI
* The L7 proxy lives somewhere™ and traffic gets routed through it as an "extra hop" aka waypoint

### mTLS

* The zTunnel creates an HBONE (HTTP overlay network) tunnel with mTLS

Who have I talked to today, are there any follow-ups or learnings?

## Operator Framework

* We talked about the operator lifecycle manager
* They shared the roadmap: the new release 1.0 will bring support for operator bundle loading from any OCI source (no more public-registry enforcement)

## Flux

* We talked about automatic helm release updates [lessons learned from flux](/lessons_learned/02_flux)

## Cloud Foundry/Paketo

* We mostly had some small talk
* There will be a Cloud Foundry day in Karlsruhe in October; they'd be happy to have us there
* The whole KORFI (Cloud Foundry on Kubernetes) project is still going strong, but no release candidate yet (or in the near future)

## Traefik

They will follow up

## Postman

* I asked them about their new cloud-only stuff: They will keep their direction
* They are also planning to work on info materials on why Postman SaaS is not a big security risk

## Mattermost

{{% notice style="note" %}}
I should follow up
{{% /notice %}}

* I talked about our problems with the Mattermost operator and was asked to get back to them with the errors
* They're currently migrating the Mattermost cloud offering to ARM - therefore ARM support will be coming in the next months
* The Mattermost guy had exactly the same problems with notifications and read/unread using Element

## Vercel

* The paid Renovate offering now includes build failure estimation
* I was told not to buy it after telling the technical guy that we just use build pipelines as MR verification

### cert-manager

* The best swag (judged by coolness points)

{{% notice style="note" %}}
They will follow up with a quick demo
{{% /notice %}}

* A Kubernetes security/runtime security solution with pretty nice-looking urgency filters
* Includes eBPF to see what code actually runs
* I'll witness a demo in early/mid April

### Isovalent

* Dinner (very tasty)
* Cilium still sounds like the way to go in regard to CNIs

---

Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
This is where all the people joined (over 12,000).

The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.

A talk by Google and Ivanti.

## Background

* RBAC is there to limit information access and control
* RBAC can be used to avoid interference in shared envs
* DNS is not really applicable when it comes to RBAC

### DNS in Kubernetes

* Especially for smaller, high-growth companies with infinite VC money
* Just give everyone their own cluster -> Problem solved
* Smaller companies (<1000) typically use many small clusters

### Shared Clusters

* Becomes important when cost is a question and engineers don't have any platform knowledge
* A dedicated kube team can optimize both hardware and deliver updates fast -> Increased productivity by utilizing specialists
* Problem: Noisy neighbors through leaky DNS

### Leak mechanics

* Leaks are based on the `<service>.<namespace>.svc.cluster.local` pattern
* You can also just reverse-lookup the entire service CIDR
* SRV records get created for each service, including the service ports

## Fix the leak

### CoreDNS Firewall Plugin

* External plugin provided by the CoreDNS team
* Expression engine built in, with support for external policy engines

```mermaid
flowchart LR
```

### Demo

* Firewall rule that only allows queries from the same namespace, `kube-system` or `default`
* Every other cross-namespace request gets blocked
* The same SVC requests from before now return `NXDOMAIN`
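
A sketch of what such a Corefile rule might look like - the firewall plugin's exact expression syntax is an assumption here, not taken from the demo:

```yaml
# Approximated Corefile fragment: allow same-namespace, kube-system and
# default lookups; refuse everything else. Expression syntax is a guess.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    cluster.local {
        kubernetes {
            pods verified
        }
        firewall query {
            allow [kubernetes/client-namespace] == [kubernetes/namespace]
            allow [kubernetes/namespace] == 'kube-system'
            allow [kubernetes/namespace] == 'default'
            refuse true
        }
    }
```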

### Why is this a plugin and not default?

* Requires `pods verified` mode -> Puts a watch on pods and only returns a query result if the pod actually exists
* Puts a watch on all pods -> Higher API load and CoreDNS memory usage
* Potential race conditions with initial lookups in larger clusters -> The alternative is to fail open (not really secure)

### Per-tenant DNS

* Just run a CoreDNS instance for each tenant
* Use a mutating webhook to inject the right DNS into each pod
* Pro: No more `pods verified` -> aka no more constant watch
* Limitation: Platform services still need a central CoreDNS
|
||||
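The webhook injection boils down to patching the pod's DNS config; a minimal sketch of what the webhook might set (the CoreDNS service IP and tenant namespace are placeholders):

```yaml
# patched into tenant pods by the mutating webhook (sketch)
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.96.0.53        # ClusterIP of this tenant's CoreDNS instance
    searches:
      - tenant-a.svc.cluster.local
      - svc.cluster.local
```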
tags:
- dx
---
Mitch from Aviatrix - a former software engineer who has now switched over to product management.

## Opening Thesis

Opening with the Atari 2600 E.T. game as a very-bad-fit example.
Thesis: Missing user empathy

* A very hard game aimed at children without the will to trial and error
* Other aspect: Some developers were pulled together from throughout the company -> No passion needed
### Another sample

* Idea: SCADA system with sensors that can be moved, and the current location gets tracked via iPad.
* Result: Nobody used the iPad app - only the desktop web app
* Problem: Sensor gets moved, location not updated, the measurements for the wrong location get reported until update
* Source: Moving a sensor is a pretty involved process including high pressure aka no priority for the iPad
* Empathy loss: Different working environments result in drastic work experience mismatch
## The source

* Idea: A software engineer writes software that someone else has to use, not themselves
* Problem: Distance between user and dev is high and their perspectives differ heavily
## User empathy
## Stories from Istio

* Classic implementation: Sidecar proxy
* Question: Can the same value be provided without a sidecar anywhere
* Answer: Ambient mode -> split into L4 (proxy per node) and L7 (no sharing)
* Problem: After the alpha release there was a lack of excitement and feedback
* Result: Twitter Space event for feedback
### Ideas and feedback

* Idea: Sidecar is somewhat magical
* Feedback: Sidecars are a pain, but after integrating Istio they can be automated -> a problem gets solved that already had a solution
* Result: Highly overvalued the pain of sidecars
* Idea: Building Istio into a platform sounds easy
* Feedback: The platform has to be changed for the new ambient mode -> High time investment while engineers are hard to come by
* Result: The cost of platform changes was highly undervalued
* Idea: Sidecar compute sounds expensive and networking itself pretty cheap
* Feedback: Many users have multi-region clusters -> Egress is very expensive
* Result: The relation between compute and egress cost was pretty much swapped
### What now?

* Ambient is the right solution for new users (fresh service meshes)
* Existing users probably won't upgrade
* Result: They will move forward with ambient mode
## So what did we learn

### Basic questions

* Who are my intended users?
* What excites/worries them?
* What do they find easy/hard?
* What is their biggest expense and what is inexpensive?
### How to get better empathy

1. Shared perspective comes from proximity
    1. Where they are
    2. What they do -> Dogfood everything related to the platform (not just your own products)
2. Never stop listening
    1. Even if you love your product
    2. Especially if you love your product
### Takeaways

* Don't ship a puzzle box (landscape) but a picture (this integrates with this and that)
tags:
- business
---
Bob, a Program Manager at Google and Kubernetes steering committee member with a bunch of contributor and maintainer experience.
The value should be rated even higher than the pure business value.
## Baseline

* A large chunk of CNCF contributors and maintainers (95%) are company affiliated
* Most (50%) of the people contributed in professional and personal time (and 30% only on work time)
* Explaining business value can be very complex
* Base question: What does this contribute to the business?
## Data enablement

* Problem: Insufficient data (data collection is often an afterthought)
* Example used: Random CNCF selection
  * 50% of issues are labeled consistently
  * 17% of projects label PRs
  * 58% of projects use milestones
* Labels provide: Context, prioritization, scope, state
* Milestones enable: Filtering outside a date range
* Sample queries:
  * How many features have been in milestone XY?
  * How many bugs have been fixed in this version?
* Thought of as overhead
* Project is too small
* Tools:
  * Actions/pipelines for auto-label, copy label, sync labels
  * Prow: The label system for Kubernetes projects
* People with high project but low code knowledge can triage -> Make them feel recognized
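As a concrete illustration of the auto-label tooling mentioned above, a minimal `.github/labeler.yml` for the `actions/labeler` action (v5 syntax) might look like this; the label names and globs are made-up examples:

```yaml
# .github/labeler.yml - labels applied based on changed files (paths are illustrative)
area/docs:
  - changed-files:
      - any-glob-to-any-file: "docs/**"
area/ci:
  - changed-files:
      - any-glob-to-any-file: ".github/workflows/**"
```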
### Conclusions

* Consistent labels & milestones are critical for state analysis
* Data is the evidence needed in messaging for leadership
* Recruiting triage-specific people and using automations streamlines the process
## Communication

### Personas

* OSS enthusiast: Knows the ecosystem and project with a knack for discussions and deep dives
* Maintainer: An enthusiast that is tired, under pressure and most of the time a one-man show that would prefer doing technical stuff
* CXO: Focus on resources, health, ROI
* Product manager: Get the best project, user-friendly
* Leads: Employees should meet KPIs, with slightly better tech understanding
* End user: How can tools/features help me
### Growth limits

* Main questions:
  * What is this project/feature?
  * Where is the roadmap?
  * What parts of the project are at risk?
* Problem: Wording
### Ways of surfacing information

* Regular project reports/blog posts
* Roadmap on website
* What are we getting out? (How fast are bugs getting fixed)
* What is the criticality of the project?
* How much time is spent on maintenance?
## Conclusion

* There is significant unrealized value in open source
A talk about the Backstage documentation audit and what makes good documentation.

## Opening

* 2012: the year of the Maya calendar and the mainstream success of memes
* The classic meme RTFM -> Classic manuals were pretty long
* 2024: Manuals have become documentation (hopefully with better contents)
### What is documentation

* Docs (the raw descriptions, quick-start and how-to)
* Website (the first impression - what does this do, why would I need it)
* README (the GitHub way of website + docs)
* CONTRIBUTING (Is this a one-man show?)
* Issues
* Meta docs (how do we orchestrate things)
* Who needs this documentation?
  * New users -> Optimize for minimum context
  * Experienced users
  * User roles (Admins, end users, ...) -> Separate into different pages (get started based on your role)
* What do we need to enable with this documentation?
  * Prove value fast -> Why this project?
  * Educate on fundamental aspects
  * Showcase features/use cases
  * Hands-on enablement -> Tutorials, guides, step-by-step
* Documented scheduled contributor meetings
* Getting started guides for new contributors
* Project governance
  * Who is going to own it?
  * What will happen to my PR?
  * Who maintains features?
### Website

* Single source for all pages (one repo that includes landing, docs, API and so on) -> Easier to contribute
* Usability (especially on mobile)
* Social proof and case studies -> Develop trust
* SEO (actually get found) and analytics (detect how documentation is used and where people leave)
* Plan website maintenance
### What is great documentation

* Project docs help users according to their needs -> Low question-to-answer latency
* Contributor docs enable contributions in a predictable manner -> Don't leave "when will this be reviewed/merged" questions open
* The website proves why anyone should invest time in this project
* All documentation is connected and up to date
## General best practices
## Examples

* OpenTelemetry: Split by role (dev, ops)
* Prometheus:
  * New user content in intro (concept) and getting started (practice)
  * Hierarchy includes concepts, key features and guides/tutorials
## Q&A

* Every last Wednesday of the month there is a CNCF technical writers meeting (CNCF Slack -> `#techdocs`)
A talk by Broadcom and Bloomberg (both related to buildpacks.io).
And a very full talk at that.

## Baseline

* CN Buildpacks provides the spec for buildpacks with a couple of different implementations
* Pack CLI with builder (collection of buildpacks - for example Paketo or Heroku)
* Output images follow OCI -> Just run them on Docker/Podman/Kubernetes
* Built images are `production application images` (small attack surface, SBOM, non-root, reproducible)

## Scaling
* Goal: Just a simple docker pull that auto-detects the right architecture
* Needed: Pack, lifecycle, buildpacks, build images, builders, registry
* Current state: There is an RFC to handle image index creation with changes to buildpack creation
  * New folder structure for binaries
  * Update config files to include targets
* The user impact is minimal, because the builder abstracts everything away
* kpack is slsa.dev v3 compliant (party hard)
* 5 years of production
* Scaling up to Tanzu/Heroku/GCP levels
* Multiarch is being worked on
title: Day 3
weight: 3
---

Spent most of the early day with a headache, therefore talk notes only start at noon.
## Problems

* Dockerfiles are hard and not 100% reproducible
* Buildpacks are reproducible but result in large single-arch images
* Nix has multiple ways of doing things

## Solutions

* Dagger as a CI solution
* Multistage docker images with distroless -> Small image, small attack surface
* Language-specific solutions (`ko`, `jib`)
A talk by Isovalent with a full room (one of the large ones).

## Baseline

* eBPF lets you run custom code in the kernel -> close to hardware
* Typical use cases: Networking, observability, tracing/profiling, security
* Question: Is eBPF Turing complete and can it be used for more complex scenarios (TLS, L7)?

## eBPF verifier
* Principles
  * Read memory only with correct permissions
  * All writes to valid and safe memory
  * Valid in-bounds and well-formed control flow
  * Execution on-CPU time is bounded: sleep, scheduled callbacks, iterations, program actually completes
  * Acquire/release and reference count semantics

## Demo: Game of life
* Instruction limit to let the verifier actually verify the program in reasonable time
* Limit is based on: instruction limit and verifier step limit
* Nowadays the limit is 4096 unprivileged instructions and 1 million privileged instructions
* Only jump forward -> No loops
  * Is a basic limitation to ensure no infinite loops can ruin the day
  * Limitation: Only finite iterations can be performed
* Solution: subprograms (aka functions), and the limit applies per function -> `x*subprograms = x*limit`
  * Limit: Needs real skill
* Programs have to terminate
  * Well, eBPF really only wants to release the CPU; the program doesn't have to end per se
  * Iterator: walk arbitrary lists of objects
  * Sleep on page fault or other memory operations
  * Timer callbacks (including timer 0 for "run me asap")
* Memory allocation
  * Maps are used as the memory management system
## Result

* You can execute arbitrary tasks via eBPF
* It can be used for HTTP or TLS - it's just not implemented yet™
tags:
- scaling
---

By the nice operator framework guys at IBM and Red Hat.
I'll skip the baseline introduction of what an operator is.

## Operator SDK

> Build the operator

* Kubebuilder with v4 plugins -> Supports the latest Kubernetes
* Java Operator SDK is now a part of Operator SDK, and they released 5.0.0
  * Now with server-side apply in the background
  * Better status updates and finalizer handling
  * Dependent resource handling (alongside optional dependent resources)
||||
## Operator Liefecycle Manager
|
||||
## Operator Lifecycle Manager
|
||||
|
||||
> Manage the operator -> A operator for installing operators
|
||||
|
||||
|
* New API set -> The old CRDs were overwhelming
* More GitOps-friendly with per-tenant support
* Prescribes update paths (maybe upgrade)
* Support for operator bundles as k8s manifests/Helm chart

### OLM v1 Components

* Cluster Extension (user-facing API)
  * Defines the app you want to install
  * Resolves requirements through CatalogD/depply
* CatalogD (catalog server/operator)
* Depply (dependency/constraint solver)
* Applier (Rukpak/kapp compatible)
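As a rough sketch of the user-facing Cluster Extension API described above (the apiVersion and field names follow the OLM v1 docs at the time of writing, but the API is still moving - treat every detail here as an assumption):

```yaml
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  name: argocd-operator
spec:
  packageName: argocd-operator   # resolved against the catalogs served by CatalogD
  version: "0.6.0"               # version constraint handed to the dependency solver
  installNamespace: argocd
```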
tags:
- security
---

A talk by the cert-manager maintainers that also staffed the cert-manager booth.
Humor is present, but the main focus is still the technical integration.

## Baseline

* cert-manager is the best™ way of getting certificates
* Poster features: Auto-renewal, ACME, PKI, HC Vault
* Numbers: 20M downloads, 427 contributors, 11.3k GitHub stars
* Currently on the graduation path

## History

* 2016: Jetstack created kube-lego -> An operator that generated LE certificates for ingress based on annotations
* 2017: cert-manager launch -> Cert resources and issuer resources
* 2020: v1.0.0 and joined CNCF sandbox
* 2022: CNCF incubating
* 2024: Passed the CNCF security audit and on the way to graduation
### How it came to be

* The idea: Mix the digital certificate with the classical seal
* Started as the stamping idea to celebrate v1 and send contributors a thank-you with candles
* Problem: Candles are not allowed -> Therefore glue gun

### How it works

* Components
  * Raspberry Pi with k3s
  * Printer
  * cert-manager
  * A Go-based web UI
* QR code: Contains link to certificate with private key
```mermaid
flowchart LR
```
### What is new this year

* Idea: Certs should be usable for TLS
* Solution: The QR code links to a zip download with the cert and private key
* New: ECDSA for everything
* New: A stable root CA with intermediate for every conference
* New: Guestbook that can only be signed with a booth-issued certificate -> Available via script
## Learnings

* This demo is just a private CA with cert-manager -> Can be applied to any PKI use case
* The certificate can be created via the CR, CSI driver (create secret and mount in container), ingress annotations, ...
* You can use multiple different issuers (CA issuer aka PKI, Let's Encrypt, Vault, AWS, ...)
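For reference, a minimal cert-manager `Certificate` CR of the kind mentioned above might look like this (all names and the issuer are placeholders, not the booth's actual setup):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demo-cert
spec:
  secretName: demo-cert-tls        # cert-manager writes the key pair into this Secret
  commonName: attendee.example.com
  privateKey:
    algorithm: ECDSA               # matches the "ECDSA for everything" note
  issuerRef:
    name: booth-ca                 # a CA issuer backed by the conference root/intermediate
    kind: Issuer
```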
## Conclusion

* This is not just a demo -> Just apply it for machines
* They have regular meetings (daily stand-ups and bi-weekly)
tags:
- scaling
---

A talk by TikTok/ByteDance (duh) focused on using central controllers instead of on the edge.

## Background

> Global means non-China

* Edge platform team for CDN, livestreaming, uploads, real-time communication, etc.
* Around 250 clusters with 10-600 nodes each - mostly non-cloud aka bare metal
* Architecture: Control plane clusters (platform services) - data plane clusters (workload by other teams)
* Platform includes logs, metrics, configs, secrets, ...
* Operators are essential for platform features
* As the feature requests increase, more operators are needed
* The deployment of operators throughout many clusters is complex (namespaces, deployments, policies, ...)
### Edge

* Limited resources
* Cost implications of platform features
* Real-time processing demands by platform features
* Balancing act between resources used by workload vs platform features (20-25%)
### The classic flow

1. New feature gets requested
2. Use Kubebuilder with the SDK to create the operator
3. Create namespaces and configs in all clusters
4. Deploy operator to all clusters
## Possible Solution

### Centralized Control Plane

* Problem: The controller implementation is limited to a cluster boundary
* Idea: Why not create a single operator that can manage multiple edge clusters
* Implementation: Just modify Kubebuilder to accept multiple clients (and caches)
* Result: It works -> Simpler deployment and troubleshooting
* Concerns: High code complexity -> Long familiarization
### Attempt it a bit more like Kubebuilder

* Each cluster has its own manager
* There is a central multimanager that starts all of the cluster-specific managers
* Controller registration to the manager now handles cluster names
* The reconciler knows which cluster it is working on
* The multi-cluster management basically just gets all of the cluster secrets and creates a manager+controller for each cluster secret
* Challenges: Network connectivity
* Solutions:
  * Dynamic add/remove of clusters with Go channels to prevent pod restarts
  * Connectivity health checks -> On loss, a manager recreate gets triggered
```mermaid
flowchart TD
```
## Conclusion

* Acknowledge resource constraints on the edge
* Embrace open source adoption instead of build-your-own
* Simplify deployment
* Recognize your own opinionated approach and its use cases
Notes may be a bit unstructured due to a tired note taker.

## Basics

* FluentBit is compatible with
  * Prometheus (it can replace the Prometheus scraper and node exporter)
  * OpenMetrics
  * OpenTelemetry (HTTPS input/output)
* FluentBit can export to Prometheus, Splunk, InfluxDB or others
* So pretty much it can be used to collect data from a bunch of sources and pipe it out to different backend destinations
* Fluent ecosystem: No vendor lock-in to observability
### Architectures

* The Fluent agent collects data and can send it to one or multiple locations
* FluentBit can be used for aggregation from other sources
### In the Kubernetes logging ecosystem

* Pod logs to console -> Streamed stdout/err gets piped to file
* The logs in the file get encoded as JSON with metadata (date, channel)
* Labels and annotations only live in the control plane -> You have to collect them additionally -> Expensive
### Solution

* Solution: Processor - a separate thread segmented by telemetry type
* Plugins can be written in your favorite language (C, Rust, Go, ...)
```mermaid
flowchart LR
```
### General new features in v3

* Native HTTP/2 support in core
* Content modifier with multiple operations (insert, upsert, delete, rename, hash, extract, convert)
* Metrics selector (include or exclude metrics) with matcher (name, prefix, substring, regex)
* SQL processor -> Use SQL expressions for selections (instead of filters)
* Better OpenTelemetry output
|
|
|
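As a rough sketch, the content modifier and metrics selector are attached as processors in the YAML config (processor names as in the v3 docs; the pipeline, keys, and metric name are invented for illustration):

```yaml
# Illustrative v3 pipeline: processors run per input, segmented by telemetry type.
pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      processors:
        logs:
          - name: content_modifier   # upsert = insert or overwrite
            action: upsert
            key: environment
            value: production
        metrics:
          - name: metrics_selector   # keep only matching metrics
            metric_name: /storage/
            action: include
  outputs:
    - name: stdout
      match: "*"
```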
Who have I talked to today, are there any follow-ups or learnings?

They will follow up with a quick demo
{{% /notice %}}
* An interesting Tekton-based CI/CD solution that also integrates with other platforms
* May be interesting for either ODIT.Services or some of our customers
## Docker

* Talked to one salesperson just about the general conference
* Talked to one technical guy about Docker build time optimization
## Rancher/SUSE

* I just got some swag, a friend of mine got a demo focusing on runtime security
---
title: Operators
---

## Observability
* Export reconcile loop steps as OpenTelemetry traces

## Work queue
---
title: Flux
weight: 2
---
Some lessons learned from Flux talks and from talking to the Flux team.
## Helm Auto-update
* Currently, you can just use the normal image auto-update mechanism
* Requirement: The Helm chart is stored as an OCI-Artifact
* How: Just create the usual CRs and annotations
* They are also working on generalizing the auto-update process to fit all OCI artifacts (coming soon)
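Presumably the "usual CRs" are the standard Flux image-automation objects pointed at the chart's OCI repository; a hedged sketch (the names, registry path, and version range are made up):

```yaml
# Hypothetical example of the standard Flux image-automation CRs,
# here pointed at a Helm chart pushed as an OCI artifact.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-chart
spec:
  image: ghcr.io/example/charts/my-chart
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-chart
spec:
  imageRepositoryRef:
    name: my-chart
  policy:
    semver:
      range: ">=1.0.0"
```

The annotation part would then be the usual `# {"$imagepolicy": "flux-system:my-chart:tag"}` marker next to the chart version in the manifest, so the automation knows which field to bump.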
---
title: Check this out
---
Just a loose list of stuff that sounded interesting
* Dapr
* etcd backups
---
title: Lessons Learned
weight: 99
---
Interesting lessons learned + tips/tricks.