day 2 keynotes

This commit is contained in:
Nicolai Ort 2024-03-20 10:42:54 +01:00
parent a03f0c0da2
commit 3eb8eacd1d
---
title: Opening Keynote
weight: 1
---
The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
The keynote itself was presented by the CEO of the CNCF.
## The numbers
* Over 2000 attendees
* 10 Years of Kubernetes
* 60% of large organizations expect rapid cost increases due to AI/ML (FinOps Survey)
## The highlights
* Everyone uses cloud native
* AI uses Kubernetes because the UX is way better than classic tools
  * Especially when moving from dev to prod
* We need standardization
* Open source is cool
## Live demo
* KIND cluster on the desktop
* Prototype stack (develop on the client)
  * Kubernetes with the LLM
  * Host with LLaVA (the image-description model), Moondream and Ollama (the model manager/registry)
* Prod stack (everything in Kubernetes)
  * Kubernetes with the LLM, LLaVA, Ollama and Moondream
* Available models: llava, mistral, bakllava (llava x mistral)
* The host takes a picture, the AI describes what is pictured (in our case: the conference audience)
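The demo's manifests were not published in these notes, but a minimal Ollama deployment for the "prod stack" might look roughly like this (the image tag, port and labels are assumptions, not the demo's actual config):

```yaml
# Hypothetical sketch: Ollama serving models inside Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama  # public Ollama image; pin a tag in practice
        ports:
        - containerPort: 11434  # Ollama's default API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
```

Models like llava or bakllava could then be pulled through the Ollama API and queried from the host app.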

---
title: AI Keynote discussion
weight: 2
---
A podium discussion (somewhat scripted) led by Priyanka.
## Guests
* Tim from Mistral
* Paige from Google AI
* Jeff, founder of Ollama
## Discussion
* What do you use as the base of development for Ollama?
  * Jeff: The concepts from Docker, Git and Kubernetes
* How is the balance between AI engineering and AI ops?
  * Jeff: The classic dev vs. ops divide - many ML engineers don't think about ops
  * Paige: Yessir
* How does infra keep up with the fast-moving research?
  * Paige: Well, they don't - but they do their best, and cloud native is cool
  * Jeff: Well, we're not Google, but Kubernetes is the saviour
* What are the scaling constraints?
  * Jeff: Sizing of models is still in its infancy
  * Jeff: There will be more specialized hardware, and someone will have to support it
  * Paige: Sizing also depends on latency needs (code autocompletion vs. performance optimization)
  * Paige: Optimization of smaller models
* Which technologies need to be open-source licensed?
  * Jeff: The model, because of access and trust
  * Tim: The models and the base execution environment -> vendor agnosticism
  * Paige: Yes, and remixes are really important for development
* Anything else?
  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
  * Paige: Currently many people just use paid APIs to abstract away the infra, but we need this stuff to be self-hostable
  * Tim: I don't want to know about the hardware - the whole infra side should be handled by the cloud native teams so ML engineers can just be ML engineers

---
title: Accelerating AI workloads with GPUs in kubernetes
weight: 3
---
Kevin and Sanjay from NVIDIA
## Enabling GPUs in Kubernetes today
* Host level components: Toolkit, drivers
* Kubernetes components: Device plugin, feature discovery, node selector
* NVIDIA humbly brings you a GPU operator
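With the device plugin (or the GPU operator) in place, a pod requests a GPU via the extended resource `nvidia.com/gpu`. A minimal smoke-test sketch (the image and tag are assumptions):

```yaml
# Hypothetical GPU smoke test: requests one GPU and prints nvidia-smi output.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1  # extended resource advertised by the NVIDIA device plugin
```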
## GPU sharing
* Time slicing: switch between workloads over time
* Multi-Process Service (MPS): always run on the GPU, but share it (space-wise)
* Multi-Instance GPU (MIG): space-separated sharing in hardware
* Virtual GPU (vGPU): virtualizes time slicing or MIG
* CUDA streams: run multiple kernels in a single app
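As a sketch of what time slicing looks like in practice, the NVIDIA device plugin accepts a sharing config roughly like the following (the ConfigMap name and replica count are assumptions; the exact wiring into the GPU operator is documented upstream):

```yaml
# Hypothetical time-slicing config: each physical GPU is advertised
# as 4 schedulable nvidia.com/gpu resources.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```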
## Dynamic resource allocation
* A new alpha feature since Kubernetes 1.26 for dynamic resource requesting
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
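The request flow can be sketched with the alpha DRA API (heavily hedged: the API group/version and field shapes have changed between releases, and the resource class name here is hypothetical, provided by a vendor driver):

```yaml
# Hypothetical DRA sketch: claim a resource, then reference the claim from a pod.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  resourceClassName: gpu.example.com  # installed by a vendor DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: single-gpu
  containers:
  - name: app
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: gpu  # references the claim instead of a counted resource
```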
## GPU scale out challenges
* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is the training workload split into batches
* Challenge: Schedule multiple training jobs by different users that are prioritized
### Topology-aware placement
* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize related jobs based on GPU node distance and NUMA placement
### Fault tolerance and resiliency
* Stuff can break, resulting in slowdowns or errors
* Challenge: Detect faults and handle them
* Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
* Needed: Automated fault-tolerant scheduling
### Multi-dimensional optimization
* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: what to choose (a multi-dimensional decision problem)
* Needed: A scheduler that can balance the dimensions

---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---
Jorge Palma from Microsoft with a quick introduction.
## Baseline
* Kubernetes is cool and all
* Challenges:
* Containerized models
* GPUs in the cluster (install, management)
## Kubernetes AI Toolchain (KAITO)
* Kubernetes operator that interacts with
  * the node provisioner
  * the deployment
* A simple CRD that describes the model and infra - then have fun
* Creates an inference endpoint
* Currently about 10 models are available (Hugging Face, LLaMA, etc.)
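A KAITO workspace roughly follows this shape (a sketch from memory, not the authoritative CRD; the field names, preset and Azure instance type are assumptions to check against the KAITO docs):

```yaml
# Hypothetical KAITO workspace: provisions GPU nodes and serves a preset model.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: Standard_NC12s_v3  # GPU VM size to provision (Azure-specific)
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: falcon-7b  # one of the ~10 preset models
```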

---
title: Optimizing performance and sustainability for ai
weight: 5
---
A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted with prepared (sponsor specific) slides for each question answered.
## Takeaways
* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> better resource sharing and scheduling
* Smaller LLMs on CPU only are pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware, and models optimized to run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads

---
title: Cloudnative news show (AI edition)
weight: 6
---
Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.
### The "news"
* A new Working Group AI
* More tools are including AI features
* An updated "CNCF for Children", now featuring AI
* One decade of Kubernetes
* DRA is in alpha
### DRA
* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific details are abstracted away by a vendor driver controller
* The kube-scheduler can interact with the vendor parameters for scheduling and autoscaling
### Cloudnative AI ecosystem
* Kubernetes is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLMs on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis and remain topics of discussion
### The working group AI
* It was formed in October 2023
* They worked on the whitepaper (cloud native and AI), which was published on 19.03.2024
* The "cloud native and AI" landscape is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)
### LFAI and CNCF
* The director of the AI foundation talked about AI and cloud native
* They are looking forward to more collaboration

---
archetype: chapter
title: Day 2
---
Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
This is where all of the people joined (over 2000 attendees).
The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.