day 2 keynotes
content/day2/01_opening.md
---
title: Opening Keynote
weight: 1
---

The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
The keynote itself was presented by the CEO of the CNCF.

## The numbers

* Over 2000 attendees
* 10 years of Kubernetes
* 60% of large organizations expect rapid cost increases due to AI/ML (FinOps survey)

## The highlights

* Everyone uses cloud native
* AI uses Kubernetes because the UX is way better than with classic tools
  * Especially when transferring from dev to prod
* We need standardization
* Open source is cool

## Live demo

* KIND cluster on a desktop
* Prototype stack (develop on the client)
  * Kubernetes with the LLM
  * Host with LLaVA (an image-description model), moondream and Ollama (the model manager/registry)
* Prod stack (everything in Kubernetes)
  * Kubernetes with the LLM, LLaVA, Ollama and moondream
  * Available models: llava, mistral, bakllava (llava + mistral)
* The host takes a picture, the AI describes what is pictured (in our case the conference audience)
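A minimal sketch of what the Ollama piece of such a prod stack could look like in Kubernetes. The `ollama/ollama` image and port 11434 are Ollama's public defaults; all names and labels are invented for illustration, not taken from the demo:

```yaml
# Hypothetical sketch: running Ollama inside the cluster.
# Image and port are Ollama's public defaults; names are made up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434   # Ollama's default API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```

Models such as llava or moondream would then be pulled through the Ollama API exposed by the service.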
content/day2/02_ai_keynote.md
---
title: AI keynote discussion
weight: 2
---

A (somewhat scripted) podium discussion led by Priyanka.

## Guests

* Tim from Mistral
* Paige from Google AI
* Jeff, founder of Ollama

## Discussion

* What do you use as the base of development for Ollama?
  * Jeff: The concepts from Docker, Git and Kubernetes
* How is the balance between AI engineering and AI ops?
  * Jeff: The classic dev vs ops divide - many ML engineers don't think about ops
  * Paige: Yessir
* How does infra keep up with the fast research?
  * Paige: Well, they don't - but they do their best, and cloud native is cool
  * Jeff: Well, we're not Google, but Kubernetes is the saviour
* What are the scaling constraints?
  * Jeff: Currently the sizing of models is still in its infancy
  * Jeff: There will be more specialized hardware, and someone will have to support it
  * Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
  * Paige: Optimization of smaller models
* What technologies need to be open source licensed?
  * Jeff: The model, because of access and trust
  * Tim: The models and the base execution environment -> vendor agnosticism
  * Paige: Yes, and remixes are really important for development
* Anything else?
  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
  * Paige: Currently many people just use paid APIs to abstract the infra away, but we need this stuff to be self-hostable
  * Tim: I don't want to know about the hardware - the whole infra side should be done by the cloud native teams, to let ML engineers just be ML engineers
content/day2/03_accelerating_ai_workloads.md
---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
---

Kevin and Sanjay from NVIDIA.

## Enabling GPUs in Kubernetes today

* Host-level components: toolkit, drivers
* Kubernetes components: device plugin, feature discovery, node selector
* NVIDIA humbly brings you a GPU operator
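With the device plugin in place, a pod requests a GPU like any other resource. A minimal sketch (`nvidia.com/gpu` is the plugin's standard resource name; pod name and image tag are examples):

```yaml
# Illustrative pod requesting one whole GPU via the NVIDIA device
# plugin's extended resource; name and image tag are examples.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04   # example tag
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduled onto a node with a free GPU
```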

## GPU sharing

* Time slicing: switch between workloads over time
* Multi-Process Service (MPS): everything runs on the GPU concurrently, sharing it spatially
* Multi-Instance GPU (MIG): space-separated sharing in the hardware
* Virtual GPU: virtualizes time slicing or MIG
* CUDA streams: run multiple kernels within a single app
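Time slicing, for example, is enabled through the device plugin's configuration. A sketch of such a config file - the `sharing.timeSlicing` shape follows the NVIDIA device plugin's documented format, the replica count is an arbitrary example:

```yaml
# Sketch of a device plugin config enabling time slicing: one physical
# GPU is advertised as 4 schedulable nvidia.com/gpu replicas.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```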

## Dynamic resource allocation

* A new alpha feature since Kubernetes 1.26 for dynamic resource requesting
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
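A sketch of such an API request. The `resource.k8s.io/v1alpha2` group and the `ResourceClaim` kind are from the alpha DRA API (and may change between releases); the class name is invented for illustration:

```yaml
# Hypothetical DRA sketch: a claim against a vendor-provided
# resource class; "gpu.example.com" is an invented class name.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  resourceClassName: gpu.example.com
```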

## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
* The workload is the training workload, split into batches
* Challenge: schedule multiple training jobs by different users that are prioritized

### Topology-aware placement

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize related jobs based on GPU node distance and NUMA placement

### Fault tolerance and resiliency

* Stuff can break, resulting in slowdowns or errors
* Challenge: detect faults and handle them
* Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling

### Multi-dimensional optimization

* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: what to choose (the multi-dimensional decision problem)
* Needed: a scheduler that can balance the dimensions
content/day2/04_sponsored_ai_platform.md
---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---

Jorge Palma from Microsoft with a quick introduction.

## Baseline

* Kubernetes is cool and all
* Challenges:
  * Containerized models
  * GPUs in the cluster (installation, management)

## Kubernetes AI Toolchain Operator (KAITO)

* A Kubernetes operator that interacts with
  * a node provisioner
  * a deployment
* A simple CRD that describes a model and infra - and have fun
* Creates an inference endpoint
* Currently around 10 models are available (Huggingface, LLaMA, etc.)
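A sketch of what such a CRD could look like. The `Workspace` kind and the preset idea follow KAITO's public examples, but the exact API version, fields, instance type and model name here are assumptions, not from the talk:

```yaml
# Hypothetical KAITO workspace: declare infra + model, the operator
# provisions GPU nodes and exposes an inference endpoint.
# Instance type and preset name are illustrative examples.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: Standard_NC12s_v3   # example GPU VM size
inference:
  preset:
    name: falcon-7b                 # one of the bundled model presets
```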
content/day2/05_performance_sustainability.md
---
title: Optimizing performance and sustainability for AI
weight: 5
---

A panel discussion, moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted, with prepared (sponsor-specific) slides for each question answered.

## Takeaways

* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> better resource sharing and scheduling
* Running smaller LLMs on CPUs only is pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models that run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
content/day2/06_newsshow_ai_edition.md
---
title: Cloud native news show (AI edition)
weight: 6
---

Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.

### The "news"

* A new working group AI
* More tools are including AI features
* An updated "CNCF for children", now featuring AI
* One decade of Kubernetes
* DRA is in alpha

### DRA

* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific details are abstracted away by a vendor driver/controller
* The kube-scheduler can interact with the vendor parameters for scheduling and autoscaling
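On the pod side, sharing a claimed resource looks roughly like the sketch below. The `resourceClaims` pod field and `resources.claims` container field are from the alpha DRA API (subject to change); the claim name is invented:

```yaml
# Hypothetical pod consuming a DRA resource claim instead of a
# device-plugin extended resource; "single-gpu" is an invented name.
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimName: single-gpu
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      resources:
        claims:
          - name: gpu   # references the claim declared above
```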

### Cloud native AI ecosystem

* Kube is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLMs on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis
* Topics of discussion

### The working group AI

* It was formed in October 2023
* They are working on the whitepaper (cloud native and AI), which was published on 19.03.2024
* The "cloud native and AI" landscape is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)

### LFAI and CNCF

* The director of the LF AI foundation talks about AI and cloud native
* They are looking forward to more collaboration
content/day2/_index.md
---
archetype: chapter
title: Day 2
---

Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
This is where all of the people joined (over 2000 attendees).

The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.