day 2 keynotes
Parent: a03f0c0da2
Commit: 3eb8eacd1d
32
content/day2/01_opening.md
Normal file
@@ -0,0 +1,32 @@
---
title: Opening Keynote
weight: 1
---

The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
The keynote itself was presented by the CEO of the CNCF.

## The numbers

* Over 2000 attendees
* 10 years of Kubernetes
* 60% of large organizations expect rapid cost increases due to AI/ML (FinOps survey)

## The highlights

* Everyone uses cloud native
* AI uses Kubernetes because the UX is way better than with classic tools
  * Especially when transferring from dev to prod
* We need standardization
* Open source is cool

## Live demo

* KIND cluster on the desktop
* Prototype stack (develop on the client)
  * Kubernetes with the LLM
  * Host with LLaVA (an image-description model), moondream and Ollama (the model manager/registry)
* Prod stack (everything in Kubernetes)
  * Kubernetes with the LLM, LLaVA, Ollama and moondream
* Available models: llava, mistral, bakllava (llava x mistral)
* The host takes a picture, the AI describes what is pictured (in our case the conference audience)
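The "prod stack" part of the demo could be sketched as a plain Deployment plus Service for Ollama. This is a hypothetical reconstruction, not the presenters' manifests - the namespace, labels and image tag are assumptions:

```yaml
# Sketch: Ollama serving in-cluster, as in the demo's prod stack.
# Names, namespace and image tag are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434   # default Ollama API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: demo
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
```

A client (the host taking the picture) would then POST to the Ollama API with the image attached and a model such as `llava` selected.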
36
content/day2/02_ai_keynote.md
Normal file
@@ -0,0 +1,36 @@
---
title: AI Keynote discussion
weight: 2
---

A podium discussion (somewhat scripted) led by Priyanka.

## Guests

* Tim from Mistral
* Paige from Google AI
* Jeff, founder of Ollama

## Discussion

* What do you use as the basis of development for Ollama?
  * Jeff: The concepts from Docker, Git and Kubernetes
* How is the balance between AI engineering and AI ops?
  * Jeff: The classic dev vs. ops divide - many ML engineers don't think about ops
  * Paige: Yessir
* How does infra keep up with the fast research?
  * Paige: Well, they don't - but they do their best, and cloud native is cool
  * Jeff: Well, we're not Google, but Kubernetes is the saviour
* What are the scaling constraints?
  * Jeff: Sizing of models is still in its infancy
  * Jeff: There will be more specialized hardware, and someone will have to support it
  * Paige: Sizing also depends on latency needs (code autocompletion vs. performance optimization)
  * Paige: Optimization of smaller models
* What technologies need to be open source licensed?
  * Jeff: The model, because of access and trust
  * Tim: The models and the base execution environment -> vendor agnosticism
  * Paige: Yes, and remixes are really important for development
* Anything else?
  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
  * Paige: Currently many people just use paid APIs to abstract away the infra, but we need this stuff to be self-hostable
  * Tim: I don't want to know about the hardware - the whole infra side should be handled by the cloud native teams so ML engineers can just be ML engineers
50
content/day2/03_accelerating_ai_workloads.md
Normal file
@@ -0,0 +1,50 @@
---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
---

Kevin and Sanjay from NVIDIA.

## Enabling GPUs in Kubernetes today

* Host-level components: toolkit, drivers
* Kubernetes components: device plugin, node feature discovery, node selector
* NVIDIA humbly brings you a GPU operator
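With drivers, toolkit and device plugin in place (or the GPU operator managing all of that), a workload requests a GPU as a plain resource limit. A minimal sketch - image and node label are illustrative:

```yaml
# Sketch: a pod requesting one NVIDIA GPU via the device plugin.
# The image tag and node label are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label set by GPU feature discovery
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1            # counted resource exposed by the device plugin
```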
## GPU sharing

* Time slicing: switch between workloads over time
* Multi-Process Service (MPS): everything runs on the GPU concurrently and shares it (space-wise)
* Multi-Instance GPU (MIG): space-separated sharing at the hardware level
* Virtual GPU (vGPU): virtualizes time slicing or MIG
* CUDA streams: run multiple kernels within a single app
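As a concrete example, time slicing can be switched on through the device plugin's sharing config. A sketch based on NVIDIA's documented config format - the replica count is an arbitrary example:

```yaml
# Sketch: device-plugin config enabling time slicing, so each physical
# GPU is advertised as 4 shareable replicas. The value 4 is illustrative.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```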
## Dynamic resource allocation

* A new alpha feature since Kubernetes 1.26 for dynamic resource requests
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
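With DRA the request becomes a claim object instead of a counted resource. Roughly, under the alpha API (object names here are hypothetical):

```yaml
# Rough sketch of the DRA alpha API: a claim template referencing a
# class served by a vendor driver, consumed by a pod. Names are made up.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # class provided by the vendor driver
---
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: train
    image: example/train:latest        # illustrative image
    resources:
      claims:
      - name: gpu                      # consume the claim defined above
```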
## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
* The workload is the training workload, split into batches
* Challenge: schedule multiple prioritized training jobs from different users

### Topology-aware placement

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize placement of related jobs based on GPU node distance and NUMA placement

### Fault tolerance and resiliency

* Stuff can break, resulting in slowdowns or errors
* Challenge: detect faults and handle them
* Needed: observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling

### Multi-dimensional optimization

* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: what to choose (a multi-dimensional decision problem)
* Needed: a scheduler that can balance the dimensions
22
content/day2/04_sponsored_ai_platform.md
Normal file
@@ -0,0 +1,22 @@
---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---

Jorge Palma from Microsoft with a quick introduction.

## Baseline

* Kubernetes is cool and all
* Challenges:
  * Containerizing models
  * GPUs in the cluster (installation, management)

## Kubernetes AI Toolchain (KAITO)

* A Kubernetes operator that interacts with
  * a node provisioner
  * a deployment
* A simple CRD that describes the model and infra - and have fun
* Creates an inference endpoint
* Currently around 10 models are available (Hugging Face, LLaMA, etc.)
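The CRD-driven flow could look roughly like this, modeled on KAITO's published Workspace examples - the instance type, labels and preset name are assumptions:

```yaml
# Sketch: a KAITO Workspace that provisions GPU nodes and serves a
# preset model behind an inference endpoint. Values are illustrative.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: Standard_NC12s_v3   # GPU VM size to provision
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: falcon-7b                 # one of the bundled model presets
```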
16
content/day2/05_performance_sustainability.md
Normal file
@@ -0,0 +1,16 @@
---
title: Optimizing performance and sustainability for AI
weight: 5
---

A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted, with prepared (sponsor-specific) slides for each answered question.

## Takeaways

* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> better resource sharing and scheduling
* Smaller LLMs on CPU only are pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware, and models that run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
43
content/day2/06_newsshow_ai_edition.md
Normal file
@@ -0,0 +1,43 @@
---
title: Cloud native news show (AI edition)
weight: 6
---

Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.

### The "news"

* New working group AI
* More tools are including AI features
* A new, updated "CNCF for children", featuring AI
* One decade of Kubernetes
* DRA is in alpha

### DRA

* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific details are abstracted away by a vendor driver/controller
* The kube-scheduler can interact with the vendor parameters for scheduling and autoscaling
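The vendor-abstraction bullet maps onto the alpha API's class objects: the vendor driver publishes a class whose parameters the scheduler can consult. A rough sketch with hypothetical names:

```yaml
# Sketch: a vendor driver exposes a ResourceClass; parametersRef points
# at a vendor-specific CRD holding scheduling-relevant parameters.
# All names here are illustrative assumptions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.nvidia.com
driverName: gpu.resource.nvidia.com   # the vendor driver handling claims
parametersRef:
  apiGroup: gpu.resource.nvidia.com
  kind: GpuClassParameters
  name: default
```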
### Cloud native AI ecosystem

* Kubernetes is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLMs on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis
* Topics of discussion

### The working group AI

* It was formed in October 2023
* They are working on the "cloud native and AI" whitepaper, which was published on 19.03.2024
* The "cloud native and AI" landscape is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)

### LFAI and CNCF

* The director of the AI foundation talks about AI and cloud native
* They are looking forward to more collaboration
10
content/day2/_index.md
Normal file
@@ -0,0 +1,10 @@
---
archetype: chapter
title: Day 2
---

Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
This is where all of the people joined (over 2000).

The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.