title | weight | tags
---|---|---
Accelerating AI workloads with GPUs in Kubernetes | 3 |
Kevin and Sanjay from NVIDIA
Enabling GPUs in Kubernetes today
- Host level components: Toolkit, drivers
- Kubernetes components: Device plugin, feature discovery, node selector
- NVIDIA humbly brings you the GPU Operator, which bundles these components
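With the device plugin in place, a pod requests GPUs like any other extended resource under the standard `nvidia.com/gpu` name. A minimal sketch (image name and tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # scheduled only onto nodes advertising GPUs
```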
GPU sharing
- Time slicing: switch between workloads over time
- Multi-Process Service (MPS): processes stay resident on the GPU and share it spatially
- Multi-Instance GPU (MIG): space-separated sharing in hardware
- Virtual GPU (vGPU): virtualizes time slicing or MIG
- CUDA Streams: run multiple kernels concurrently within a single app
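As an example of the time-slicing option, the NVIDIA device plugin can be configured to advertise each physical GPU as several schedulable replicas. A sketch of that config (the exact format and how it is mounted vary by plugin version):

```yaml
# Config consumed by the NVIDIA k8s-device-plugin, typically via a ConfigMap
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU is advertised as 4 time-sliced slices
```

Note that time-sliced replicas provide no memory or fault isolation between the sharing workloads, unlike MIG.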
Dynamic resource allocation
- A new alpha feature since Kubernetes 1.26 for dynamic resource requests
- You simply request a resource via the API and have fun
- The sharing itself is an implementation detail of the resource driver
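A sketch of what a DRA request looks like, assuming the 1.26-era `v1alpha1` API and a vendor resource class named `gpu.nvidia.com` (the alpha API shapes have changed across releases, so treat this as illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # handled by the vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      claims:
      - name: gpu   # references the claim instead of a counted resource
```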
GPU scale out challenges
- NVIDIA Picasso is a foundry for model creation powered by Kubernetes
- The workload is training, split into batches
- Challenge: schedule multiple prioritized training jobs from different users
Topology aware placements
- You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, traffic goes over network switches
- Target: optimize related jobs based on GPU node distance and NUMA placement
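One way to express such placement today is pod affinity over a topology label; the label key here is hypothetical and would be written by a topology-discovery component:

```yaml
# Co-locate workers of one training job within the same NVLink domain.
# The topologyKey label is a hypothetical example, not a standard label.
apiVersion: v1
kind: Pod
metadata:
  name: train-worker
  labels:
    job: train-llm
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            job: train-llm
        topologyKey: nvidia.com/nvlink-domain   # hypothetical node label
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
```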
Fault tolerance and resiliency
- Stuff can break, resulting in slowdowns or errors
- Challenge: Detect faults and handle them
- Observability both in-band and out-of-band to expose node conditions in Kubernetes
- Needed: Automated fault-tolerant scheduling
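A minimal sketch of one building block for this, assuming a hypothetical health exporter that taints nodes when a GPU fault is detected; regular workloads are then kept off, while diagnostic pods opt in via a toleration:

```yaml
# Hypothetical: a health exporter runs
#   kubectl taint nodes gpu-node-7 nvidia.com/gpu-unhealthy=true:NoSchedule
# when it detects GPU errors. Only diagnostic pods tolerate that taint:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-diagnostics
spec:
  tolerations:
  - key: nvidia.com/gpu-unhealthy   # hypothetical taint key
    operator: Exists
    effect: NoSchedule
  containers:
  - name: diag
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
```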
Multi-dimensional optimization
- There are different KPIs: starvation, priority, occupancy, fairness
- Challenge: what to optimize for (a multi-dimensional decision problem)
- Needed: A scheduler that can balance the dimensions
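Of these dimensions, priority is the one Kubernetes already models natively via `PriorityClass`; the other dimensions need a smarter scheduler on top. The class name and value below are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: research-training   # illustrative name
value: 1000                 # higher value wins when preempting
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Baseline priority for research training jobs"
```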