---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
tags:
- keynote
- ai
- nvidia
---
{{% button href="https://youtu.be/gn5SZWyaZ34" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}
A talk by Kevin and Sanjay from NVIDIA.
## Enabling GPUs in Kubernetes today
* Host-level components: GPU drivers and the NVIDIA Container Toolkit
* Kubernetes components: device plugin, GPU feature discovery, node selectors (a minimal pod spec using the device plugin's resource follows below)
* NVIDIA humbly brings you the GPU Operator, which bundles and manages all of the above
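With the device plugin installed, a GPU is requested like any other extended resource. A minimal sketch (the image tag is just an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # extended resource advertised by the device plugin
```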
## GPU sharing
* Time-slicing: processes take turns on the GPU over time (see the config sketch after this list)
* Multi-Process Service (MPS): processes always run on the GPU but share it spatially
* Multi-Instance GPU (MIG): space-separated sharing, partitioned in hardware
* Virtual GPU (vGPU): virtualizes time-slicing or MIG
* CUDA Streams: run multiple kernels concurrently within a single app
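Time-slicing can be enabled through the device plugin's sharing configuration. A minimal sketch following NVIDIA's documented time-slicing config format (the replica count is an arbitrary example):

```yaml
# One physical GPU is advertised as 4 schedulable nvidia.com/gpu replicas;
# workloads time-share it with no memory or fault isolation between them.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```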
## Dynamic resource allocation
* A new alpha feature since Kubernetes 1.26 for requesting resources dynamically
* You just request a resource via a claim in the API and have fun (sketched below)
* How the resource is shared under the hood is an implementation detail of the driver
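A hedged sketch against the v1alpha2 DRA API (current around the time of the talk; the API has since changed shape). The resource class name follows NVIDIA's example DRA driver and is an assumption here:

```yaml
# Assumes a DRA driver that owns the "gpu.nvidia.com" resource class.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-pod
spec:
  resourceClaims:
    - name: gpu                       # claim referenced by the container below
      source:
        resourceClaimTemplateName: gpu-claim-template
  containers:
    - name: app
      image: nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      command: ["nvidia-smi"]
      resources:
        claims:
          - name: gpu                 # consume the claim instead of nvidia.com/gpu
```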
## GPU scale-out challenges
* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is large-scale model training, split into batches
* Challenge: scheduling training jobs from many different users, each with their own priority
### Topology aware placements
* Training needs thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication, and anything beyond a node goes over the switching fabric
* Target: place related jobs based on GPU node distance and NUMA placement (one possible expression in stock Kubernetes is sketched below)
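The talk does not prescribe a mechanism; one way to express such locality with stock Kubernetes is preferred pod affinity over a node label encoding the NVLink/switch domain. Both the job label and the `example.com/nvlink-domain` topology key are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker
  labels:
    training-job: llm-run-1          # hypothetical job label
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                training-job: llm-run-1
            # hypothetical node label encoding the NVLink/switch domain
            topologyKey: example.com/nvlink-domain
  containers:
    - name: worker
      image: nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/gpu: 8          # one full 8-GPU node per worker
```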
### Fault tolerance and resiliency
* Stuff can break, resulting in slowdowns or errors
* Challenge: Detect faults and handle them
* Needed: both in-band and out-of-band observability that exposes faults as node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling (one plausible building block is sketched below)
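One plausible handling pattern (an assumption, not something the talk specifies): a fault-detection controller, fed by GPU health telemetry such as NVIDIA DCGM, taints unhealthy nodes so the scheduler steers new work away. The node name and taint key are hypothetical:

```yaml
# Hypothetical taint applied by a fault-detection controller to a node
# whose GPUs report errors; running pods stay, new GPU work is kept off.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-17                      # hypothetical node
spec:
  taints:
    - key: example.com/gpu-unhealthy     # hypothetical taint key
      value: "xid-error"
      effect: NoSchedule
```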
### Multidimensional optimization
* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: deciding what to optimize for (a multidimensional decision problem)
* Needed: a scheduler that can balance all of these dimensions; stock Kubernetes only covers the priority axis, e.g. via PriorityClass as sketched below
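A minimal sketch of just the priority dimension using a stock Kubernetes PriorityClass (the class name is hypothetical); trading priority off against fairness, occupancy, and starvation is the part the talk argues a scheduler still has to solve:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-high                  # hypothetical class name
value: 100000                          # higher value wins at scheduling time
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "High-priority training jobs; may preempt lower-priority runs."
```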