---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
tags:
- keynote
- ai
- nvidia
---

Kevin and Sanjay from NVIDIA
## Enabling GPUs in Kubernetes today

* Host-level components: NVIDIA drivers and the container toolkit
* Kubernetes components: device plugin, node feature discovery, node selectors
* NVIDIA humbly brings you the GPU Operator, which manages all of the above
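Once the device plugin is running, a container can request GPUs as an extended resource. A minimal sketch (the image tag is only an example):

```yaml
# Minimal pod requesting one GPU via the extended resource exposed
# by the NVIDIA device plugin; image tag is an example choice.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # whole GPUs only; no fractions at this level
```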
## GPU sharing

* Time slicing: workloads take turns on the GPU, switched by time
* Multi-Process Service (MPS): processes always run on the GPU and share it spatially
* Multi-Instance GPU (MIG): space-separated sharing enforced in hardware
* Virtual GPU (vGPU): virtualizes time slicing or MIG
* CUDA Streams: run multiple kernels concurrently within a single application
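As an illustration, time slicing can be enabled through the device plugin's sharing configuration; a sketch assuming the ConfigMap-based setup consumed by the GPU Operator (ConfigMap name and replica count are example choices):

```yaml
# Sketch: time-slicing config for the NVIDIA device plugin,
# referenced from the GPU Operator's ClusterPolicy.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
```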
## Dynamic resource allocation

* A new alpha feature since Kubernetes 1.26 for dynamic resource requests
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
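The alpha API shape looked roughly like the sketch below; the resource class name and driver are hypothetical, and the DRA API has evolved in later releases:

```yaml
# Rough sketch of the DRA alpha API (resource.k8s.io/v1alpha1);
# class and template names are hypothetical examples.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.example.com   # hypothetical class backed by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
    resources:
      claims:
      - name: gpu   # no counts here - how the GPU is shared is the driver's detail
```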
## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
* The workload is training, split into batches
* Challenge: schedule multiple training jobs from different users with differing priorities
### Topology-aware placement

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication, and anything beyond that goes over switches
* Target: place related jobs based on GPU node distance and NUMA placement
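With stock Kubernetes primitives, one way to approximate this is pod affinity on a topology label; a sketch where the label key and job name are hypothetical (real topology-aware schedulers go further than this):

```yaml
# Sketch: pack a job's workers onto the same NVLink/switch domain
# via pod affinity. The topology label is a hypothetical example.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker
  labels:
    job: llm-train
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            job: llm-train
        topologyKey: example.com/nvlink-domain   # hypothetical node label
  containers:
  - name: worker
    image: nvcr.io/nvidia/pytorch:24.01-py3     # example image
    resources:
      limits:
        nvidia.com/gpu: 8   # a full node's worth of GPUs
```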
### Fault tolerance and resiliency

* Things can break, resulting in slowdowns or errors
* Challenge: detect faults and handle them
* Needed: in-band and out-of-band observability that exposes node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling
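One common building block: a health monitor taints nodes with faulty GPUs, and pods declare how long they tolerate the condition before being evicted and rescheduled. A sketch with a hypothetical taint key:

```yaml
# Sketch: react to a GPU fault via taint-based eviction.
# The taint key is a hypothetical example set by a health monitor.
apiVersion: v1
kind: Pod
metadata:
  name: resilient-trainer
spec:
  tolerations:
  - key: example.com/gpu-degraded   # hypothetical taint from fault detection
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300          # ride out a degraded GPU for 5 min, then evict
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.01-py3   # example image
    resources:
      limits:
        nvidia.com/gpu: 1
```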
### Multidimensional optimization

* There are competing KPIs: starvation, priority, occupancy, fairness
* Challenge: deciding what to optimize for (a multidimensional decision problem)
* Needed: a scheduler that can balance these dimensions
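Of these dimensions, Kubernetes expresses priority natively via PriorityClasses; balancing it against fairness, occupancy, and starvation still needs a custom scheduler. A sketch (name and value are example choices):

```yaml
# Sketch: a PriorityClass covers only the "priority" KPI;
# the other dimensions need scheduler-level logic.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: research-batch
value: 1000                              # higher values may preempt lower ones
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Example priority tier for batch training jobs"
```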