---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
tags:
- keynote
- ai
- nvidia
---
Kevin and Sanjay from NVIDIA
## Enabling GPUs in Kubernetes today
* Host-level components: toolkit, drivers
* Kubernetes components: device plugin, feature discovery, node selector (a minimal pod request is sketched below)
* NVIDIA humbly brings you a GPU operator to manage all of the above
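
To make this concrete, here is a minimal sketch of a pod requesting a GPU through the device plugin. The `nvidia.com/gpu` resource name is the one the NVIDIA device plugin registers; the node selector label and the image are illustrative examples.

```yaml
# Minimal sketch: request one GPU via the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label published by GPU feature discovery
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduler places this on a node with a free GPU
```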
## GPU sharing
* Time slicing: Switch between processes by time slice (an operator config is sketched below)
* Multi-Process Service (MPS): Everything runs on the GPU all the time but shares it spatially
* Multi-Instance GPU (MIG): Space-separated sharing partitioned in hardware
* Virtual GPU (vGPU): Virtualizes time slicing or MIG
* CUDA Streams: Run multiple kernels within a single app
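
As an illustration of time slicing under the GPU operator, the device plugin can be configured to advertise each physical GPU as several schedulable replicas. This is a rough sketch assuming the operator runs in a `gpu-operator` namespace; the ConfigMap name and replica count are arbitrary, so check the operator docs for the exact wiring on your version.

```yaml
# Sketch: expose each physical GPU as 4 time-sliced replicas.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config   # arbitrary name, referenced from the ClusterPolicy
  namespace: gpu-operator     # assumed operator namespace
data:
  default: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4       # each physical GPU shows up 4 times
```

Pods then request `nvidia.com/gpu` as usual and transparently share the underlying device.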
## Dynamic resource allocation
* A new alpha feature since Kubernetes 1.26 for dynamically requesting resources
* You just request a resource via the API and have fun (see the sketch below)
* The sharing itself is an implementation detail
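
A rough sketch of what this looks like with the 1.26-era `resource.k8s.io/v1alpha1` API; the alpha shape has changed across releases, and `gpu.example.com` is a hypothetical resource class that a DRA driver would provide.

```yaml
# Sketch: claim a GPU from a hypothetical DRA driver and consume it in a pod.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  resourceClassName: gpu.example.com  # hypothetical class served by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-user
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimName: gpu-claim  # reference the claim above
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      resources:
        claims:
          - name: gpu                 # consume the claim in this container
```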
## GPU scale out challenges
* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is model training, split into batches
* Challenge: Schedule multiple prioritized training jobs from different users
### Topology-aware placements
* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, traffic goes through switches
* Target: optimize related jobs based on GPU node distance and NUMA placement (a hypothetical affinity sketch follows)
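
Stock Kubernetes has no notion of NVLink or NUMA distance, but the intent can be approximated with pod affinity on a topology label. The label key below is purely hypothetical, standing in for whatever a topology-aware scheduler or node labeller would publish.

```yaml
# Hypothetical sketch: keep workers of one training job inside the same
# high-bandwidth domain, assuming nodes carry an (invented) topology label.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    job: llm-train
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              job: llm-train
          topologyKey: topology.example.com/nvlink-domain  # hypothetical label
  containers:
    - name: worker
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04   # example image
      resources:
        limits:
          nvidia.com/gpu: 8   # a full 8-GPU node per worker
```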
### Fault tolerance and resiliency
* Stuff can break, resulting in slowdowns or errors
* Challenge: Detect faults and handle them
* Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes (a hypothetical condition is sketched below)
* Needed: Automated fault-tolerant scheduling
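
As a sketch of what exposing such faults could look like, a health agent might patch a custom condition into the node status. The `GpuXidError` condition type is invented here (upstream only ships generic conditions such as `Ready` or `MemoryPressure`), while Xid 79 is a real NVIDIA error code for a GPU falling off the bus.

```yaml
# Hypothetical: node status condition written by an out-of-band GPU health agent.
status:
  conditions:
    - type: GpuXidError          # invented condition type
      status: "True"
      reason: XidEventDetected
      message: "Xid 79 (GPU has fallen off the bus) on GPU 3"
      lastHeartbeatTime: "2024-03-20T10:00:00Z"
      lastTransitionTime: "2024-03-20T09:55:00Z"
```

A fault-aware scheduler or remediation controller could then automatically drain or taint such a node.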
### Multi-dimensional optimization
* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: What to choose (the multi-dimensional decision problem)
* Needed: A scheduler that can balance the dimensions (the priority dimension alone is already expressible today, see below)
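
Of these dimensions, priority is the one stock Kubernetes already models directly via `PriorityClass`. A minimal sketch using the standard `scheduling.k8s.io/v1` API (names and value are arbitrary); balancing the remaining dimensions is what dedicated batch schedulers aim at.

```yaml
# Minimal sketch: express the priority dimension with a standard PriorityClass.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-high        # arbitrary name
value: 100000                # higher value wins; may preempt lower-priority pods
globalDefault: false
description: "Production training jobs that may preempt exploratory runs."
---
apiVersion: v1
kind: Pod
metadata:
  name: prod-trainer
spec:
  priorityClassName: training-high
  containers:
    - name: worker
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
```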