title | weight | tags
---|---|---
Accelerating AI workloads with GPUs in Kubernetes | 3 |
Kevin and Sanjay from NVIDIA
Enabling GPUs in Kubernetes today
- Host level components: Toolkit, drivers
- Kubernetes components: Device plugin, feature discovery, node selector
- NVIDIA humbly brings you the GPU Operator, which bundles these components
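With the device plugin in place, a pod requests GPUs like any other extended resource under the standard `nvidia.com/gpu` name. A minimal sketch (image name and tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # scheduled only onto nodes advertising GPUs
```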
GPU sharing
- Time slicing: switch between workloads over time
- Multi-Process Service (MPS): processes stay resident on the GPU and share it spatially
- Multi-Instance GPU (MIG): space-separated sharing in hardware
- Virtual GPU (vGPU): virtualizes time slicing or MIG
- CUDA Streams: run multiple kernels concurrently within a single app
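As an example of the time-slicing option, the NVIDIA device plugin can be configured to advertise each physical GPU as several schedulable replicas. A sketch of that config (the exact format and how it is mounted vary by plugin version):

```yaml
# Config consumed by the NVIDIA k8s-device-plugin, typically via a ConfigMap
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU is advertised as 4 time-sliced slices
```

Note that time-sliced replicas provide no memory or fault isolation between the sharing workloads, unlike MIG.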
Dynamic resource allocation
- A new alpha feature since Kubernetes 1.26 for dynamic resource requests
- You simply request a resource via the API and have fun
- The sharing itself is an implementation detail of the resource driver
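A sketch of what a DRA request looks like, assuming the 1.26-era `v1alpha1` API and a vendor resource class named `gpu.nvidia.com` (the alpha API shapes have changed across releases, so treat this as illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # handled by the vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      claims:
      - name: gpu   # references the claim instead of a counted resource
```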
GPU scale out challenges
- NVIDIA Picasso is a foundry for model creation powered by Kubernetes
- The workload is training, split into batches
- Challenge: schedule multiple prioritized training jobs from different users
Topology aware placements
- You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, traffic goes over network switches
- Target: optimize related jobs based on GPU node distance and NUMA placement
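One way to express such placement today is pod affinity over a topology label; the label key here is hypothetical and would be written by a topology-discovery component:

```yaml
# Co-locate workers of one training job within the same NVLink domain.
# The topologyKey label is a hypothetical example, not a standard label.
apiVersion: v1
kind: Pod
metadata:
  name: train-worker
  labels:
    job: train-llm
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            job: train-llm
        topologyKey: nvidia.com/nvlink-domain   # hypothetical node label
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
```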
Fault tolerance and resiliency
- Stuff can break, resulting in slowdowns or errors
- Challenge: Detect faults and handle them
- Observability both in-band and out-of-band to expose node conditions in Kubernetes
- Needed: Automated fault-tolerant scheduling
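A minimal sketch of one building block for this, assuming a hypothetical health exporter that taints nodes when a GPU fault is detected; regular workloads are then kept off, while diagnostic pods opt in via a toleration:

```yaml
# Hypothetical: a health exporter runs
#   kubectl taint nodes gpu-node-7 nvidia.com/gpu-unhealthy=true:NoSchedule
# when it detects GPU errors. Only diagnostic pods tolerate that taint:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-diagnostics
spec:
  tolerations:
  - key: nvidia.com/gpu-unhealthy   # hypothetical taint key
    operator: Exists
    effect: NoSchedule
  containers:
  - name: diag
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
```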
Multi-dimensional optimization
- There are different KPIs: starvation, priority, occupancy, fairness
- Challenge: what to optimize for (a multi-dimensional decision problem)
- Needed: A scheduler that can balance the dimensions
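Of these dimensions, priority is the one Kubernetes already models natively via `PriorityClass`; the other dimensions need a smarter scheduler on top. The class name and value below are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: research-training   # illustrative name
value: 1000                 # higher value wins when preempting
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Baseline priority for research training jobs"
```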