---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
tags:
  - keynote
  - ai
  - nvidia
---

{{% button href="https://youtu.be/gn5SZWyaZ34" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}

Kevin and Sanjay from NVIDIA

## Enabling GPUs in Kubernetes today

- Host-level components: toolkit, drivers
- Kubernetes components: device plugin, feature discovery, node selector
- NVIDIA humbly brings you a GPU Operator (a pod requesting a GPU through the device plugin is sketched below)
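
What that looks like from the workload's perspective, as a minimal sketch, assuming the device plugin exposes GPUs as the `nvidia.com/gpu` extended resource (the image tag is only an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      # Any CUDA-capable image works here; this tag is only an example.
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource advertised by the device plugin
```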

## GPU sharing

- Time slicing: processes take turns on the GPU, switched by time slice (an operator config for this is sketched below)
- Multi-Process Service (MPS): processes always run on the GPU but share it spatially
- Multi-Instance GPU (MIG): space-separated sharing, partitioned in hardware
- Virtual GPU (vGPU): virtualizes time slicing or MIG
- CUDA Streams: run multiple kernels concurrently in a single app
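
Time slicing, for example, can be enabled via the GPU Operator with a config map along these lines; a minimal sketch, assuming the operator's time-slicing schema (name and replica count are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config   # referenced through the operator's devicePlugin.config option
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
```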

## Dynamic resource allocation

- A new alpha feature since Kubernetes 1.26 for dynamic resource requesting
- You just request a resource via the API and have fun (see the claim sketch below)
- The sharing itself is an implementation detail
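
A minimal sketch of the 1.26-era `v1alpha1` API (the DRA API has changed in later releases, and the resource class name here is hypothetical, registered by some DRA driver):

```yaml
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  resourceClassName: gpu.example.com   # hypothetical class provided by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimName: gpu-claim
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      resources:
        claims:
          - name: gpu   # container consumes whatever the claim gets allocated
```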

## GPU scale-out challenges

- NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
- The workload is model training, split into batches
- Challenge: schedule multiple prioritized training jobs from different users (the priority part can be expressed as below)
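
The priority part maps onto Kubernetes natively; the class name and value below are just examples:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-high   # example name, referenced from pods via priorityClassName
value: 100000           # higher value wins when jobs compete for GPUs
globalDefault: false
description: "High-priority training jobs"
```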

## Topology-aware placements

- You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication, beyond that traffic goes through network switches
- Target: optimize related jobs based on GPU node distance and NUMA placement (a pod-affinity sketch follows)
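
Plain pod affinity can approximate the node-distance part if nodes carry a label for their network domain; a sketch, where the job label and the topology key are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    job: train-llm   # hypothetical label shared by all workers of one job
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              job: train-llm
          # Hypothetical node label marking nodes behind the same switch
          topologyKey: example.com/switch-domain
  containers:
    - name: worker
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 8   # a whole node's worth of GPUs
```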

## Fault tolerance and resiliency

- Stuff can break, resulting in slowdowns or errors
- Challenge: detect faults and handle them
- Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes (see the condition sketch below)
- Needed: automated fault-tolerant scheduling
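
What an exposed node condition can look like; a sketch where the condition type and reason are hypothetical, written by a health monitor such as node-problem-detector:

```yaml
# Excerpt of a Node object's status as a health monitor might write it
status:
  conditions:
    - type: GpuUnhealthy          # hypothetical custom condition type
      status: "True"
      reason: XidCriticalError
      message: "GPU 3 reported a critical Xid error"
      lastTransitionTime: "2024-03-20T10:00:00Z"
```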

## Multidimensional optimization

- There are different KPIs: starvation, priority, occupancy, fairness
- Challenge: what to optimize for (the multidimensional decision problem)
- Needed: a scheduler that can balance the dimensions (weighting one of them is sketched below)
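
The stock scheduler only covers part of this; as a taste of the occupancy dimension, its scoring can be weighted toward packing GPUs. Profile name and weights below are examples, and balancing fairness or starvation across jobs needs a higher-level batch scheduler:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: gpu-aware-scheduler   # example profile name
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # bin-pack to raise GPU occupancy
            resources:
              - name: nvidia.com/gpu
                weight: 5         # example weight favoring GPU packing
              - name: cpu
                weight: 1
```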