---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
---

Kevin and Sanjay from NVIDIA

## Enabling GPUs in Kubernetes today

* Host-level components: toolkit, drivers
* Kubernetes components: device plugin, node feature discovery, node selector
* NVIDIA humbly brings you a GPU Operator

## GPU sharing

* Time-slicing: switch between workloads over time
* Multi-Process Service (MPS): run concurrently on the GPU, sharing it spatially
* Multi-Instance GPU (MIG): space-separated sharing in hardware
* Virtual GPU (vGPU): virtualizes time-slicing or MIG
* CUDA Streams: run multiple kernels within a single application

## Dynamic resource allocation

* A new alpha feature since Kubernetes 1.26 for dynamic resource requests
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail

## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation powered by Kubernetes
* The workload is the training workload split into batches
* Challenge: schedule prioritized training jobs from multiple users

### Topology-aware placement

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication, with switching beyond that
* Target: optimize related jobs based on GPU node distance and NUMA placement

### Fault tolerance and resiliency

* Stuff can break, resulting in slowdowns or errors
* Challenge: detect faults and handle them
* Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling

### Multi-dimensional optimization

* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: what to choose (the multi-dimensional decision problem)
* Needed: a scheduler that can balance these dimensions
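To make the device-plugin path above concrete, here is a minimal sketch of a pod requesting a GPU as an extended resource, assuming the drivers, toolkit, and device plugin are already installed on the node. The node selector uses the `nvidia.com/gpu.present` label that GPU feature discovery typically applies; the image tag is illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label applied by GPU feature discovery
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1            # one whole GPU, advertised by the device plugin
```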
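Time-slicing, the first sharing mode listed above, can be switched on through the NVIDIA device plugin's configuration. A sketch of such a config, assuming the plugin is deployed to read it from a ConfigMap:

```yaml
# ConfigMap consumed by the NVIDIA device plugin to enable time-slicing
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
```

With `replicas: 4`, four pods each requesting `nvidia.com/gpu: 1` can land on one physical GPU and take turns on it; there is no memory or fault isolation between them, which is the trade-off against MIG.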
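The dynamic resource allocation flow ("request a resource via the API") can be sketched as follows. This targets the `resource.k8s.io/v1alpha1` alpha API from Kubernetes 1.26, so the exact shape may change between releases, and the resource class name is driver-defined (the one below is an assumption):

```yaml
# Alpha API (Kubernetes 1.26); the class name is whatever the DRA driver registers
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  resourceClassName: gpu.nvidia.com   # hypothetical class name
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: single-gpu
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu   # references the claim by its pod-level name
```

The pod never names a device or a sharing mode; how the claim is satisfied is left to the driver, which is the "sharing as an implementation detail" point above.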
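One building block for the topology-aware placement target is pod affinity: a hedged sketch that prefers packing a job's workers onto the same node (one NVLink domain) before spilling across the switch fabric. The `job-name` label and the weight are illustrative; a production scheduler would combine this with NUMA-aware topology management.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    job-name: train-abc   # illustrative label shared by all workers of one job
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              job-name: train-abc
          topologyKey: kubernetes.io/hostname   # prefer co-location within one node
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 8   # a full 8-GPU NVLink node per worker
```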