kubecon24/content/day2/03_accelerating_ai_workload...


title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
tags: keynote, ai, nvidia

Kevin and Sanjay from NVIDIA

Enabling GPUs in Kubernetes today

  • Host-level components: drivers, container toolkit
  • Kubernetes components: device plugin, feature discovery, node selector
  • NVIDIA humbly brings you the GPU Operator, which bundles all of the above
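Once the device plugin is running, a pod requests GPUs through the extended resource name it advertises. A minimal sketch (pod name and image tag are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test                # placeholder name
spec:
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04   # example image tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1        # extended resource exposed by the device plugin
```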

GPU sharing

  • Time slicing: processes take turns on the GPU in time slices
  • Multi-Process Service (MPS): processes run on the GPU concurrently and share it spatially
  • Multi-Instance GPU (MIG): space-separated sharing partitioned in hardware
  • Virtual GPU (vGPU): virtualizes time slicing or MIG for virtual machines
  • CUDA Streams: run multiple kernels concurrently within a single application
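Time slicing, for example, can be switched on through the NVIDIA device plugin's sharing configuration; a sketch (the replica count is illustrative):

```yaml
# Config payload for the NVIDIA device plugin (sketch)
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4    # each physical GPU is advertised as 4 schedulable replicas
```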

Dynamic resource allocation

  • A new alpha feature since Kubernetes 1.26 for dynamic resource requests
  • You just request a resource via the API and use it
  • The sharing itself is an implementation detail
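The DRA flow looks roughly like this: a claim template references a vendor-provided resource class, and the pod consumes the claim (the API group is alpha and has changed between releases; all names here are illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha2     # alpha API, subject to change
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com    # class provided by the vendor driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04   # example image
    resources:
      claims:
      - name: gpu    # the container consumes the claim; sharing is the driver's concern
```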

GPU scale out challenges

  • NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
  • The workload is training, split into batches
  • Challenge: schedule multiple training jobs from different users, with prioritization

Topology-aware placements

  • You need thousands of GPUs, but a typical node has 8 GPUs with fast NVLink communication; beyond a single node, traffic goes through network switching
  • Target: optimize placement of related jobs based on GPU node distance and NUMA placement
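One coarse building block for this is pod affinity, e.g. packing workers of the same job onto one node so they communicate over NVLink instead of the network (the job label is hypothetical):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          training-job: llm-run-42          # hypothetical job label
      topologyKey: kubernetes.io/hostname   # same node = NVLink, not switching
```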

Fault tolerance and resiliency

  • Things can break, resulting in slowdowns or errors
  • Challenge: detect faults and handle them
  • Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
  • Needed: automated fault-tolerant scheduling
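If a health monitor taints faulty GPU nodes, workloads can declare how long they tolerate the condition before being evicted and rescheduled (the taint key is hypothetical):

```yaml
tolerations:
- key: nvidia.com/gpu-unhealthy   # hypothetical taint set by a health monitor
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60           # give in-band checks a minute before eviction
```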

Multi-dimensional optimization

  • There are different KPIs: starvation, priority, occupancy, fairness
  • Challenge: what to choose (a multi-dimensional decision problem)
  • Needed: a scheduler that can balance these dimensions
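As a toy illustration of the balancing act, a scheduler might collapse the KPIs into a single weighted score per candidate job; the weights, job names, and KPI values below are made up:

```python
def score(kpis: dict[str, float], weights: dict[str, float]) -> float:
    """Collapse multiple KPIs into one scalar; the weights encode the trade-off."""
    return sum(weights[k] * kpis[k] for k in weights)

# Hypothetical candidates, KPI values normalized to [0, 1].
jobs = {
    "job-a": {"priority": 0.9, "occupancy": 0.4, "fairness": 0.2},
    "job-b": {"priority": 0.5, "occupancy": 0.8, "fairness": 0.7},
}
weights = {"priority": 0.5, "occupancy": 0.3, "fairness": 0.2}

# Pick the job with the best combined score.
best = max(jobs, key=lambda name: score(jobs[name], weights))
```

Changing the weights shifts the balance, which is exactly the multi-dimensional decision the talk describes.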