title: Optimizing performance and sustainability for AI
weight: 5
tags: keynote, panel

A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN. It was fairly scripted, with prepared (sponsor-specific) slides for each question answered.

Takeaways

  • Deploying an ML model should become the new "deploying a web app"
  • The hardware should be fully utilized -> better resource sharing and scheduling
  • Running smaller LLMs on CPU only is pretty cost efficient
  • Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
  • Software acceleration is cool, but we should use more specialized hardware and models to run on CPUs
  • We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
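
The prepare/run split from the takeaways can be illustrated with a small sketch. This is not something shown in the panel: it only models the idea in a single Python process, with a thread standing in for the storage + CPU nodes and the main loop standing in for the GPU nodes. The names `prepare`, `run` and `pipeline` are hypothetical, and in a real cluster the split would be handled by node pools and a scheduler rather than threads.

```python
import queue
import threading


def prepare(item):
    # Hypothetical stand-in for storage/CPU-bound work (loading, preprocessing).
    return item * 2


def run(prepared):
    # Hypothetical stand-in for the GPU-bound step.
    return prepared + 1


def pipeline(items, buffer_size=2):
    # A bounded buffer keeps the prepare stage only slightly ahead of the
    # run stage: put() blocks once `buffer_size` items are waiting.
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in items:
            buf.put(prepare(item))
        buf.put(done)

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        prepared = buf.get()
        if prepared is done:
            break
        results.append(run(prepared))
    worker.join()
    return results


if __name__ == "__main__":
    print(pipeline([1, 2, 3]))  # prints [3, 5, 7]
```

The bounded queue is the "just-in-time" part of the sketch: data is prepared shortly before the run stage needs it, so the expensive stage is not left waiting and prepared data does not pile up.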