---
title: Optimizing performance and sustainability for AI
weight: 5
tags:
  - keynote
  - panel
---

A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
It was fairly scripted, with prepared (sponsor-specific) slides for each question answered.

## Takeaways

* Deploying an ML model should become as routine as deploying a web app
* The hardware should be fully utilized -> better resource sharing and scheduling
* Running smaller LLMs on CPU only is pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models to run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
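
To make the prepare/run split from the takeaways a bit more concrete, here is a minimal sketch of a just-in-time flow in Python. It is not something the panel showed: `prepare`, `run`, `pipeline` and `buffer_size` are hypothetical names, the "GPU" step is just a sum, and a thread with a bounded queue stands in for separate storage + CPU and GPU nodes.

```python
import queue
import threading


def prepare(batch_id):
    # Hypothetical stand-in for storage + CPU work (loading, decoding, tokenizing).
    return [batch_id * 10 + i for i in range(4)]


def run(batch):
    # Hypothetical stand-in for the GPU step; here it only sums the batch.
    return sum(batch)


def pipeline(num_batches, buffer_size=2):
    # Bounded queue: the prepare stage can only get a few batches ahead,
    # so data arrives just in time instead of piling up in memory.
    ready = queue.Queue(maxsize=buffer_size)
    results = []

    def producer():
        for batch_id in range(num_batches):
            ready.put(prepare(batch_id))  # blocks while the buffer is full
        ready.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        batch = ready.get()
        if batch is None:
            break
        results.append(run(batch))
    worker.join()
    return results


if __name__ == "__main__":
    print(pipeline(3))
```

The bounded buffer is the point of the sketch: the prepare side stays only slightly ahead of the run side, so the expensive run stage is not left waiting and prepared data does not accumulate.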