---
title: Optimizing performance and sustainability for AI
weight: 5
tags:
- keynote
- panel
---
A panel discussion moderated by Google, with panelists from Google, Alluxio, Ampere and CERN.
It was fairly scripted, with prepared (sponsor-specific) slides for each answered question.
## Takeaways
* Deploying an ML model should become as routine as deploying a web app
* The hardware should be fully utilized -> better resource sharing and scheduling
* Running smaller LLMs on CPU only is pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models that can run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
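
The prepare/run split mentioned above could be sketched in Kubernetes terms roughly like this: one Job pinned to CPU/storage nodes that stages the data, and a second Job that only requests a GPU once the data is ready, so expensive GPU nodes are occupied just-in-time. This is only a minimal sketch; the node label, image names, and resource numbers are hypothetical.

```yaml
# Hypothetical sketch: data-preparation Job pinned to CPU + storage nodes
apiVersion: batch/v1
kind: Job
metadata:
  name: prepare-dataset
spec:
  template:
    spec:
      nodeSelector:
        node-role/cpu-prepare: "true"     # hypothetical label for the CPU/storage node pool
      containers:
        - name: prepare
          image: example.com/prepare:latest   # hypothetical image
          resources:
            requests:
              cpu: "8"
      restartPolicy: Never
---
# Hypothetical sketch: the GPU run Job, launched only once prepared data exists
apiVersion: batch/v1
kind: Job
metadata:
  name: run-training
spec:
  template:
    spec:
      containers:
        - name: train
          image: example.com/train:latest     # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 1   # lands on GPU nodes via the NVIDIA device plugin
      restartPolicy: Never
```

The "just-in-time" part would live outside these manifests, e.g. a workflow engine or controller that creates the GPU Job only after the prepare Job completes.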