kubecon24/05_performance_sustainability.md at 76ee14e6676956a429c0534432aeabd5bde36f22

Nicolai Ort 76ee14e667

2024-03-26 15:43:47 +01:00

Takeaways

Deploying an ML model should become as routine as deploying a web app
The hardware should be fully utilized, which requires better resource sharing and scheduling
Running smaller LLMs on CPUs only is pretty cost-efficient
Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
Software acceleration is cool, but we should make more use of specialized hardware and of models that run on CPUs
We should stay flexible regarding hardware, multi-cluster workloads and hybrid workloads (on-prem, bursting to the cloud)
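
The prepare/run split from the scheduling takeaway can be illustrated with a small sketch. This is not taken from any of the talks: `prepare`, `run`, `pipeline` and `buffer_size` are hypothetical names, and two threads with a bounded queue stand in for the storage + CPU nodes and the GPU nodes. The idea it tries to show is only that a small buffer lets the prepare stage work just ahead of the run stage, so the expensive stage is never waiting on data.

```python
import queue
import threading


def prepare(batch_id: int) -> dict:
    """Stand-in for the storage + CPU stage (loading and preprocessing)."""
    return {"id": batch_id, "data": [batch_id * 10 + i for i in range(4)]}


def run(batch: dict) -> int:
    """Stand-in for the GPU stage (inference or training on a prepared batch)."""
    return sum(batch["data"])


def pipeline(num_batches: int, buffer_size: int = 2) -> list[int]:
    # The bounded queue keeps only a few prepared batches in flight,
    # so preparation happens just in time for the run stage.
    ready: queue.Queue = queue.Queue(maxsize=buffer_size)
    results: list[int] = []

    def producer() -> None:
        for batch_id in range(num_batches):
            ready.put(prepare(batch_id))  # blocks while the buffer is full
        ready.put(None)  # sentinel: no more batches

    def consumer() -> None:
        while (batch := ready.get()) is not None:
            results.append(run(batch))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(pipeline(3))
```

In a real cluster the two stages would be separate node pools and the queue would be whatever hands prepared data over to the GPU nodes; the sketch only models the ordering and back-pressure.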