kubecon24/05_performance_sustainability.md at bafbb46f52736ffaf499c7510a0c5337f981ee47

Nicolai Ort bafbb46f52

added tags

2024-03-25 13:45:10 +01:00

Takeaways

Deploying an ML model should become the new "deploy a web app"
The hardware should be fully utilized -> better resource sharing and scheduling
Running smaller LLMs on CPU only is pretty cost-efficient
Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
Software acceleration is cool, but we should use more specialized hardware and models that run on CPUs
We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
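
The prepare/run split can be illustrated with a small sketch. This is not from the talk: it is a minimal single-process stand-in, assuming a CPU-side `prepare` step and a GPU-side `run` step (both hypothetical callables), with a bounded queue playing the role of the just-in-time hand-off between the two node types. In a real cluster the two stages would be separate workloads scheduled onto different node pools.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_pipeline(items, prepare, run, buffer_size=2):
    """Overlap a CPU-side prepare stage with a GPU-side run stage.

    `prepare` and `run` are hypothetical stand-ins for the real work.
    The bounded queue is the "just-in-time" part: the prepare stage only
    works `buffer_size` items ahead, so the run stage stays fed without
    prepared data piling up.
    """
    ready = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in items:
                ready.put(prepare(item))  # blocks while the buffer is full
        finally:
            # Always signal the end, even if prepare() raised, so the
            # consumer loop below cannot wait forever.
            ready.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        prepared = ready.get()
        if prepared is _DONE:
            break
        results.append(run(prepared))
    return results


if __name__ == "__main__":
    # Stand-ins: "prepare" would fetch and preprocess data on storage/CPU
    # nodes, "run" would execute the model on GPU nodes.
    print(run_pipeline(["a", "b", "c"], prepare=str.upper, run=lambda x: x + "!"))
```

The buffer size is the knob here: a larger buffer hides slow preparation at the cost of holding more prepared data, while a buffer of 1 keeps the flow strictly just-in-time.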