---
title: Optimizing performance and sustainability for AI
weight: 5
tags:
- keynote
- panel
---

A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere, and CERN.

It was pretty scripted, with prepared (sponsor-specific) slides for each question.
## Takeaways
* Deploying an ML model should become as routine as deploying a web app
* Hardware should be fully utilized -> better resource sharing and scheduling
* Running smaller LLMs on CPUs only is pretty cost-efficient
* Better scheduling: split into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware and models that run on CPUs
* We should stay flexible regarding hardware, multi-cluster workloads, and hybrid (on-prem, burst to cloud) workloads
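
The prepare/run split mentioned above can be sketched as a producer/consumer pipeline. This is a minimal single-process illustration of the idea, not anything shown at the panel: the "GPU" stage is simulated, all function and variable names are hypothetical, and in a real cluster the two stages would be separate node pools feeding a shared queue.

```python
import queue
import threading

def prepare(batch_ids, out_q):
    """Storage + CPU stage: load and preprocess batches ahead of time."""
    for bid in batch_ids:
        batch = [x * 2 for x in range(bid, bid + 4)]  # stand-in for real preprocessing
        out_q.put(batch)  # blocks when the buffer is full -> just-in-time flow
    out_q.put(None)  # sentinel: no more batches

def run(in_q, results):
    """GPU stage (simulated here): consume prepared batches as they arrive."""
    while True:
        batch = in_q.get()
        if batch is None:
            break
        results.append(sum(batch))  # stand-in for a GPU forward pass

q = queue.Queue(maxsize=2)  # small buffer keeps the GPU fed without hoarding memory
results = []
producer = threading.Thread(target=prepare, args=([0, 10, 20], q))
consumer = threading.Thread(target=run, args=(q, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # one result per prepared batch
```

The bounded queue is what makes the flow "just in time": the CPU side stalls once the buffer is full, so prepared data arrives only slightly ahead of when the run stage needs it.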