---
title: "Brains on the edge - running AI workloads with k3s and GPU nodes"
weight: 3
tags:
- ai
- gpu
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
I decided not to note down the usual "typical challenges on the edge" slides (about 10 minutes of the talk).
## Baseline
- Edge can be split up: Near Edge, Far Edge, Device Edge
- They use k3s for all edge clusters
## Prerequisites
- Software: GPU Driver, Container Toolkit, Device Plugin
- Hardware: NVIDIA GPU with a supported distro
- Runtime: Not all runtimes support GPUs (containerd and CRI-O do)
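With the driver, Container Toolkit, and device plugin in place, a pod requests GPUs through the `nvidia.com/gpu` extended resource that the device plugin advertises. A minimal sketch (pod name and image are illustrative, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example CUDA base image
      command: ["nvidia-smi"]   # prints the visible GPU if scheduling worked
      resources:
        limits:
          nvidia.com/gpu: 1     # resource advertised by the NVIDIA device plugin
```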
## Architecture
```mermaid
graph LR
subgraph Edge
MQTT
Kafka
Analytics
MQTT-->|Publish collected sensor data|Kafka
Kafka-->|Provide data to run|Analytics
end
subgraph Azure
Storage
Monitoring
MLFlow
Storage-->|Provide long term analytics|MLFlow
end
Analytics<-->|Sync models|MLFlow
Kafka-->|Save to long term|Storage
Monitoring-.->|Observe|Storage
Monitoring-.->|Observe|MLFlow
```
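The flow in the diagram can be sketched as a toy pipeline. This is purely illustrative: stdlib queues stand in for the MQTT and Kafka brokers, the "analytics" is a made-up threshold check, and all names are invented, but it shows the direction data moves on the edge side.

```python
from queue import Queue

# Stand-ins for the real brokers: MQTT feeds Kafka, Kafka feeds analytics.
mqtt_topic: Queue = Queue()   # sensor data published at the device edge
kafka_topic: Queue = Queue()  # durable stream feeding analytics and storage
long_term_storage: list = []  # stand-in for the Azure storage account


def publish_sensor_data(reading: dict) -> None:
    """Device edge: publish one collected sensor reading to MQTT."""
    mqtt_topic.put(reading)


def bridge_mqtt_to_kafka() -> None:
    """Forward everything currently on the MQTT topic into Kafka."""
    while not mqtt_topic.empty():
        kafka_topic.put(mqtt_topic.get())


def run_analytics() -> list:
    """Consume from Kafka: archive to long-term storage, run a toy model."""
    results = []
    while not kafka_topic.empty():
        reading = kafka_topic.get()
        long_term_storage.append(reading)  # Kafka -> Storage (long term)
        results.append({"sensor": reading["sensor"],
                        "anomaly": reading["value"] > 100})  # toy threshold
    return results


publish_sensor_data({"sensor": "engine-temp", "value": 87})
publish_sensor_data({"sensor": "engine-temp", "value": 142})
bridge_mqtt_to_kafka()
results = run_analytics()
print(results)
```

In the real setup the analytics step would load whatever model was last synced from MLflow rather than a hard-coded threshold.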
## Q&A
- Did you use the NVIDIA GPU Operator? Yes.
- Which runtime did you use? containerd, via k3s.
- Why k3s over k0s? Because that's what we were already using.
- Were you power limited? No, the edge was on a large ship.
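On the runtime answer: k3s bundles its own containerd and, per the k3s docs, detects an installed NVIDIA container runtime at startup; workloads then opt into it via a RuntimeClass. A sketch (the `nvidia` handler name follows the k3s documentation, not the talk):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # runtime handler k3s registers when it finds the NVIDIA runtime
```

Pods select it with `runtimeClassName: nvidia` in their spec.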