---
title: "Brains on the Edge - Running AI Workloads with k3s and GPU Nodes"
weight: 3
tags:
- ai
- gpu
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

I decided not to note down the usual "typical challenges on the edge" slides (about 10 mins of the talk).

## Baseline

- The edge can be split into tiers: Near Edge, Far Edge, and Device Edge
- They use k3s for all of their edge clusters

## Prerequisites

- Software: the GPU driver, the NVIDIA Container Toolkit, and the NVIDIA device plugin
- Hardware: an NVIDIA GPU on a supported distro
- Runtime: not all container runtimes support GPUs (containerd and CRI-O do); see the manifest sketch after this list
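
A minimal sketch of how these prerequisites fit together, assuming the driver, Container Toolkit, and device plugin are already installed: the device plugin advertises the `nvidia.com/gpu` resource, and a `RuntimeClass` points pods at the NVIDIA runtime handler in containerd (recent k3s versions register the handler as `nvidia` when they detect the toolkit). The image tag is an assumption; any CUDA base image works.

```yaml
# RuntimeClass mapping to the "nvidia" containerd runtime handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
# Smoke-test pod: requests one GPU from the device plugin and runs
# nvidia-smi to verify the driver is reachable from a container.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # assumed tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```
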
## Architecture

```mermaid
graph LR
    subgraph Edge
        MQTT
        Kafka
        Analytics

        MQTT-->|Publish collected sensor data|Kafka
        Kafka-->|Provide data to run analytics on|Analytics
    end
    subgraph Azure
        Storage
        Monitoring
        MLFlow

        Storage-->|Provide long term analytics|MLFlow
    end

    Analytics<-->|Sync models|MLFlow
    Kafka-->|Save to long term|Storage
    Monitoring-.->|Observe|Storage
    Monitoring-.->|Observe|MLFlow
```
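
To make the `Analytics` box concrete, a hedged sketch of how such a component could be deployed on the edge cluster: a GPU-backed consumer that reads from the edge-local Kafka and pulls its model from the cloud-side MLFlow. The image, service names, and values are illustrative assumptions, not details from the talk; `MLFLOW_TRACKING_URI` is MLFlow's standard client variable.

```yaml
# Hypothetical Deployment for the "Analytics" component.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: analytics
  template:
    metadata:
      labels:
        app: analytics
    spec:
      runtimeClassName: nvidia
      containers:
        - name: analytics
          image: registry.example.com/analytics:latest  # placeholder
          env:
            - name: KAFKA_BOOTSTRAP_SERVERS    # edge-local Kafka
              value: kafka.edge.svc.cluster.local:9092
            - name: MLFLOW_TRACKING_URI        # cloud-side MLFlow
              value: https://mlflow.example.com
          resources:
            limits:
              nvidia.com/gpu: 1                # pin to a GPU node
```
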
## Q&A

- Did you use the NVIDIA GPU Operator? Yes (a k3s install sketch follows this list)
- Which runtime did you use? containerd, via k3s
- Why k3s over k0s? Because it's what we already used
- Were you power-limited? Nope, the edge was on a large ship
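
Since the GPU Operator came up: a sketch of installing it through k3s's built-in Helm controller (the `HelmChart` CRD). The repo and chart name are NVIDIA's published Helm chart; the values are assumptions for a host that already has the driver and toolkit installed, and typically need tuning for k3s's bundled containerd.

```yaml
# Install the NVIDIA GPU Operator via the k3s Helm controller:
# placed in /var/lib/rancher/k3s/server/manifests/, k3s applies it
# automatically on startup.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator  # assumes this namespace exists
  valuesContent: |-
    # Assumption: driver and container toolkit already live on the
    # host, so the operator manages only the in-cluster components.
    driver:
      enabled: false
    toolkit:
      enabled: false
```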
|