---
title: "Brains on the Edge - Running AI Workloads with k3s and GPU Nodes"
weight: 3
tags:
- ai
- gpu
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

I decided not to note down the usual "typical challenges on the edge" slides (about 10 mins of the talk).

## Baseline

- The edge can be split into tiers: Near Edge, Far Edge, and Device Edge
- They use k3s for all of their edge clusters

## Prerequisites

- Software: the GPU driver, the NVIDIA Container Toolkit, and the NVIDIA device plugin
- Hardware: an NVIDIA GPU on a supported distro
- Runtime: not all container runtimes support GPUs (containerd and CRI-O do); see the manifest sketch after this list
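
A minimal sketch of how these prerequisites fit together, assuming the driver, Container Toolkit, and device plugin are already installed: the device plugin advertises the `nvidia.com/gpu` resource, and a `RuntimeClass` points pods at the NVIDIA runtime handler in containerd (recent k3s versions register the handler as `nvidia` when they detect the toolkit). The image tag is an assumption; any CUDA base image works.

```yaml
# RuntimeClass mapping to the "nvidia" containerd runtime handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
# Smoke-test pod: requests one GPU from the device plugin and runs
# nvidia-smi to verify the driver is reachable from a container.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # assumed tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```
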
## Architecture

```mermaid
graph LR
    subgraph Edge
        MQTT
        Kafka
        Analytics

        MQTT-->|Publish collected sensor data|Kafka
        Kafka-->|Provide data to run analytics on|Analytics
    end
    subgraph Azure
        Storage
        Monitoring
        MLFlow

        Storage-->|Provide long term analytics|MLFlow
    end

    Analytics<-->|Sync models|MLFlow
    Kafka-->|Save to long term|Storage
    Monitoring-.->|Observe|Storage
    Monitoring-.->|Observe|MLFlow
```
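
To make the `Analytics` box concrete, a hedged sketch of how such a component could be deployed on the edge cluster: a GPU-backed consumer that reads from the edge-local Kafka and pulls its model from the cloud-side MLFlow. The image, service names, and values are illustrative assumptions, not details from the talk; `MLFLOW_TRACKING_URI` is MLFlow's standard client variable.

```yaml
# Hypothetical Deployment for the "Analytics" component.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: analytics
  template:
    metadata:
      labels:
        app: analytics
    spec:
      runtimeClassName: nvidia
      containers:
        - name: analytics
          image: registry.example.com/analytics:latest  # placeholder
          env:
            - name: KAFKA_BOOTSTRAP_SERVERS    # edge-local Kafka
              value: kafka.edge.svc.cluster.local:9092
            - name: MLFLOW_TRACKING_URI        # cloud-side MLFlow
              value: https://mlflow.example.com
          resources:
            limits:
              nvidia.com/gpu: 1                # pin to a GPU node
```
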
## Q&A

- Did you use the NVIDIA GPU Operator? Yes (a k3s install sketch follows this list)
- Which runtime did you use? containerd, via k3s
- Why k3s over k0s? Because it's what we already used
- Were you power-limited? Nope, the edge was on a large ship
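
Since the GPU Operator came up: a sketch of installing it through k3s's built-in Helm controller (the `HelmChart` CRD). The repo and chart name are NVIDIA's published Helm chart; the values are assumptions for a host that already has the driver and toolkit installed, and typically need tuning for k3s's bundled containerd.

```yaml
# Install the NVIDIA GPU Operator via the k3s Helm controller:
# placed in /var/lib/rancher/k3s/server/manifests/, k3s applies it
# automatically on startup.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator  # assumes this namespace exists
  valuesContent: |-
    # Assumption: driver and container toolkit already live on the
    # host, so the operator manages only the in-cluster components.
    driver:
      enabled: false
    toolkit:
      enabled: false
```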
|