docs(day1): Metal³ Talk

Nicolai Ort 2025-07-21 12:19:35 +02:00
parent f18ef168c9
commit 8a06439797


@ -0,0 +1,61 @@
---
title: Bringing Cloud-Native Agility to Bare-Metal Kubernetes with Cluster API and Metal³
weight: 5
tags:
- capi
- baremetal
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
Premise: We are in the **cloud native** era, not the **cloud** era -> we can do this stuff on-prem too
## Baseline
- Baremetal is cool b/c: No overhead, no hypervisor license, direct scheduling access
- But: Slow provisioning, inflexible, needs drivers and so on
- Cluster provisioning can be done in many different ways, even Kubernetes-native with CAPI (yay)
- CAPI is cool and can be used with many different infra providers
> I won't repeat the idea behind CAPI here (it took up the next 5-10 min of the talk)
## What about baremetal
> Metal³ (pronounced "metal-kubed") to the rescue
```mermaid
graph LR
CAPI-->BareMetalOperator-->Ironic-->BMC("BMC (IPMI, etc)")
```
### Ironic
- API: Exposes actions
- Conductor: Talks to the BMC
- Agent: Ramdisk for disk cleaning and all other initial local tasks
- Inspector: Collects hardware info
- DB: Stores state
### Baremetal operator
- Runs in the management cluster and watches the BareMetalHost CR
- Can be used with or without CAPI
- BareMetalHost controls things like: firmware, power state, RAID, boot mode, ...
- The HardwareData CRD contains information about the host after it has been inspected by Ironic Inspector
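To make the BareMetalHost idea a bit more concrete, here is a minimal sketch that builds a host manifest as a plain Python dict and prints it as JSON (which `kubectl apply -f` also accepts). The field names (`online`, `bootMode`, `bootMACAddress`, `bmc.address`, `bmc.credentialsName`) follow the `metal3.io/v1alpha1` API as far as I know it, so double-check them against the CRD reference. The host name, namespace, MAC, BMC address and secret name are made-up placeholders. Firmware and RAID settings live in further spec fields that are left out here.

```python
import json


def bare_metal_host(name: str, bmc_address: str, credentials_secret: str,
                    boot_mac: str, online: bool = True,
                    boot_mode: str = "UEFI") -> dict:
    """Build a minimal BareMetalHost manifest as a plain dict (sketch only)."""
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        # Placeholder namespace, pick whatever your management cluster uses.
        "metadata": {"name": name, "namespace": "metal3"},
        "spec": {
            "online": online,            # desired power state of the server
            "bootMode": boot_mode,       # e.g. "UEFI" or "legacy"
            "bootMACAddress": boot_mac,  # NIC used to network-boot the Ironic agent
            "bmc": {
                "address": bmc_address,                 # e.g. ipmi://<host>:<port>
                "credentialsName": credentials_secret,  # Secret with BMC username/password
            },
        },
    }


if __name__ == "__main__":
    host = bare_metal_host(
        name="worker-0",
        bmc_address="ipmi://192.0.2.10:623",
        credentials_secret="worker-0-bmc-secret",
        boot_mac="00:11:22:33:44:55",
    )
    print(json.dumps(host, indent=2))
```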
TODO: Steal state flow chart from slides
### Provisioning
- Baseline: The host is in the Available state
- Action: Assign an OS image with userdata to a host and specify the target storage device (defaults to /dev/sda)
- Reaction: Metal³ chooses a matching BareMetalHost to consume
- Result: The system boots into the OS image and is initialized via Ignition (userdata)
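The host selection and claiming steps can be sketched as a small model. This is a hypothetical simplification, not Baremetal Operator or CAPM3 code. `Host`, `choose_host` and `provision` are illustrative names. Hosts are matched by labels (roughly like a label-based host selector), and the state strings are simplified.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """Hypothetical, heavily simplified stand-in for a BareMetalHost."""
    name: str
    state: str                              # simplified state, e.g. "available"
    labels: dict = field(default_factory=dict)
    consumer: Optional[str] = None          # machine that has claimed this host
    image_url: Optional[str] = None
    user_data: Optional[str] = None
    root_device: str = "/dev/sda"           # default target disk from the talk


def choose_host(hosts: list, match_labels: dict) -> Optional[Host]:
    """Return the first unclaimed, available host whose labels all match."""
    for host in hosts:
        if host.state != "available" or host.consumer is not None:
            continue
        if all(host.labels.get(k) == v for k, v in match_labels.items()):
            return host
    return None


def provision(host: Host, machine: str, image_url: str, user_data: str,
              root_device: Optional[str] = None) -> Host:
    """Claim the host and record which image and userdata go to which disk."""
    host.consumer = machine
    host.image_url = image_url
    host.user_data = user_data              # e.g. an Ignition config
    if root_device is not None:
        host.root_device = root_device
    host.state = "provisioning"             # the real operator would now drive Ironic
    return host


if __name__ == "__main__":
    hosts = [
        Host("bm-0", "provisioned", {"role": "worker"}, consumer="m-0"),
        Host("bm-1", "available", {"role": "worker"}),
        Host("bm-2", "available", {"role": "control-plane"}),
    ]
    picked = choose_host(hosts, {"role": "worker"})
    provision(picked, "m-1", "http://images.example/os.img", "{}")
    print(picked.name, picked.root_device)  # bm-1 /dev/sda
```

`bm-0` is skipped because it is already consumed, and `bm-2` does not match the label, so `bm-1` gets the image on the default disk.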
## Observations/Other
- Metal³ is pretty slow (due to the hardware and Ironic)
- You need spare servers for rolling updates -> alternatively, you can reuse an existing server during the upgrade via a Recreate update
- We can control some hardware operations via node annotations, like shutdown/reboot/maintenance
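The spare-server tradeoff is easy to see in a toy model. Below is a hypothetical simulation (not CAPI or Metal³ code) that replaces every machine once. It reports the peak number of hosts in use and the lowest number of running machines. It assumes a rolling update that surges by one machine, compared against a delete-first, recreate-style update. It also ignores how long deprovisioning and reprovisioning take, which matters given how slow bare-metal provisioning is.

```python
def simulate_rollout(replicas: int, spare_hosts: int, recreate: bool) -> tuple[int, int]:
    """Replace every machine once; return (peak hosts in use, minimum running machines)."""
    free = spare_hosts              # available hosts not consumed by any machine
    in_use = running = replicas
    peak, lowest = in_use, running

    for _ in range(replicas):
        if recreate:
            # Delete first: the old machine's host is freed and can be reused.
            running -= 1
            in_use -= 1
            free += 1
            lowest = min(lowest, running)
        if free == 0:
            raise RuntimeError("no available host for the replacement machine")
        # Provision the replacement machine on a free host.
        free -= 1
        in_use += 1
        running += 1
        peak = max(peak, in_use)
        if not recreate:
            # Rolling update: remove the old machine only after its replacement is up.
            running -= 1
            in_use -= 1
            free += 1
    return peak, lowest


if __name__ == "__main__":
    print(simulate_rollout(3, spare_hosts=1, recreate=False))  # (4, 3)
    print(simulate_rollout(3, spare_hosts=0, recreate=True))   # (3, 2)
```

The rolling update keeps full capacity but needs one extra server. The recreate-style update gets by with the existing servers but runs one machine short while each host is reprovisioned. With no spare host, the rolling variant raises `RuntimeError`.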