kubecon25/content/day1/02_migrations.md

---
title: Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos
weight: 2
tags:
 - kubecon
 - platform
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Background

- They use large, shared clusters
- The oldest cluster is 2099 days (5.8 years) old
- Hosted on-prem on vSphere with vanilla kubeadm
- Fun fact: They run chaosmonkey on all clusters -> Automatically prepares them for updates

### Legacy provisioning

1. Terraform creates a Debian VM
2. Deploy base tools with Puppet
3. Register nodes in an inventory YAML file
4. Run an Ansible playbook -> Renders configs and runs kubeadm
5. Configure ArgoCD

### Target

- Use Cluster API (CAPI) to manage the workload clusters
    - Basic CRDs: Cluster, MachineDeployment, Machine
- Talos: Immutable, minimal, ephemeral, with declarative config via a gRPC API

TODO: Steal diagrams from slides


## Migration

1. Config matching between kubeadm and Talos + CAPI
2. Import PKI/Certs
3. Create ClusterAPI CRDs
4. Add ClusterAPI Nodes
5. Remove kubeadm nodes

### 1. Config matching

1. Service account issuer: Talos has its own default
2. etcd encryption key names are hardcoded in Talos
3. Re-encrypt all secrets (get secrets, replace secrets)
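
To check the first two points before touching a cluster, a small preflight helper can be useful. The sketch below is not from the talk. It assumes you already have a service account token and a raw secret value read from etcd, and it relies on the `k8s:enc:<provider>:v1:<key name>:` prefix that Kubernetes puts in front of encrypted values. The function names are hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def token_issuer(jwt: str) -> str:
    """Return the `iss` claim of a JWT. The signature is NOT verified."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))["iss"]


def encryption_key_name(raw_value: bytes) -> Optional[Tuple[str, str]]:
    """Return (provider, key name) of an encrypted etcd value.

    Returns None if the value does not carry the encryption prefix.
    """
    prefix = b"k8s:enc:"
    if not raw_value.startswith(prefix):
        return None
    # Layout: k8s:enc:<provider>:v1:<key name>:<ciphertext>
    provider, _version, key_name, _ciphertext = raw_value[len(prefix):].split(b":", 3)
    return provider.decode(), key_name.decode()
```

If the issuer of existing tokens or the key name in etcd differs from what the Talos control plane is configured with, that needs to be fixed (or the secrets re-encrypted) before the first Talos node joins.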

### 2. PKI

1. Talos includes some logic that can generate a secrets bundle from an existing API
2. Import the etcd, Kubernetes, service account and OS (Talos-specific, used for Talos API auth) certificates
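
Before importing, it helps to confirm that the CA and service account material is actually present on the old control plane node. A minimal sketch, assuming the default kubeadm layout under `/etc/kubernetes/pki`; the helper name is hypothetical and the OS certificate is not covered because it only exists on the Talos side:

```python
from pathlib import Path
from typing import List

# Default kubeadm file names for the Kubernetes CA, the etcd CA and the
# service account key pair, relative to the PKI directory.
REQUIRED_PKI_FILES = [
    "ca.crt", "ca.key",
    "etcd/ca.crt", "etcd/ca.key",
    "sa.pub", "sa.key",
]


def missing_pki_files(pki_dir: str) -> List[str]:
    """Return the required files that do not exist below pki_dir."""
    root = Path(pki_dir)
    return [name for name in REQUIRED_PKI_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pki_files("/etc/kubernetes/pki")
    print("missing:", missing if missing else "nothing")
```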

### 3. CRDs

- One namespace per workload cluster
- Cluster CRD: References the control plane and the infrastructure
- ControlPlane CRD: Creates the control plane MDs (MachineDeployments)
- Infrastructure: References the template for the worker MDs
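
To make the references concrete, here is a rough sketch of a Cluster object built as a Python dict. The talk did not show this manifest. The provider kinds (`TalosControlPlane`, `VSphereCluster`), the API versions and the naming scheme are assumptions based on the Talos and vSphere providers and may differ from what the speakers run.

```python
import json


def cluster_manifest(name: str) -> dict:
    """Build a Cluster object that lives in a namespace named after the cluster."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",  # assumed CAPI version
        "kind": "Cluster",
        # One namespace per workload cluster.
        "metadata": {"name": name, "namespace": name},
        "spec": {
            # Reference to the control plane object (assumed: Talos provider).
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1alpha3",
                "kind": "TalosControlPlane",
                "name": f"{name}-cp",  # hypothetical naming scheme
            },
            # Reference to the infrastructure object (assumed: vSphere provider).
            "infrastructureRef": {
                "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                "kind": "VSphereCluster",
                "name": name,
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be reviewed or applied as-is.
    print(json.dumps(cluster_manifest("team-a"), indent=2))
```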

TODO: Steal image

### 4. Add ClusterAPI Nodes

- Add new CAPI-managed CP and worker nodes to the cluster (slowly, stuff will break)
- Remove the old nodes one by one over weeks or months
- Potential Problems:
    - Mismatched service account issuer
    - Missing etcd encryption key
    - Wrong etcd encryption key
    - Loss of quorum: `--force-new-cluster` can force recovery on one node of the etcd cluster
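
The quorum risk is easier to reason about with the numbers in front of you: etcd needs a majority of members, so a temporarily larger cluster during the migration does not always tolerate more failures. A small sketch of that arithmetic (not from the talk, function names are hypothetical):

```python
def quorum(members: int) -> int:
    """Smallest number of healthy members that still forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail before the cluster loses quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, so adding a new CP node and removing an old one should happen as separate, verified steps.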

## Demo

I recommend watching the demo