---
title: "The Hitchhiker's Guide to Kubernetes Platforms: Don’t Panic, Just Launch!"
weight: 7
tags:
 - platform
 - scaling
 - operators
 - dx
---

This talk looks at bootstrapping platforms using KServe.
The focus is on AI workflows.

## Scenario

* Deploy AI workloads, sometimes consisting of different parts
* Models get stored in a model registry

## Baseline

* Consistent APIs throughout the platform
* Not the Kubernetes API directly because:
  * Data scientists are a bit overwhelmed by the Kubernetes API
  * The platform is not only Kubernetes (also monitoring tools, feedback tools, etc.)
  * Better debugging experience for specific workloads

## The debugging API

* Specific API with enhanced statuses and consistent UX across Code and UI
* Example endpoints: Pods, Deployments, InferenceServices
* Provides a status summary -> consistent health info across all related resources
  * Example: Deployments have progress/availability, Pods have phases, containers have readiness -> how should each be interpreted?
  * Evaluation: Progressing, available count vs. readiness, ReplicaFailure, Pod phase, container readiness
* The rules themselves may be pretty complex, but since the user doesn't have to check them personally, the status stays simple
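
The rule evaluation described in this list could look roughly like the sketch below. It is only an illustration, not the platform's actual implementation: the `summarize` function, the `DeploymentInfo`/`PodInfo` types, the four status names, and the order of the rules are all assumptions. Only the inputs (the `Progressing` and `ReplicaFailure` Deployment conditions, Pod phases, container readiness) are real Kubernetes concepts.

```python
from dataclasses import dataclass, field


@dataclass
class PodInfo:
    phase: str  # Kubernetes Pod phase, e.g. "Running", "Pending", "Failed"
    containers_ready: list[bool] = field(default_factory=list)


@dataclass
class DeploymentInfo:
    desired_replicas: int
    available_replicas: int
    conditions: dict[str, str]  # condition type -> "True" / "False" / "Unknown"
    pods: list[PodInfo] = field(default_factory=list)


def summarize(dep: DeploymentInfo) -> str:
    """Collapse several Kubernetes signals into one status (hypothetical rules)."""
    # Hard failures win over everything else.
    if dep.conditions.get("ReplicaFailure") == "True":
        return "Failed"
    if any(pod.phase == "Failed" for pod in dep.pods):
        return "Failed"

    # A Pod counts as ready only if it is running and all its containers are ready.
    ready_pods = [
        pod for pod in dep.pods
        if pod.phase == "Running" and pod.containers_ready and all(pod.containers_ready)
    ]
    if (dep.available_replicas >= dep.desired_replicas
            and len(ready_pods) >= dep.desired_replicas):
        return "Healthy"

    # Not healthy yet, but the rollout is still moving forward.
    if dep.conditions.get("Progressing") == "True":
        return "Progressing"
    return "Degraded"
```

The user only ever sees one of the four strings, while the checks behind them can grow as complex as needed.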

### Debugging Metrics

* Dashboards (Utilization, throughput, latency)
* Events
* Logs

## Deployment API

* Launchpad: Just select your model and version -> The DB (dock) stores all manifests (Spaceship)
* Manifests relate to models from a model registry
* Multi-tenancy is implemented using k8s namespaces
* Kine is used to replace/extend etcd with the relational dock DB -> the namespace<->manifest relation is stored here and RBAC can be used
* Launchpad: Select Namespace and check resource (fuel) availability/utilization
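
Keeping the namespace<->manifest relation in a relational database means it can be queried with plain SQL. The sketch below uses Python's built-in `sqlite3` module to show the idea. The table layout and the helpers `open_dock`, `store_manifest` and `manifests_for` are hypothetical; this is not Kine's internal schema and not the schema shown in the talk.

```python
import sqlite3

# Hypothetical schema for illustration; this is not Kine's internal table layout.
SCHEMA = """
CREATE TABLE namespaces (name TEXT PRIMARY KEY);
CREATE TABLE manifests (
    id        INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL REFERENCES namespaces(name),
    model     TEXT NOT NULL,
    version   TEXT NOT NULL,
    body      TEXT NOT NULL
);
"""


def open_dock() -> sqlite3.Connection:
    """Create an in-memory stand-in for the dock DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def store_manifest(conn, namespace, model, version, body):
    """Store one manifest and make sure its namespace (tenant) exists."""
    conn.execute("INSERT OR IGNORE INTO namespaces (name) VALUES (?)", (namespace,))
    conn.execute(
        "INSERT INTO manifests (namespace, model, version, body) VALUES (?, ?, ?, ?)",
        (namespace, model, version, body),
    )


def manifests_for(conn, namespace):
    """List (model, version) pairs owned by one namespace, oldest first."""
    rows = conn.execute(
        "SELECT model, version FROM manifests WHERE namespace = ? ORDER BY id",
        (namespace,),
    )
    return rows.fetchall()
```

A launchpad-style view for one tenant would then be a single query such as `manifests_for(conn, "team-a")`.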

### Cluster maintenance

* Deployments can be launched to multiple clusters (even two clusters at once) -> HA through identical clusters
* The exact same manifests get deployed to both clusters
* The desired cluster state is stored externally to enable effortless upgrades, rescaling, etc.
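
Deploying the exact same manifests to two clusters can be pictured as a simple fan-out plus a drift check. The sketch below is an assumption-laden illustration: `FakeCluster` is an in-memory stand-in for a real cluster client, and `launch` and `in_sync` are hypothetical helpers, not part of the platform shown in the talk.

```python
import copy
import hashlib
import json


class FakeCluster:
    """In-memory stand-in for a real cluster client (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.applied = {}  # (kind, namespace, name) -> manifest

    def apply(self, manifest):
        meta = manifest["metadata"]
        key = (manifest["kind"], meta.get("namespace", "default"), meta["name"])
        self.applied[key] = copy.deepcopy(manifest)

    def state_digest(self):
        """Hash everything applied so far, independent of insertion order."""
        items = [[list(key), self.applied[key]] for key in sorted(self.applied)]
        return hashlib.sha256(json.dumps(items, sort_keys=True).encode()).hexdigest()


def launch(manifest, clusters):
    """Apply the identical manifest to every cluster."""
    for cluster in clusters:
        cluster.apply(manifest)


def in_sync(clusters):
    """True if all clusters hold exactly the same set of manifests."""
    return len({cluster.state_digest() for cluster in clusters}) <= 1


manifest = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sentiment", "namespace": "team-a"},
}
blue, green = FakeCluster("blue"), FakeCluster("green")
launch(manifest, [blue, green])
```

Because the desired state lives outside the clusters, a check like `in_sync([blue, green])` can be run at any time, for example before and after an upgrade of one cluster.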

### Versioning API

* Basically the dock DB
* CRDs are the representations of the inference manifests
* Rollbacks, promotion and history are managed via the CRs
* Why not GitOps: Internal Diffs, deployment overrides, customized features
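
History, rollback and promotion can be modelled as operations on an append-only list of revisions. The sketch below keeps everything in memory to show the idea. The `ManifestHistory` class and the `promote` helper are hypothetical; on the platform described in the talk this information lives in custom resources backed by the dock DB.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestHistory:
    """Append-only revision list with a pointer to the active revision (hypothetical)."""

    revisions: list[dict] = field(default_factory=list)
    active: int = -1  # index into revisions; -1 means nothing deployed yet

    def record(self, manifest: dict) -> int:
        """Store a new revision and make it the active one."""
        self.revisions.append(dict(manifest))
        self.active = len(self.revisions) - 1
        return self.active

    def rollback(self, revision: int) -> dict:
        """Re-activate an older revision without deleting any history."""
        if not 0 <= revision < len(self.revisions):
            raise IndexError(f"unknown revision {revision}")
        self.active = revision
        return self.revisions[revision]

    def current(self) -> dict:
        if self.active < 0:
            raise LookupError("no revision recorded yet")
        return self.revisions[self.active]


def promote(source: ManifestHistory, target: ManifestHistory) -> int:
    """Copy the active revision of one environment into another."""
    return target.record(source.current())
```

A rollback only moves the pointer, so the full history stays available for later diffs, which fits the stated reasons for not using GitOps.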

### UX

* User-driven API design
* Customized tools
* Everything gets 1:1 replicated for HA
* Large onboarding guide