title | weight | tags
---|---|---
Reliable k8s resource Submission & Bookkeeping | 5 |
{{% button href="https://youtu.be/NCkHrvqFMl8" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} {{% button href="https://static.sched.com/hosted_files/kccnceu2025/0d/Reliable%20K8S%20Resource%20Submission%20and%20Bookkeeping.pdf" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}}
Service offerings
- Product: HA container platform for general use, with a focus on run-to-completion workloads
- Use-Cases: ML orchestration, CI/CD, machine maintenance, financial analysis, data processing pipelines
- Requirements: Observability, scheduling events, approval process, bookkeeping, datacenter resiliency
- Focus: Resiliency (HA with datacenter failover)
- What the user needs: Workflow (e.g. generate report, persist report, notify)
- What we need for the user: ConfigMaps + Secrets, Workflow templates for the steps
Challenges
- Read-after-modify across multiple datacenters
- Many reads against the Kubernetes API that could overload the apiserver
- No native approval flows and limited audit
Submission flows from a user's perspective
Submission of runnables
- User: Submits runnable to submitter with audit
- Submitter: Handles retry, verification, ...
- Submitter: Configures workload on workload clusters
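The submitter's responsibilities listed above (verification, audit, retry) can be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: the `Submitter` class, its `apply` callback, the audit tuple layout, and the "runnable must have a name" check are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Submitter:
    """Hypothetical submitter that verifies, audits, and retries a submission."""

    # Assumed callback that configures the workload on a workload cluster.
    apply: Callable[[dict], None]
    max_attempts: int = 3
    backoff_seconds: float = 0.0
    audit_log: list = field(default_factory=list)

    def submit(self, user: str, runnable: dict) -> bool:
        name = runnable.get("name")
        # Verification (assumed rule): reject unnamed runnables before
        # touching any cluster, and record the rejection for auditing.
        if not name:
            self.audit_log.append((user, name, "rejected"))
            return False
        self.audit_log.append((user, name, "accepted"))

        # Retry transient connection failures with a linear backoff.
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.apply(runnable)
            except ConnectionError:
                time.sleep(self.backoff_seconds * attempt)
                continue
            self.audit_log.append((user, name, f"applied on attempt {attempt}"))
            return True

        self.audit_log.append((user, name, "failed"))
        return False
```

Every outcome, including rejections and exhausted retries, ends up in the audit log, so the user never has to retry by hand and compliance can still see what happened.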
Submission of deployables
- User: Deploys mutation to audit/source of truth
- Syncer: Syncs deployables to workload clusters
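One way to picture the syncer is as a reconcile step that makes a workload cluster match the source of truth. The sketch below is an assumption about how such a step could look: the `sync` function is hypothetical, and plain dictionaries stand in for both the source of truth and the cluster state.

```python
def sync(source_of_truth: dict, cluster: dict) -> dict:
    """Make `cluster` match `source_of_truth` and report what changed.

    Both arguments map a resource name to its spec. A real syncer would
    talk to a cluster API; dictionaries stand in for that here.
    """
    changes = {"applied": [], "deleted": []}

    # Create or update every deployable that differs from the desired spec.
    for name, spec in source_of_truth.items():
        if cluster.get(name) != spec:
            cluster[name] = spec
            changes["applied"].append(name)

    # Remove deployables that no longer exist in the source of truth.
    for name in list(cluster):
        if name not in source_of_truth:
            del cluster[name]
            changes["deleted"].append(name)

    return changes
```

Because the function compares full state instead of replaying individual mutations, running it again after a successful sync changes nothing.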
Reporting
- User wants: UI with latest status for all jobs
- Compliance wants: Transactions on a given resource for auditing
- Implementation: Highly available inventory as single source of truth
graph
WorkflowAPI-->|reads|inventory
Consumer-->|updates|inventory
Producer-->|publishes events to|Consumer
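The data flow in the graph can be approximated with an in-memory sketch: a producer puts events on a queue, a consumer applies them to the inventory, and the workflow API only reads from the inventory. All names here (`Event`, `Inventory`, `consume`) and the per-resource `version` field are assumptions for illustration, and a real inventory would be a highly available store, not a Python object.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    resource: str
    status: str
    version: int  # assumed to increase with every change to a resource


class Inventory:
    """Keeps the latest status per resource plus the full event history."""

    def __init__(self):
        self._latest = {}
        self._history = {}

    def update(self, event: Event) -> None:
        # Every event is kept so compliance can list all transactions.
        self._history.setdefault(event.resource, []).append(event)
        current = self._latest.get(event.resource)
        # Out-of-order events must not overwrite a newer status.
        if current is None or event.version > current.version:
            self._latest[event.resource] = event

    def latest_status(self, resource: str):
        event = self._latest.get(resource)
        return event.status if event else None

    def transactions(self, resource: str) -> list:
        return list(self._history.get(resource, []))


def consume(events: queue.Queue, inventory: Inventory) -> int:
    """Drain the queue into the inventory and return the number of events."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        inventory.update(event)
        processed += 1
```

Serving `latest_status` and `transactions` from the inventory covers both the UI and the compliance use case without sending those reads to the apiserver.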
Potential Problems
- Problem: A delete event does not get propagated from the syncer to the producer, leading to zombie resources
- Fix: Periodic Cleanup
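A periodic cleanup could be sketched as a comparison between what the inventory still tracks and what actually exists. The functions below are hypothetical, and the grace period is an added assumption, meant to avoid removing entries whose resources were created only moments ago and are not yet visible everywhere.

```python
def find_zombies(inventory: dict, live_resources: set,
                 now: float, grace_seconds: float) -> list:
    """Return names the inventory still tracks although they no longer exist.

    `inventory` maps a resource name to the timestamp it was last seen;
    `live_resources` is the set of names that currently exist.
    """
    return sorted(
        name
        for name, last_seen in inventory.items()
        if name not in live_resources and now - last_seen >= grace_seconds
    )


def cleanup(inventory: dict, live_resources: set,
            now: float, grace_seconds: float = 300.0) -> list:
    """Remove zombie entries from the inventory and return their names."""
    zombies = find_zombies(inventory, live_resources, now, grace_seconds)
    for name in zombies:
        del inventory[name]
    return zombies
```

Running `cleanup` on a schedule bounds how long a missed delete event can leave a stale entry in the inventory.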