---
title: Building a large scale multi-cloud multi-region SaaS platform with kubernetes controllers
weight: 8
tags:
  - platform
  - operator
  - scaling
---

{{% button href="https://youtu.be/VhloarnpxVo" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}}

> Interchangeable wording in this talk: controller == operator

A talk by Elastic.

## About Elastic

* Elastic Cloud as a managed service
* Deployed across AWS/GCP/Azure in over 50 regions
* 600,000+ containers

### Elastic and Kube

* They offer Elastic Observability
* They offer the ECK operator for simplified deployments

## The baseline

* Goal: A large-scale (1M+ containers), resilient platform on k8s
* Architecture
  * Global control: the control plane (API) for users, implemented with controllers
  * Regional apps: the "shitload" of Kubernetes clusters where the actual customer services live

## Scalability

* Challenge: How large can a single cluster be, and how many clusters do we need?
* Problem: Only basic guidelines exist for this
* Decision: Horizontally scale the number of clusters (500-1K nodes each)
* Decision: Disposable clusters
  * Can be thrown away without data loss
  * The single source of truth is not the cluster's etcd but an external store -> no etcd backups needed
  * Everything can be recreated at any time

## Controllers

{{% notice style="note" %}}
I won't copy the explanations of operators/controllers in these notes
{{% /notice %}}

* Many controllers, including (but not limited to):
  * Cluster controller: registers a cluster with the control plane
  * Project controller: schedules a user's project onto a cluster
  * Product controllers (Elasticsearch, Kibana, etc.)
  * Ingress/cert-manager
* Sometimes controllers depend on other controllers -> potential complexity
* Pros:
  * Resilient (self-healing)
  * Level-triggered (reconciling towards a desired state instead of reacting to individual procedure triggers)
  * Simpler to reason about than a state machine: just compare desired state with actual state
  * Official controller-runtime library
  * Workqueue: automatic dedup, retry with backoff, and so on

## Global Controllers

* Basic operation
  * The project config from Elastic Cloud is the desired state
  * The actual state is a k8s resource in another cluster
* Challenge: Where is the source of truth if the data is not stored in etcd?
  * Solution: An external data store (Postgres)
* Challenge: How do we sync the db contents to Kubernetes?
  * Potential solution: Replace etcd with the external db
  * Chosen solution:
    * The global controllers don't use CRDs for storage; they expose a web API instead
    * Reconciliation interacts with the external db and Go channels (as the queue) instead
    * The CRs for the operators then get created by the global controller (sketched in Go at the end of these notes)

### Large scale

* Problem: Reconcile gets triggered for all objects on restart -> needed to make sure nothing gets missed and everything is handled by the latest controller version
* Idea: Just create more workers for 100K+ objects
  * Problem: CPU goes brrr and the db gets overloaded
  * Problem: An item created during the restart suddenly sits at the end of a 100K-item work queue

### Reconcile

* User-driven events are processed asap
* A reconcile of everything should still happen, but slowly in the background with low priority
* Solution: Status.LastReconciledRevision (a timestamp) gets compared to the revision; if the revision is newer -> user change
* Prioritization: Just a custom event handler that routes to either the normal queue or a low-priority one
* Queue: Just a queue that adds items to the normal work queue with a rate limit (see the Go sketches at the end of these notes)

```mermaid
flowchart LR
low-->rl(ratelimit)
rl-->wq(work queue)
wq-->controller
high-->wq
```

## Related

* Argo for CI/CD
* Crossplane for cluster auto-provisioning
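## Code sketches

The "no CRDs for storage" pattern from the Global Controllers section is easier to see in code: the global controller treats an external store (Postgres in the talk) as the source of truth, is driven by a plain Go channel fed from the web API instead of watch events, and materializes CRs in the regional clusters as its output. This is a minimal sketch, not Elastic's implementation; the `DesiredStore`/`RegionalCluster` interfaces and the in-memory fakes are assumptions so the example runs standalone.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// DesiredStore is the external source of truth (Postgres in the talk).
type DesiredStore interface {
	GetProject(ctx context.Context, id string) (spec string, err error)
}

// RegionalCluster stands for "create/update the product CRs in the target cluster".
type RegionalCluster interface {
	ApplyProjectCR(ctx context.Context, id, spec string) error
}

// GlobalController is driven by a plain Go channel (fed by the web API)
// instead of watch events on CRs stored in etcd.
type GlobalController struct {
	store   DesiredStore
	cluster RegionalCluster
	queue   chan string
}

// Run pulls project ids off the queue, reads the desired state from the
// external store and materializes it as CRs in the regional cluster.
func (c *GlobalController) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case id := <-c.queue:
			spec, err := c.store.GetProject(ctx, id)
			if err != nil {
				fmt.Println("fetch failed (real code would requeue with backoff):", err)
				continue
			}
			if err := c.cluster.ApplyProjectCR(ctx, id, spec); err != nil {
				fmt.Println("apply failed (real code would requeue with backoff):", err)
				continue
			}
		}
	}
}

// --- tiny in-memory fakes so the sketch runs without Postgres or a cluster ---

type fakeStore map[string]string

func (s fakeStore) GetProject(_ context.Context, id string) (string, error) {
	return s[id], nil
}

type fakeCluster struct{}

func (fakeCluster) ApplyProjectCR(_ context.Context, id, spec string) error {
	fmt.Printf("applied CR for %s: %s\n", id, spec)
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	c := &GlobalController{
		store:   fakeStore{"project-1": "elasticsearch: 3 nodes"},
		cluster: fakeCluster{},
		queue:   make(chan string, 1),
	}
	c.queue <- "project-1" // in reality the web API enqueues ids here
	c.Run(ctx)
}
```

Because the trigger is a channel fed by the web API rather than informer events, the external db stays the single source of truth and the cluster's etcd can be thrown away at any time.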
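The user-change detection from the Reconcile section boils down to comparing the revision of the desired state against what was last reconciled. A minimal sketch follows; the `Project` type and its field names are assumptions, and it uses integer revisions rather than the timestamps mentioned in the talk, which does not change the comparison.

```go
package main

import "fmt"

// Project stands in for the object the global controller reconciles; the
// field names are illustrative, not Elastic's actual schema.
type Project struct {
	Revision               int64 // bumped on every user change to the config
	LastReconciledRevision int64 // written to status after a successful reconcile
}

// isUserChange reports whether the object carries a revision that has not
// been reconciled yet, i.e. the event is user-driven and should be handled asap.
func isUserChange(p Project) bool {
	return p.Revision > p.LastReconciledRevision
}

func main() {
	fresh := Project{Revision: 42, LastReconciledRevision: 41}
	background := Project{Revision: 41, LastReconciledRevision: 41}
	fmt.Println(isUserChange(fresh))      // true  -> handle asap (high priority)
	fmt.Println(isUserChange(background)) // false -> background, rate-limited pass
}
```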
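Finally, the prioritization shown in the mermaid diagram: user-driven ("high") events go straight onto the controller's work queue, while the background "reconcile everything" pass goes through a rate limiter first. This sketch assumes client-go's `workqueue` package and `golang.org/x/time/rate`; the keys and rates are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// The normal controller work queue; it deduplicates keys that are
	// already waiting, so double-enqueueing is harmless.
	wq := workqueue.New()
	defer wq.ShutDown()

	// High-priority path: user-driven events go straight onto the queue.
	go func() {
		wq.Add("project/user-change-123")
	}()

	// Low-priority path: the background "reconcile everything" pass is
	// throttled (here: 2 keys per second) so CPU and db stay healthy.
	low := make(chan string)
	limiter := rate.NewLimiter(rate.Limit(2), 1)
	go func() {
		for key := range low {
			if err := limiter.Wait(ctx); err != nil {
				return // context cancelled, stop feeding the queue
			}
			wq.Add(key)
		}
	}()
	go func() {
		for i := 0; i < 5; i++ {
			low <- fmt.Sprintf("project/background-%d", i)
		}
		close(low)
	}()

	// Worker: keys are handled the same way no matter which path enqueued them.
	go func() {
		for {
			key, shutdown := wq.Get()
			if shutdown {
				return
			}
			fmt.Println("reconciling", key)
			wq.Done(key)
		}
	}()

	<-ctx.Done()
}
```

Since the work queue drops keys that are already queued, a project touched by a user event is not reconciled twice when the background pass reaches it shortly afterwards.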