kubecon24/06_global_operator.md at bafbb46f52736ffaf499c7510a0c5337f981ee47

niggl/kubecon24

Nicolai Ort bafbb46f52

added tags

2024-03-25 13:45:10 +01:00

Background

Global means non-China

Edge platform team for CDN, live streaming, uploads, real-time communication, etc.
Around 250 clusters with 10-600 nodes each, mostly non-cloud (i.e. bare metal)
Architecture: control plane clusters (platform services) and data plane clusters (workloads of other teams)
Platform includes logs, metrics, configs, secrets, ...

Challenges

Operators

Operators are essential for platform features
As the feature requests increase, more operators are needed
The deployment of operators across many clusters is complex (namespaces, deployments, policies, ...)

Edge

Limited resources
Cost implications of platform features
Real-time processing demands of platform features
Balancing act between resources used by workloads and by platform features (20-25%)

The classic flow

A new feature gets requested
Use kubebuilder with the SDK to create the operator
Create namespaces and configs in all clusters
Deploy the operator to all clusters

Possible Solution

Centralized Control Plane

Problem: The controller implementation is limited to a cluster boundary
Idea: Why not create a single operator that can manage multiple edge clusters?
Implementation: Just modify kubebuilder to accept multiple clients (and caches)
Result: It works -> Simpler deployment and troubleshooting
Concerns: High code complexity -> Long familiarization
Balancing a "simple central operator" against operator complexity is hard
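
The "one operator, many clients" idea can be sketched roughly as follows. This is a simplified, stdlib-only illustration and not the implementation shown in the talk: `ClusterClient`, `fakeClient` and `MultiClusterReconciler` are hypothetical names, and a real version would presumably hold one kubebuilder/controller-runtime client and cache per cluster instead of the fake used here.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterClient is a hypothetical stand-in for a per-cluster API client.
type ClusterClient interface {
	Apply(namespace, name string) error
}

// fakeClient only records what was applied; it replaces a real client in this sketch.
type fakeClient struct{ applied []string }

func (f *fakeClient) Apply(namespace, name string) error {
	f.applied = append(f.applied, namespace+"/"+name)
	return nil
}

// MultiClusterReconciler is a single reconciler that owns one client per edge cluster.
type MultiClusterReconciler struct {
	clients map[string]ClusterClient // keyed by cluster name
}

// Reconcile applies the object to one named cluster.
func (r *MultiClusterReconciler) Reconcile(cluster, namespace, name string) error {
	c, ok := r.clients[cluster]
	if !ok {
		return fmt.Errorf("unknown cluster %q", cluster)
	}
	return c.Apply(namespace, name)
}

// ReconcileAll applies the object to every known cluster (in sorted order)
// and returns the names of the clusters that failed.
func (r *MultiClusterReconciler) ReconcileAll(namespace, name string) []string {
	names := make([]string, 0, len(r.clients))
	for n := range r.clients {
		names = append(names, n)
	}
	sort.Strings(names)

	var failed []string
	for _, n := range names {
		if err := r.Reconcile(n, namespace, name); err != nil {
			failed = append(failed, n)
		}
	}
	return failed
}
```

Even in this toy form, the reconciler has to carry cluster lookup and per-cluster error handling itself, which hints at where the complexity concern comes from.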

Attempt it a bit more like kubebuilder

Each cluster has its own manager
There is a central multimanager that starts all of the cluster-specific managers
Controller registration to the manager now handles cluster names
The reconciler knows which cluster it is working on
The multi-cluster management basically just gets all of the cluster secrets and creates a manager + controller for each cluster secret
Challenges: Network connectivity
Solutions:
- Dynamic add/remove of clusters with go channels to prevent pod restarts
- Connectivity health checks -> On connection loss, recreation of the manager gets triggered

flowchart TD
    mcm-->m1
    mcm-->m2
    mcm-->m3

flowchart LR
    secrets-->ch(go channels)
    ch-->|CREATE|create(Create manager + Add controller + Start manager)
    ch-->|UPDATE|update(Stop manager + Create manager + Add controller + Start manager)
    ch-->|DELETE|delete(Stop manager)

Conclusion

Acknowledge resource constraints on the edge
Embrace open source adoption instead of building your own
Simplify deployment
Recognize your own opinionated approach and its use cases
