Final talks of day 4

This commit is contained in:
Nicolai Ort 2024-03-22 17:27:33 +01:00
parent c2326acfea
commit ef360b2e89
Signed by: niggl
GPG Key ID: 13AFA55AF62F269F
3 changed files with 165 additions and 1 deletions

View File

@ -0,0 +1,82 @@
---
title: "TikToks Edge Symphony: Scaling Beyond Boundaries with Multi-Cluster Controllers"
weight: 6
---
A talk by TikTok/ByteDance (duh) focused on using centralized controllers instead of controllers running on the edge.
## Background
> Global means non-China
* Edge platform team for CDN, livestreaming, uploads, real-time communication, etc.
* Around 250 clusters with 10-600 nodes each - mostly non-cloud, aka bare metal
* Architecture: Control plane clusters (platform services) - data plane clusters (workloads run by other teams)
* Platform includes logs, metrics, configs, secrets, ...
## Challenges
### Operators
* Operators are essential for platform features
* As feature requests increase, more operators are needed
* Deploying operators across many clusters is complex (namespaces, deployments, policies, ...)
### Edge
* Limited resources
* Cost implications of platform features
* Real-time processing demands by platform features
* Balancing act between resources used by workloads vs platform features (20-25%)
### The classic flow
1. A new feature gets requested
2. Use kubebuilder with the SDK to create the operator
3. Create namespaces and configs in all clusters
4. Deploy the operator to all clusters
## Possible Solution
### Centralized Control Plane
* Problem: The controller implementation is limited to a cluster boundary
* Idea: Why not create a single operator that can manage multiple edge clusters?
* Implementation: Just modify kubebuilder to accept multiple clients (and caches) - see the sketch after this list
* Result: It works -> Simpler deployment and troubleshooting
* Concerns: High code complexity -> Long familiarization
* Balance between "simple central operator" and operator complexity is hard
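
As a rough illustration of what "multiple clients and caches" could look like (my own sketch with assumed names, not the speakers' code), a single controller-runtime reconciler might hold one client per edge cluster and fan each request out:

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// MultiClusterReconciler holds one client (with its own cache) per edge cluster.
type MultiClusterReconciler struct {
	Clients map[string]client.Client // keyed by cluster name
}

func (r *MultiClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fan the same request out to every data plane cluster.
	for clusterName, c := range r.Clients {
		var deploy appsv1.Deployment
		if err := c.Get(ctx, req.NamespacedName, &deploy); err != nil {
			// Per-cluster errors could be collected and requeued here.
			continue
		}
		// The actual platform logic for clusterName would go here.
		_ = clusterName
	}
	return ctrl.Result{}, nil
}
```

The trade-off from the bullets shows up directly here: the fan-out loop keeps deployment simple, but per-cluster error handling and cache wiring are where the code complexity grows.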
### Attempting it a bit more like kubebuilder
* Each cluster has its own manager
* There is a central multi-manager that starts all of the cluster-specific managers
* Controller registration with the manager now handles cluster names
* The reconciler knows which cluster it is working on
* The multi-cluster management basically just gets all of the cluster secrets and creates a manager + controller for each cluster secret
* Challenges: Network connectivity
* Solutions:
  * Dynamic add/remove of clusters with Go channels to prevent pod restarts (see the sketch after the diagrams below)
  * Connectivity health checks -> On connection loss, recreating the manager gets triggered
```mermaid
flowchart TD
mcm(Multi-cluster manager)-->m1(Manager cluster 1)
mcm-->m2(Manager cluster 2)
mcm-->m3(Manager cluster 3)
```
```mermaid
flowchart LR
secrets-->ch(go channels)
ch-->|CREATE|create(Create manager + Add controller + Start manager)
ch-->|UPDATE|update(Stop manager + Create manager + Add controller + Start manager)
ch-->|DELETE|delete(Stop manager)
```
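
A hedged Go sketch of the channel-driven lifecycle from the second diagram (the event shape and all names are assumptions on my part, not the talk's actual code): each CREATE/UPDATE/DELETE on a cluster secret starts, recreates or stops a per-cluster manager without restarting the pod.

```go
package main

import (
	"context"

	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
)

// clusterEvent is a hypothetical event emitted when a cluster secret changes.
type clusterEvent struct {
	op   string       // "CREATE", "UPDATE" or "DELETE"
	name string       // cluster name
	cfg  *rest.Config // rest config built from the kubeconfig in the secret
}

func runMultiManager(ctx context.Context, events <-chan clusterEvent) {
	cancels := map[string]context.CancelFunc{}

	stop := func(name string) {
		if cancel, ok := cancels[name]; ok {
			cancel() // cancelling the context stops the per-cluster manager
			delete(cancels, name)
		}
	}

	start := func(ev clusterEvent) {
		mgr, err := ctrl.NewManager(ev.cfg, ctrl.Options{})
		if err != nil {
			return // in practice: log and retry
		}
		// Register the cluster-specific controllers with mgr here, tagged with ev.name.
		mgrCtx, cancel := context.WithCancel(ctx)
		cancels[ev.name] = cancel
		go func() { _ = mgr.Start(mgrCtx) }()
	}

	for ev := range events {
		switch ev.op {
		case "CREATE":
			start(ev)
		case "UPDATE": // e.g. rotated credentials -> recreate the manager
			stop(ev.name)
			start(ev)
		case "DELETE":
			stop(ev.name)
		}
	}
}

func main() {
	events := make(chan clusterEvent)
	// In the real setup, a watcher on the cluster secrets would feed this channel.
	go func() { close(events) }()
	runMultiManager(context.Background(), events)
}
```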
## Conclusion
* Acknowledge resource constraints on the edge
* Embrace open source adoption instead of building your own
* Simplify deployment
* Recognize your own opinionated approach and its use cases

View File

@ -0,0 +1,78 @@
---
title: "Fluent Bit v3: Unified Layer for Logs, Metrics and Traces"
weight: 7
---
The last talk of the conference.
Notes may be a bit unstructured due to a tired note taker.
## Background
* FluentD is already graduated
* FluentBit is a sub-project of FluentD (also graduated)
## Basics
* FluentBit is compatible with
  * Prometheus (it can replace the Prometheus scraper and node exporter)
  * OpenMetrics
  * OpenTelemetry (HTTPS input/output)
* FluentBit can export to Prometheus, Splunk, InfluxDB or others
* So pretty much it can be used to collect data from a bunch of sources and pipe it out to different backend destinations
* Fluent ecosystem: No vendor lock-in to observability
### Architectures
* The fluent agent collects data and can send it to one or multiple locations
* FluentBit can be used for aggregation from other sources
### In the Kubernetes logging ecosystem
* Pods log to the console -> Streamed stdout/stderr gets piped to a file
* The logs in the file get encoded as JSON with metadata (date, channel)
* Labels and annotations only live in the control plane -> You have to collect them additionally -> Expensive
## New stuff
### Limitations with classic architectures
* Problem: Multiple filters slow down the main loop
```mermaid
flowchart LR
subgraph main[Main Thread/Event loop]
buffer
schedule
retry
filter1
filter2
filter3
end
in-->|pipe in data|main
main-->|filter and pipe out|out
```
### Solution
* Solution: Processor - a separate thread segmented by telemetry type
* Plugins can be written in your favourite language (C, Rust, Go, ...) - a Go sketch follows the diagram below
```mermaid
flowchart LR
subgraph in
reader
processor1
processor2
processor3
end
in-->|pipe in data|main(Main Thread/Event loop)
main-->|filter and pipe out|out
```
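
To illustrate the plugin point from above, here is a minimal sketch of a Go output plugin using the fluent-bit-go bindings (the plugin name and the print-only behaviour are my own example; such plugins are typically built with `go build -buildmode=c-shared`):

```go
package main

import "C"
import (
	"fmt"
	"unsafe"

	"github.com/fluent/fluent-bit-go/output"
)

//export FLBPluginRegister
func FLBPluginRegister(def unsafe.Pointer) int {
	// Register the plugin under a (hypothetical) name.
	return output.FLBPluginRegister(def, "go_stdout_demo", "Demo Go output plugin")
}

//export FLBPluginInit
func FLBPluginInit(plugin unsafe.Pointer) int {
	return output.FLB_OK
}

//export FLBPluginFlush
func FLBPluginFlush(data unsafe.Pointer, length C.int, tag *C.char) int {
	dec := output.NewDecoder(data, int(length))
	for {
		ret, ts, record := output.GetRecord(dec)
		if ret != 0 {
			break
		}
		// Print every record instead of shipping it to a real backend.
		fmt.Printf("tag=%s ts=%v record=%v\n", C.GoString(tag), ts, record)
	}
	return output.FLB_OK
}

func main() {}
```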
### General new features in v3
* Native HTTP/2 support in core
* Content modifier with multiple operations (insert, upsert, delete, rename, hash, extract, convert)
* Metrics selector (include or exclude metrics) with matchers (name, prefix, substring, regex)
* SQL processor -> Use SQL expressions for selections (instead of filters)
* Better OpenTelemetry output

View File

@ -4,4 +4,8 @@ title: Operators
## Observability
* Export reconcile loop steps as opentelemetry traces
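
A small sketch of what that could look like (my illustration; the reconciler type and tracer name are made up): wrap each reconcile step in an OpenTelemetry span so the loop shows up as a trace.

```go
package controllers

import (
	"context"

	"go.opentelemetry.io/otel"
	ctrl "sigs.k8s.io/controller-runtime"
)

var tracer = otel.Tracer("my-operator")

type MyReconciler struct{}

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// One parent span per reconcile invocation.
	ctx, span := tracer.Start(ctx, "Reconcile")
	defer span.End()

	// Child spans make the individual loop steps visible in the trace.
	_, fetchSpan := tracer.Start(ctx, "fetch-object")
	// ... read the object named in req from the API server ...
	fetchSpan.End()

	_, applySpan := tracer.Start(ctx, "apply-changes")
	// ... create or update the dependent resources ...
	applySpan.End()

	return ctrl.Result{}, nil
}
```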
## Work queue
* Go channels as queues
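
A tiny, generic sketch of the idea (not tied to any particular framework): a buffered Go channel acts as the work queue and a small worker pool drains it.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	queue := make(chan string, 100) // buffered channel as the work queue
	var wg sync.WaitGroup

	// Start a fixed pool of workers that drain the queue.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for key := range queue {
				fmt.Printf("worker %d reconciling %s\n", id, key)
			}
		}(i)
	}

	// Enqueue reconcile keys, then close the queue so the workers exit.
	for _, key := range []string{"default/app-a", "default/app-b"} {
		queue <- key
	}
	close(queue)
	wg.Wait()
}
```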