Final talks of day 4

This commit is contained in:
Nicolai Ort 2024-03-22 17:27:33 +01:00
parent c2326acfea
commit ef360b2e89
Signed by: niggl
GPG Key ID: 13AFA55AF62F269F
3 changed files with 165 additions and 1 deletions

View File

@ -0,0 +1,82 @@
---
title: "TikToks Edge Symphony: Scaling Beyond Boundaries with Multi-Cluster Controllers"
weight: 6
---
A talk by TikTok/ByteDance (duh) focused on using centralized controllers instead of controllers running on the edge.
## Background
> Global means non-China
* Edge platform team for CDN, livestreaming, uploads, real-time communication, etc.
* Around 250 clusters with 10-600 nodes each - mostly non-cloud, aka bare metal
* Architecture: Control plane clusters (platform services) - data plane clusters (workloads run by other teams)
* Platform includes logs, metrics, configs, secrets, ...
## Challenges
### Operators
* Operators are essential for platform features
* As feature requests increase, more operators are needed
* Deploying operators across many clusters is complex (namespaces, deployments, policies, ...)
### Edge
* Limited resources
* Cost implications of platform features
* Real-time processing demands by platform features
* Balancing act between resources used by workloads vs platform features (20-25%)
### The classic flow
1. A new feature gets requested
2. Use kubebuilder with the SDK to create the operator
3. Create namespaces and configs in all clusters
4. Deploy the operator to all clusters
## Possible Solution
### Centralized Control Plane
* Problem: The controller implementation is limited to a cluster boundary
* Idea: Why not create a single operator that can manage multiple edge clusters?
* Implementation: Just modify kubebuilder to accept multiple clients (and caches) - see the sketch after this list
* Result: It works -> Simpler deployment and troubleshooting
* Concerns: High code complexity -> Long familiarization
* Balance between "simple central operator" and operator complexity is hard
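
As a rough illustration of what "multiple clients and caches" could look like (my own sketch with assumed names, not the speakers' code), a single controller-runtime reconciler might hold one client per edge cluster and fan each request out:

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// MultiClusterReconciler holds one client (with its own cache) per edge cluster.
type MultiClusterReconciler struct {
	Clients map[string]client.Client // keyed by cluster name
}

func (r *MultiClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fan the same request out to every data plane cluster.
	for clusterName, c := range r.Clients {
		var deploy appsv1.Deployment
		if err := c.Get(ctx, req.NamespacedName, &deploy); err != nil {
			// Per-cluster errors could be collected and requeued here.
			continue
		}
		// The actual platform logic for clusterName would go here.
		_ = clusterName
	}
	return ctrl.Result{}, nil
}
```

The trade-off from the bullets shows up directly here: the fan-out loop keeps deployment simple, but per-cluster error handling and cache wiring are where the code complexity grows.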
### Attempting it a bit more like kubebuilder
* Each cluster has its own manager
* There is a central multi-manager that starts all of the cluster-specific managers
* Controller registration with the manager now handles cluster names
* The reconciler knows which cluster it is working on
* The multi-cluster management basically just gets all of the cluster secrets and creates a manager + controller for each cluster secret
* Challenges: Network connectivity
* Solutions:
  * Dynamic add/remove of clusters with Go channels to prevent pod restarts (see the sketch after the diagrams below)
  * Connectivity health checks -> On connection loss, recreating the manager gets triggered
```mermaid
flowchart TD
mcm(Multi-cluster manager)-->m1(Manager cluster 1)
mcm-->m2(Manager cluster 2)
mcm-->m3(Manager cluster 3)
```
```mermaid
flowchart LR
secrets-->ch(go channels)
ch-->|CREATE|create(Create manager + Add controller + Start manager)
ch-->|UPDATE|update(Stop manager + Create manager + Add controller + Start manager)
ch-->|DELETE|delete(Stop manager)
```
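
A hedged Go sketch of the channel-driven lifecycle from the second diagram (the event shape and all names are assumptions on my part, not the talk's actual code): each CREATE/UPDATE/DELETE on a cluster secret starts, recreates or stops a per-cluster manager without restarting the pod.

```go
package main

import (
	"context"

	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
)

// clusterEvent is a hypothetical event emitted when a cluster secret changes.
type clusterEvent struct {
	op   string       // "CREATE", "UPDATE" or "DELETE"
	name string       // cluster name
	cfg  *rest.Config // rest config built from the kubeconfig in the secret
}

func runMultiManager(ctx context.Context, events <-chan clusterEvent) {
	cancels := map[string]context.CancelFunc{}

	stop := func(name string) {
		if cancel, ok := cancels[name]; ok {
			cancel() // cancelling the context stops the per-cluster manager
			delete(cancels, name)
		}
	}

	start := func(ev clusterEvent) {
		mgr, err := ctrl.NewManager(ev.cfg, ctrl.Options{})
		if err != nil {
			return // in practice: log and retry
		}
		// Register the cluster-specific controllers with mgr here, tagged with ev.name.
		mgrCtx, cancel := context.WithCancel(ctx)
		cancels[ev.name] = cancel
		go func() { _ = mgr.Start(mgrCtx) }()
	}

	for ev := range events {
		switch ev.op {
		case "CREATE":
			start(ev)
		case "UPDATE": // e.g. rotated credentials -> recreate the manager
			stop(ev.name)
			start(ev)
		case "DELETE":
			stop(ev.name)
		}
	}
}

func main() {
	events := make(chan clusterEvent)
	// In the real setup, a watcher on the cluster secrets would feed this channel.
	go func() { close(events) }()
	runMultiManager(context.Background(), events)
}
```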
## Conclusion
* Acknowledge resource constraints on the edge
* Embrace open source adoption instead of building your own
* Simplify deployment
* Recognize your own opinionated approach and its use cases

View File

@ -0,0 +1,78 @@
---
title: "Fluent Bit v3: Unified Layer for Logs, Metrics and Traces"
weight: 7
---
The last talk of the conference.
Notes may be a bit unstructured due to a tired note taker.
## Background
* FluentD is already graduated
* FluentBit is a sub-project of FluentD (also graduated)
## Basics
* FluentBit is compatible with
  * Prometheus (it can replace the Prometheus scraper and node exporter)
  * OpenMetrics
  * OpenTelemetry (HTTPS input/output)
* FluentBit can export to Prometheus, Splunk, InfluxDB or others
* So pretty much it can be used to collect data from a bunch of sources and pipe it out to different backend destinations
* Fluent ecosystem: No vendor lock-in to observability
### Architectures
* The fluent agent collects data and can send it to one or multiple locations
* FluentBit can be used for aggregation from other sources
### In the Kubernetes logging ecosystem
* Pods log to the console -> Streamed stdout/stderr gets piped to a file
* The logs in the file get encoded as JSON with metadata (date, channel)
* Labels and annotations only live in the control plane -> You have to collect them additionally -> Expensive
## New stuff
### Limitations with classic architectures
* Problem: Multiple filters slow down the main loop
```mermaid
flowchart LR
subgraph main[Main Thread/Event loop]
buffer
schedule
retry
filter1
filter2
filter3
end
in-->|pipe in data|main
main-->|filter and pipe out|out
```
### Solution
* Solution: Processor - a separate thread segmented by telemetry type
* Plugins can be written in your favourite language (C, Rust, Go, ...) - a Go sketch follows the diagram below
```mermaid
flowchart LR
subgraph in
reader
processor1
processor2
processor3
end
in-->|pipe in data|main(Main Thread/Event loop)
main-->|filter and pipe out|out
```
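
To illustrate the plugin point from above, here is a minimal sketch of a Go output plugin using the fluent-bit-go bindings (the plugin name and the print-only behaviour are my own example; such plugins are typically built with `go build -buildmode=c-shared`):

```go
package main

import "C"
import (
	"fmt"
	"unsafe"

	"github.com/fluent/fluent-bit-go/output"
)

//export FLBPluginRegister
func FLBPluginRegister(def unsafe.Pointer) int {
	// Register the plugin under a (hypothetical) name.
	return output.FLBPluginRegister(def, "go_stdout_demo", "Demo Go output plugin")
}

//export FLBPluginInit
func FLBPluginInit(plugin unsafe.Pointer) int {
	return output.FLB_OK
}

//export FLBPluginFlush
func FLBPluginFlush(data unsafe.Pointer, length C.int, tag *C.char) int {
	dec := output.NewDecoder(data, int(length))
	for {
		ret, ts, record := output.GetRecord(dec)
		if ret != 0 {
			break
		}
		// Print every record instead of shipping it to a real backend.
		fmt.Printf("tag=%s ts=%v record=%v\n", C.GoString(tag), ts, record)
	}
	return output.FLB_OK
}

func main() {}
```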
### General new features in v3
* Native HTTP/2 support in core
* Content modifier with multiple operations (insert, upsert, delete, rename, hash, extract, convert)
* Metrics selector (include or exclude metrics) with matchers (name, prefix, substring, regex)
* SQL processor -> Use SQL expressions for selections (instead of filters)
* Better OpenTelemetry output

View File

@ -4,4 +4,8 @@ title: Operators
## Observability
* Export reconcile loop steps as opentelemetry traces
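
A small sketch of what that could look like (my illustration; the reconciler type and tracer name are made up): wrap each reconcile step in an OpenTelemetry span so the loop shows up as a trace.

```go
package controllers

import (
	"context"

	"go.opentelemetry.io/otel"
	ctrl "sigs.k8s.io/controller-runtime"
)

var tracer = otel.Tracer("my-operator")

type MyReconciler struct{}

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// One parent span per reconcile invocation.
	ctx, span := tracer.Start(ctx, "Reconcile")
	defer span.End()

	// Child spans make the individual loop steps visible in the trace.
	_, fetchSpan := tracer.Start(ctx, "fetch-object")
	// ... read the object named in req from the API server ...
	fetchSpan.End()

	_, applySpan := tracer.Start(ctx, "apply-changes")
	// ... create or update the dependent resources ...
	applySpan.End()

	return ctrl.Result{}, nil
}
```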
## Work queue
* Go channels as queues
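
A tiny, generic sketch of the idea (not tied to any particular framework): a buffered Go channel acts as the work queue and a small worker pool drains it.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	queue := make(chan string, 100) // buffered channel as the work queue
	var wg sync.WaitGroup

	// Start a fixed pool of workers that drain the queue.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for key := range queue {
				fmt.Printf("worker %d reconciling %s\n", id, key)
			}
		}(i)
	}

	// Enqueue reconcile keys, then close the queue so the workers exit.
	for _, key := range []string{"default/app-a", "default/app-b"} {
		queue <- key
	}
	close(queue)
	wg.Wait()
}
```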