From 021ab45ec591590eb4728c7ecf06a35d73c8558d Mon Sep 17 00:00:00 2001
From: Nicolai Ort
Date: Mon, 21 Jul 2025 16:43:35 +0200
Subject: [PATCH] docs(day1): Added observability platform talk notes

---
 content/day1/10_observability.md | 51 ++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 content/day1/10_observability.md

diff --git a/content/day1/10_observability.md b/content/day1/10_observability.md
new file mode 100644
index 0000000..77edec8
--- /dev/null
+++ b/content/day1/10_observability.md
@@ -0,0 +1,51 @@
+---
+title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
+weight: 10
+tags:
+  - monitoring
+  - observability
+---
+
+
+
+## Where do you start with monitoring
+
+- The cloud standard solution: Prometheus
+- But: What if we don't just monitor one app but a cluster or multiple clusters?
+- Problem: Prometheus isn't quite the best when it comes to scaling
+- And: We want dashboards, traces, alerting, logs, auditing, ...
+
+## Trying to build the master monitoring by just adding stuff on the side
+
+- Add custom stuff
+- More complex setups
+- Less and less documentation and standardization
+
+## But how do we regain control
+
+- Product thinking: Let's collect the problems
+- Result: No clear separation of the product and no vision (just firefighting); the goals are better releases and improved resource usage
+
+### Transition
+
+1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we do observability
+2.
+   1. Long-term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
+   2. Target groups and journeys -> Clear responsibility cut-off between platform<->users
+3. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform
+   - In their case: Focus on Mimir instead of Prometheus and Alloy, but keep Grafana and Loki
+   - Define everything else as out of scope (for now)
+   - Expand scope by improving the experience instead of just "adding tools"
+
+## Pillars of Observability
+
+- Data management: Ingest, Query
+- Dashboard management: Create, Update, Export
+- Alert management: Rules, Routing, Analytics, Silence
+
+## Wrap up
+
+- Do I need monitoring or more (both are fine)?
+- Identify the target audience and their journey (not just the tools they want to use)
+- Improve the experience and say no if a user requests something that would not improve it
\ No newline at end of file