---
title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
weight: 10
tags:
- monitoring
- observability
---
<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->
## Where do you start with monitoring?
- The cloud standard solution: Prometheus
- But: What if we don't just monitor one app but a cluster or multiple clusters?
- Problem: Prometheus isn't quite the best when it comes to scaling
- And: We want Dashboards, Traces, Alerting, Logs, Auditing, ...
## Trying to build the master monitoring by just adding stuff on the side
- Add custom stuff
- More complex setups
- Less and less documentation and standardization
## But how do we regain control?
- Product Thinking: Let's collect the problems
- Result: No clear separation of the product, no vision (just firefighting); we want better releases and improved resource usage
### Transition
1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we're doing observability
2. Long-term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
3. Target groups and journeys -> Clear responsibility cut-off between platform <-> users
4. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform
- In their case: Focus on Mimir instead of Prometheus and Alloy but keep Grafana and Loki
- Define everything else as out of scope (for now)
- Expand scope by improving the experience instead of just "adding tools"
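
One way to picture the "central platform with clear interfaces" idea in a Mimir-based setup: every cluster forwards its metrics to one central endpoint and identifies itself with a tenant ID and a cluster label. The Python sketch below generates a Prometheus-style `remote_write` fragment per cluster. It is only an illustration, not the speakers' setup: the endpoint URL, cluster names and tenant names are made up, and the helper functions are hypothetical. `X-Scope-OrgID` is the tenant header Mimir expects, and `/api/v1/push` is its remote-write path.

```python
import json

# Hypothetical central endpoint; not taken from the talk.
CENTRAL_PUSH_URL = "https://mimir.example.internal/api/v1/push"


def remote_write_fragment(cluster: str, tenant: str) -> dict:
    """Build a Prometheus-style config fragment for one cluster."""
    return {
        "global": {
            # External labels are attached to every series that leaves the
            # cluster, so series from different clusters stay distinguishable.
            "external_labels": {"cluster": cluster},
        },
        "remote_write": [
            {
                "url": CENTRAL_PUSH_URL,
                # Mimir reads the tenant ID from this header.
                "headers": {"X-Scope-OrgID": tenant},
            }
        ],
    }


def fragments_for(clusters: dict) -> dict:
    """Map cluster name -> config fragment, given a cluster -> tenant mapping."""
    return {
        name: remote_write_fragment(name, tenant)
        for name, tenant in clusters.items()
    }


if __name__ == "__main__":
    # Hypothetical clusters and tenants, for illustration only.
    demo = fragments_for({"prod-eu": "team-a", "prod-us": "team-b"})
    # Printed as JSON, which YAML parsers generally accept as-is.
    print(json.dumps(demo["prod-eu"], indent=2))
```

The point of the sketch is the contract: users only supply a cluster name and a tenant, everything else is owned by the platform.
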
## Pillars of Observability
- Data management: Ingest, Query
- Dashboard Management: Create, Update, Export
- Alert Management: Rules, Routing, Analytics, Silence
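
To make the alert management pillar a bit more concrete, here is a minimal Python sketch of how routing and silencing interact: a silenced alert notifies nobody, otherwise the first matching route decides the receiver. This is a simplified model with hypothetical class and receiver names, not the API of Alertmanager or any other tool, and real routing trees are more expressive than this first-match list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    name: str
    labels: dict


@dataclass
class Route:
    match: dict    # every key/value pair must be present in the alert labels
    receiver: str  # hypothetical receiver name


@dataclass
class Silence:
    match: dict    # same matching semantics as Route.match


def _matches(matcher: dict, labels: dict) -> bool:
    """True if all matcher pairs appear in labels (an empty matcher matches all)."""
    return all(labels.get(key) == value for key, value in matcher.items())


def route_alert(
    alert: Alert,
    routes: List[Route],
    silences: List[Silence],
    default_receiver: str = "platform-team",
) -> Optional[str]:
    """Return the receiver for an alert, or None if the alert is silenced."""
    if any(_matches(s.match, alert.labels) for s in silences):
        return None
    for route in routes:
        if _matches(route.match, alert.labels):
            return route.receiver  # first matching route wins
    return default_receiver


if __name__ == "__main__":
    routes = [
        Route({"team": "payments"}, "payments-oncall"),
        Route({"severity": "critical"}, "platform-oncall"),
    ]
    silences = [Silence({"cluster": "staging"})]

    alerts = [
        Alert("HighLatency", {"team": "payments", "cluster": "prod"}),
        Alert("DiskFull", {"severity": "critical", "cluster": "staging"}),
        Alert("CertExpiry", {"cluster": "prod"}),
    ]
    for alert in alerts:
        print(alert.name, "->", route_alert(alert, routes, silences))
```
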
## Wrap up
- Do I need monitoring or more (both are fine)?
- Identify the target audience and their journey (not just the tools they want to use)
- Improve the experience and say no if a user requests something that would not improve it