docs(day1): Added observability platform talk notes
parent 0a464e0dfd
commit 021ab45ec5
51
content/day1/10_observability.md
Normal file
@ -0,0 +1,51 @@
---
title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
weight: 10
tags:
- monitoring
- observability
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Where do you start with monitoring?

- The cloud standard solution: Prometheus
- But: What if we don't just monitor one app, but a cluster or multiple clusters?
- Problem: Prometheus isn't the best fit when it comes to scaling
- And: We want dashboards, traces, alerting, logs, auditing, ...

## Trying to build the master monitoring by just adding stuff on the side

- Add custom stuff
- More complex setups
- Less and less documentation and standardization

## But how do we regain control?

- Product thinking: Let's collect the problems
- Result: No clear separation of the product, no vision (just firefighting); we want better releases and improved resource usage

### Transition

1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we do observability
2.
   1. Long-term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
   2. Target groups and journeys -> Clear responsibility cut-off between platform <-> users
3. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform
   - In their case: Focus on Mimir instead of Prometheus and Alloy, but keep Grafana and Loki
   - Define everything else as out of scope (for now)
   - Expand scope by improving the experience instead of just "adding tools"

## Pillars of Observability

- Data management: Ingest, Query
- Dashboard management: Create, Update, Export
- Alert management: Rules, Routing, Analytics, Silence

## Wrap up

- Do I need monitoring or more (both are fine)?
- Identify the target audience and their journey (not just the tools they want to use)
- Improve the experience and say no if a user requests something that would not improve it