---
title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
weight: 10
tags:
 - monitoring
 - observability
---

<!-- {{% button href="https://youtu.be/rkteV6Mzjfs" style="warning" icon="video" %}}Watch talk on YouTube{{% /button %}} -->
<!-- {{% button href="https://docs.google.com/presentation/d/1nEK0CVC_yQgIDqwsdh-PRihB6dc9RyT-" style="tip" icon="person-chalkboard" %}}Slides{{% /button %}} -->

## Where do you start with monitoring

- The cloud standard solution: Prometheus
- But: What if we don't just monitor one app but a cluster or multiple clusters?
- Problem: Prometheus isn't quite the best when it comes to scaling
- And: We want Dashboards, Traces, Alerting, Logs, Auditing, ...

## Trying to build the master monitoring by just adding stuff on the side

- Add custom stuff
- More complex setups
- Less and less documentation and standardization

## But how do we regain control?

- Product Thinking: Let's collect the problems
- Result: No clear separation of the product, no vision (just firefighting), and a wish for better releases and improved resource usage

### Transition

1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we do observability
2. 
   1. Long term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
   2. Target groups and journeys -> Clear responsibility cut-off between platform<->users
3. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform
   - In their case: Focus on Mimir instead of Prometheus and Alloy but keep Grafana and Loki
   - Define everything else as out of scope (for now)
   - Expand scope by improving the experience instead of just "adding tools"

## Pillars of Observability

- Data management: Ingest, Query
- Dashboard Management: Create, Update, Export
- Alert Management: Rules, Routing, Analytics, Silence
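
One way to turn these pillars into the "clear interfaces and contracts" mentioned earlier is to write the platform's scope down as data and check requests against it. The following is only a minimal sketch: the names `Pillar`, `CAPABILITIES` and `in_scope` are hypothetical and were not shown in the talk; the capabilities themselves are taken from the list above.

```python
from enum import Enum


class Pillar(Enum):
    """The three pillars from the list above (hypothetical naming)."""
    DATA = "data management"
    DASHBOARDS = "dashboard management"
    ALERTS = "alert management"


# Capabilities per pillar, copied from the notes. Anything not listed
# here is treated as out of scope (for now).
CAPABILITIES = {
    Pillar.DATA: {"ingest", "query"},
    Pillar.DASHBOARDS: {"create", "update", "export"},
    Pillar.ALERTS: {"rules", "routing", "analytics", "silence"},
}


def in_scope(pillar: Pillar, capability: str) -> bool:
    """Return True if the capability belongs to the platform's declared scope."""
    return capability.lower() in CAPABILITIES.get(pillar, set())
```

With such a map, a request like `in_scope(Pillar.ALERTS, "silence")` is answered by the contract rather than by whoever happens to be on call, and extending the scope becomes an explicit change to the map.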

## Wrap up

- Do I need monitoring or more (both are fine)?
- Identify the target audience and their journey (not just the tools they want to use)
- Improve the experience and say no if a user requests something that would not improve it