---
title: "Think Big: Monitoring Stack was yesterday - Observability Platform at scale!"
weight: 10
tags:
  - monitoring
  - observability
---

## Where do you start with monitoring

- The cloud standard solution: Prometheus
- But: What if we don't just monitor one app but a cluster or multiple clusters?
- Problem: Prometheus isn't quite the best when it comes to scaling
- And: We want dashboards, traces, alerting, logs, auditing, ...

## Trying to build the master monitoring by just adding stuff on the side

- Add custom stuff
- More complex setups
- Less and less documentation and standardization

## But how do we regain control

- Product Thinking: Let's collect the problems
- Result: No clear separation of the product, no vision (just firefighting); we want better releases and improved resource usage

### Transition

1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, we do observability
2. 1. Long-term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
   2. Target groups and journeys -> Clear responsibility cut-off between platform<->users
3. Improve the platform -> Needs full buy-in to be the **central**, **open** and **self-service** platform

- In their case: Focus on Mimir instead of Prometheus and Alloy, but keep Grafana and Loki
- Define everything else as out of scope (for now)
- Expand scope by improving the experience instead of just "adding tools"

## Pillars of Observability

- Data Management: Ingest, Query
- Dashboard Management: Create, Update, Export
- Alert Management: Rules, Routing, Analytics, Silence

## Wrap up

- Do I need monitoring or more (either is fine)?
- Identify the target audience and their journey (not just the tools they want to use)
- Improve the experience and say no if a user requests something that would not improve it