title: Think Big: Monitoring Stack was yesterday - Observability Platform at scale!
weight: 10
tags: monitoring, observability

Where do you start with monitoring

  • The cloud standard solution: Prometheus
  • But: What if we don't just monitor one app but a cluster or multiple clusters?
  • Problem: Prometheus isn't quite the best when it comes to scaling
  • And: We want Dashboards, Traces, Alerting, Logs, Auditing, ...

Trying to build the master monitoring by just adding stuff on the side

  • Add custom stuff
  • More complex setups
  • Less and less documentation and standardization

But how do we regain control?

  • Product Thinking: Let's collect the problems
  • Result: No clear separation of the product, no vision (just firefighting); we want better releases and better resource usage

Transition

  1. Overview of the current stack -> Just list all components -> We're no longer just a monitoring stack, but we do observability
    1. Long term goals and vision -> Add clear interfaces and contracts (hey platform mindset, we've heard that one before) based on expectations
    2. Target groups and journeys -> Clear responsibility cut-off between platform<->users
  2. Improve the platform -> Needs full buy-in to be the central, open and self-service platform
    • In their case: Focus on Mimir instead of Prometheus and Alloy, but keep Grafana and Loki
    • Define everything else as out of scope (for now)
    • Expand scope by improving the experience instead of just "adding tools"

Pillars of Observability

  • Data management: Ingest, Query
  • Dashboard Management: Create, Update, Export
  • Alert Management: Rules, Routing, Analytics, Silence
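To make the alert management pillar a bit more concrete, here is a minimal sketch of how rules, routing and silences relate to each other. It is not from the talk and does not use any real Alertmanager or Mimir API; all names (`Rule`, `AlertManager`, the team and receiver names, the thresholds) are hypothetical, and the analytics part is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    """A hypothetical alert rule: fires when condition(value) is true."""
    name: str
    team: str  # owning team, used only for routing
    condition: Callable[[float], bool]


@dataclass
class AlertManager:
    """Toy model of the pillar: rules, routing and silences."""
    rules: list[Rule]
    routes: dict[str, str]  # team -> receiver
    default_receiver: str = "platform-oncall"
    silenced: set[str] = field(default_factory=set)

    def silence(self, rule_name: str) -> None:
        # A silenced rule is still evaluated but never notifies anyone.
        self.silenced.add(rule_name)

    def evaluate(self, rule_name: str, value: float) -> Optional[str]:
        """Return the receiver to notify, or None if nothing should be sent."""
        rule = next(r for r in self.rules if r.name == rule_name)
        if rule.name in self.silenced or not rule.condition(value):
            return None
        # Teams without an explicit route fall back to the platform team.
        return self.routes.get(rule.team, self.default_receiver)


# Assumed example data, purely for illustration.
manager = AlertManager(
    rules=[
        Rule("HighErrorRate", "shop", lambda v: v > 0.05),
        Rule("DiskAlmostFull", "storage", lambda v: v > 0.9),
    ],
    routes={"shop": "shop-oncall"},
)
```

The fallback receiver mirrors the responsibility cut-off between platform and users: a team that has defined its own route gets its own alerts, everything else lands with the platform team.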

Wrap up

  • Do I need monitoring or more (both are fine)?
  • Identify the target audience and their journey (not just the tools they want to use)
  • Improve the experience and say no if a user requests something that would not improve it