---
title: "From Zero to Hero: Scaling Postgres in Kubernetes Using the Power of CloudNativePG"
weight: 8
tags:
  - platform
  - operators
  - db
---

A short talk as part of Data on Kubernetes Day, presented by the VP of Cloud Native at EDB (one of the biggest Postgres contributors).

Stated target: Make the world your single point of failure.

## Proposal

* Get rid of vendor lock-in using the OSS projects Postgres, Kubernetes, and CloudNativePG (CnPG)
* Postgres was the DB of the year 2023, and a bunch of other times in the past
* CnPG is a Level 5 (most mature) operator

## 4 Pillars

* Seamless Kube API integration (operator pattern)
* Advanced observability (Prometheus exporter, JSON logging)
* Declarative configuration (deploy, scale, maintain)
* Secure by default (robust containers, mTLS, and so on)

## Clusters

* Basic resource that defines name, instances, sync, and storage (other parameters have sane defaults)
* Implementation: the operator creates:
  * The volumes (PGDATA, WAL (write-ahead log))
  * The primary and a read-write Service
  * Replicas
  * A read-only Service (points at the replicas)
* Failover:
  * Failure detected
  * Stop the read-write Service
  * Promote a replica
  * Reactivate the read-write Service
  * Kill the old primary and recreate it as a replica

## Backup/Recovery

* Continuous backup: write-ahead log archiving to an object store
* Physical: base backups taken from the primary or a standby, to an object store or Kubernetes volumes
* Recovery: copy the full backup and apply WAL until the target (last transaction or a specific timestamp) is reached
* Replica cluster: essentially a full recovery into a new cluster, but the cluster is kept in read-only replica mode
* Planned: backup plugin interface

## Multi-Cluster

* Just create a replica cluster from the WAL files in S3 on another Kubernetes cluster (lags about 5 minutes behind)
* You can also enable streaming replication

## Recommended architecture

* Dev cluster: 1 instance without PDB and with continuous backup
* Prod: 3 nodes with automatic failover and continuous backups
* Symmetric: two clusters
  * Primary: 3-node cluster
  * Secondary: WAL-based 3-node cluster with a designated primary (to take over if the primary cluster fails)
* Symmetric streaming: same as symmetric, but you manually enable streaming for live replication
* Cascading replication: scale the symmetric setup out to more clusters
* Single availability zone: well, do your best to spread across nodes, and aspire to a stretched Kubernetes cluster over more AZs

## Roadmap

* Replica cluster (symmetric) switchover
* Synchronous symmetric replication
* 3rd-party plugins
* Manage databases via the operator
* Storage autoscaling
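
The basic `Cluster` resource described in the "Clusters" section can be sketched as a minimal manifest (the name and storage size here are placeholders):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example        # placeholder name
spec:
  instances: 3            # one primary plus two replicas
  storage:
    size: 10Gi            # a PGDATA volume per instance
```

From this the operator derives the Services mentioned above: `pg-example-rw` (points at the primary) and `pg-example-ro` (points at the replicas), and handles failover between the instances automatically.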
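
A hedged sketch of the backup/recovery flow described above: continuous WAL archiving to an object store, then a point-in-time recovery into a new cluster. The bucket path, Secret name, and keys are assumptions for illustration.

```yaml
# Source cluster: continuous backup (WAL archiving + base backups) to S3
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example
spec:
  instances: 3
  storage:
    size: 10Gi
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/pg-example   # placeholder bucket
      s3Credentials:
        accessKeyId:
          name: aws-creds          # placeholder Secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
---
# Recovery: restore the full backup and replay WAL up to a timestamp
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-restored
spec:
  instances: 3
  storage:
    size: 10Gi
  bootstrap:
    recovery:
      source: pg-example
      recoveryTarget:
        targetTime: "2024-03-01 12:00:00"   # stop at a specific point in time
  externalClusters:
    - name: pg-example
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/pg-example
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: SECRET_ACCESS_KEY
```

Omitting `recoveryTarget` recovers to the last archived transaction instead of a fixed timestamp.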
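
The multi-cluster / symmetric setup boils down to the same recovery mechanism with replica mode left on: the second cluster bootstraps from the object store and keeps replaying WAL instead of being promoted. A minimal sketch, with placeholder names and bucket path:

```yaml
# Replica cluster on another Kubernetes cluster, fed by WAL files from S3
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-secondary
spec:
  instances: 3
  storage:
    size: 10Gi
  replica:
    enabled: true            # stay in read-only replica mode (designated primary)
    source: pg-primary
  bootstrap:
    recovery:
      source: pg-primary
  externalClusters:
    - name: pg-primary
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/pg-primary   # placeholder bucket
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: SECRET_ACCESS_KEY
```

For the "symmetric streaming" variant, the `externalClusters` entry would additionally carry connection parameters for direct streaming replication instead of relying only on archived WAL.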