---
title: "From Zero to Hero: Scaling Postgres in Kubernetes Using the Power of CloudNativePG"
weight: 8
tags:
 - platform
 - operators
 - db
---

A short talk as part of the Data on Kubernetes Day, presented by the VP of Cloud Native at EDB (one of the biggest PG contributors).

Stated target: Make the world your single point of failure.

## Proposal

* Get rid of vendor lock-in by using the OSS projects PG, K8s and CnPG
* PG was the DB of the Year 2023 (and several times before that)
* CnPG is a Level 5 operator (the highest operator maturity level)

## 4 Pillars

* Seamless Kube API Integration (Operator Pattern)
* Advanced observability (Prometheus Exporter, JSON logging)
* Declarative Config (Deploy, Scale, Maintain)
* Secure by default (Robust containers, mTLS, and so on)

## Clusters

* Basic resource that defines name, instances, sync and storage (and other parameters that have sane defaults)
* Implementation: Operator creates:
  * The volumes (PG_Data, WAL (write-ahead log))
  * Primary and Read-Write Service
  * Replicas
  * Read-Only Service (points at replicas)
* Failover:
  * Failure detected
  * Stop R/W Service
  * Promote Replica
  * Activate R/W Service
  * Shut down the old primary and demote it to a replica
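
The failover order above can be walked through with a toy model. This is only a sketch of the sequence from the talk, not how the operator is implemented: `ToyCluster`, the instance names and the "promote the first replica" rule are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCluster:
    """Toy model: one primary, some replicas and a read-write service."""
    primary: str
    replicas: List[str]
    rw_target: Optional[str] = None  # instance the R/W service points at
    log: List[str] = field(default_factory=list)

    def __post_init__(self):
        # In normal operation the R/W service points at the primary.
        self.rw_target = self.primary

    def failover(self):
        """Run the failover steps in the order listed in the notes."""
        old_primary = self.primary
        self.log.append(f"failure detected on {old_primary}")

        # Stop the R/W service so no client can write to the failed primary.
        self.rw_target = None
        self.log.append("r/w service stopped")

        # Promote a replica (hypothetical rule: simply take the first one).
        new_primary = self.replicas.pop(0)
        self.primary = new_primary
        self.log.append(f"promoted {new_primary}")

        # Re-activate the R/W service, now pointing at the new primary.
        self.rw_target = new_primary
        self.log.append(f"r/w service -> {new_primary}")

        # The old primary is shut down and rejoins as a replica.
        self.replicas.append(old_primary)
        self.log.append(f"{old_primary} demoted to replica")


cluster = ToyCluster(primary="pg-1", replicas=["pg-2", "pg-3"])
cluster.failover()
print("\n".join(cluster.log))
```

The point of the ordering: the R/W service is taken away before the promotion, so there is never a moment where clients can write to two primaries.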

## Backup/Recovery

* Continuous backup: WAL (write-ahead log) archiving to an object store
* Physical backup: Taken from the primary or a standby and stored in an object store or Kube volumes
* Recovery: Copy a full backup and apply WAL until the target (last transaction or a specific timestamp) is reached
* Replica cluster: Basically creates a new cluster through a full recovery, but keeps that cluster in read-only replica mode
* Planned: Backup Plugin Interface
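
The recovery idea (full backup plus WAL replay up to a target) fits in a few lines. This is a toy sketch, not PG's actual recovery code: the "database" is a dict, the WAL is a list of made-up `(commit_time, key, value)` records, and `recover` is a hypothetical helper.

```python
from datetime import datetime


def recover(base_backup, wal, target_time=None):
    """Replay WAL records on top of a copy of the base backup.

    base_backup: dict of key -> value, the state at backup time
    wal: list of (commit_time, key, value), ordered by commit_time
    target_time: stop at the first record committed after this time;
                 None means replay everything (last transaction)
    """
    state = dict(base_backup)  # work on a copy, the backup stays untouched
    for commit_time, key, value in wal:
        if target_time is not None and commit_time > target_time:
            break  # target reached, ignore everything after it
        state[key] = value
    return state


backup = {"balance": 100}
wal = [
    (datetime(2024, 3, 19, 10, 0), "balance", 120),
    (datetime(2024, 3, 19, 10, 5), "balance", 90),
    (datetime(2024, 3, 19, 10, 10), "balance", 0),  # the "oops" transaction
]

# Recover to the last transaction, and to a point just before the mistake.
latest = recover(backup, wal)
before_mistake = recover(backup, wal, target_time=datetime(2024, 3, 19, 10, 7))
print(latest, before_mistake)
```

A replica cluster is the same replay loop that never declares itself finished: it keeps applying new WAL as it arrives and stays read-only.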

## Multi-Cluster

* Just create a replica cluster on another Kube cluster, fed by the WAL files from S3 (lags 5 minutes behind)
* You can also activate streaming replication

## Recommended architecture

* Dev cluster: 1 instance without PDB (PodDisruptionBudget) and with continuous backup
* Prod: 3 nodes with automatic failover and continuous backups
* Symmetric: Two clusters
  * Primary: 3-Node Cluster
  * Secondary: WAL-based 3-node cluster with a designated primary (to take over if the primary cluster fails)
* Symmetric Streaming: Same as Symmetric, but you manually enable the streaming API for live replication
* Cascading Replication: Scale Symmetric to more clusters
* Single availability zone: Do your best to spread the instances across nodes, and aspire to stretch Kubernetes across more AZs
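
The dev and prod shapes could look roughly like this as Cluster manifests, built here as plain Python dicts. A sketch under assumptions: the field names (`postgresql.cnpg.io/v1`, `instances`, `storage.size`, `enablePDB`) are taken from the CloudNativePG Cluster resource as far as known and should be checked against the operator version in use. Cluster names and sizes are made up, and the backup configuration is left out.

```python
import json


def cluster_manifest(name, instances, storage_size, enable_pdb=True):
    """Build a Cluster manifest as a plain dict (dump to YAML/JSON to apply)."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",  # assumed API group/version
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "enablePDB": enable_pdb,  # assumed field name, verify first
            "storage": {"size": storage_size},
            # Continuous backup (object store, credentials) is omitted here.
        },
    }


# Dev: a single instance, no PodDisruptionBudget.
dev = cluster_manifest("pg-dev", instances=1, storage_size="1Gi", enable_pdb=False)

# Prod: three instances so there is always a replica to fail over to.
prod = cluster_manifest("pg-prod", instances=3, storage_size="20Gi")

print(json.dumps(dev, indent=2))
print(json.dumps(prod, indent=2))
```

The Symmetric setups would be two such prod clusters on different Kube clusters, with the second one bootstrapped as a replica cluster.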

## Roadmap

* Replica Cluster (Symmetric) Switchover
* Synchronous Symmetric
* 3rd Party Plugins
* Manage DBs via the Operator
* Storage Autoscaling