
Your Cluster is Lying to ArgoCD (And How to Catch It)

H.2213 | Day 2 | 13:35 - 14:00 | Speakers: Graziano Casto



Abstract

We love ArgoCD, but it creates a classic "map vs. territory" problem. We treat Git as our "map", our single source of truth. But the cluster is the "territory", and it's often more complex than the map shows. This becomes a crisis with the 3 AM hotfix: an SRE fixes production directly on the cluster, changing the territory. ArgoCD, loyal to the map, sees this as drift and helpfully overwrites the fix, re-breaking the cluster. The problem is that Git isn't our Truth; it's our Intention.

This talk introduces a pragmatic solution: Cluster-Scoped Snapshotting. We'll show a simple pattern that dumps the entire live cluster state (the "territory") into its own "reality" Git repo. To automate this, we wrote a small open-source tool called Kalco, but the pattern is the real takeaway.

This "reality" repo gives us a powerful "pre-flight diff" in our CI pipeline, comparing our "intention" (the app repo) against the "truth" (the snapshot repo). This simple check lets us bootstrap existing clusters, create a complete audit log, and stop our pipeline before it merges a change that conflicts with a critical live fix.
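As a rough illustration of the "pre-flight diff" idea, here is a minimal Python sketch. It is not Kalco's implementation and does not use any Kalco or ArgoCD API. It assumes both repos have already been parsed into lists of manifest dictionaries, and the names `resource_key`, `flatten`, and `preflight_diff` are hypothetical. The sketch only compares `spec` fields that Git actually declares, since a live cluster carries many defaulted fields the "intention" never mentions.

```python
from typing import Any

Resource = dict[str, Any]
Key = tuple[str, str, str]  # (kind, namespace, name)


def resource_key(res: Resource) -> Key:
    """Identify a manifest by kind, namespace, and name (hypothetical helper)."""
    meta = res.get("metadata", {})
    return (res.get("kind", ""), meta.get("namespace", "default"), meta.get("name", ""))


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into a mapping of dotted path -> leaf value."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def preflight_diff(intended: list[Resource], live: list[Resource]) -> list[str]:
    """Report spec fields where the snapshot ("truth") disagrees with Git ("intention")."""
    live_by_key = {resource_key(r): r for r in live}
    conflicts: list[str] = []
    for res in intended:
        key = resource_key(res)
        snapshot = live_by_key.get(key)
        if snapshot is None:
            continue  # resource is new, so there is no live state to overwrite
        want = flatten(res.get("spec", {}))
        have = flatten(snapshot.get("spec", {}))
        for path, value in sorted(want.items()):
            # Only fields declared in Git are checked; live-only fields are ignored.
            if path in have and have[path] != value:
                conflicts.append(
                    f"{'/'.join(key)}: spec.{path} is {have[path]!r} live, {value!r} in Git"
                )
    return conflicts


# Illustrative data: Git wants 2 replicas, but someone scaled production to 5.
intended = [{"kind": "Deployment",
             "metadata": {"name": "api", "namespace": "prod"},
             "spec": {"replicas": 2}}]
live = [{"kind": "Deployment",
         "metadata": {"name": "api", "namespace": "prod"},
         "spec": {"replicas": 5, "revisionHistoryLimit": 10}}]

for line in preflight_diff(intended, live):
    print(line)
```

A CI job built on this idea could fail the pipeline whenever the returned list is non-empty, which forces someone to decide whether the live fix or the Git change should win before anything is merged.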

