Your Switch Config Backups Are Telling On You

I went to Networking Field Day 40 expecting Netris to show me something about AI clusters. They did. Alex Saroyan walked us through how they automate networks for neo-cloud operators running thousands of switches and tens of thousands of GPUs. That is not your network. It is not my network. The closest thing most of us run to a “GPU cluster” is the rack where the security camera NVR lives.

But about two-thirds of the way through the session, a delegate asked a question that had nothing to do with AI.

The question was, paraphrased: “With your product in place, do your customers still back up their configs?”

Alex’s answer was a flat no. No per-switch config backups. Switch dies, you slot in a replacement, the agent on the new switch wakes up, calls home to the controller, looks at the model of what that switch is supposed to be, and regenerates the config from scratch. The controller gets backed up. The switches don’t.

What got me was the delegate’s follow-up: he kind of shrugged and said he knew enterprise networks that didn’t back up their configs either, and they moved on like it was a minor side note.

That side note is this post. Because the more I sat with that exchange, the more I realized it exposes something I had never thought to say out loud about how most of us actually run our networks.

What we’re really doing when we back up configs

If you’re running a mid-sized business network, your config backup story probably looks something like one of these:

  • A cron job dumps the output of “show run” to a file share every night
  • Oxidized or a fork of it, chugging along on a VM nobody has logged into since 2019
  • Kiwi CatTools, because somebody bought it once, and the license still works
  • SolarWinds NCM, if your shop spent the money
  • Or, if we’re being really honest, “the last config I copy-pasted into a Word doc when I made that change.”

This has been “best practice” for as long as I have been in networking. Nobody has ever taken me aside and told me to stop doing it. The auditors love seeing the backups. The disaster recovery plan references them. Everybody nods.

But step back and ask what the act of backing up the config is actually saying. It is saying: the running config on that box is the primary record of how this part of the network is supposed to work. If that box dies, that file is the thing we need to rebuild from. The config isn’t just an artifact; it is the network’s memory.

That’s the part that snuck up on me at NFD40. I had never thought of it that way before.

The Netris contrast (briefly, I promise)

I’m not going to turn this into a vendor tour. I will just say enough about the Netris model to make the contrast clear, because the contrast is the whole point.

In Netris’s world, intent lives in the controller. Topology, IPAM, what services should exist on what switches, what tenant gets what VRF, all of that. Each switch runs an agent. The agent’s job is to look at what the controller says this switch should be doing, and translate that into actual device config on the box. The config is an output of the model, not the input.

So when a switch dies, the procedure is: pull the dead one, rack a new one, let the agent boot up and call home. Controller goes, “oh, you’re switch 17, here’s what you should look like,” and the agent regenerates the config. No backup file restored. No copy-paste from a Word doc. The new switch arrives at the right state because the source of truth knew what the right state was the whole time.

This only works because the network is declarative, not imperative.

Jargon Box: Declarative vs. Imperative Networking

Imperative means you tell the device how to do something, step by step. “Configure interface gig 1/0/1, set it to access mode, put it in VLAN 20, save the config.” Most of us spent our careers typing imperative commands into a CLI.

Declarative means you describe what the network should look like, and something else figures out the commands to get there. “Switch 17 is a leaf in fabric A and serves tenant Bravo.” The system works out the rest.
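To make the difference concrete, here is a minimal sketch of the declarative idea in Python. This is not how Netris does it; the class names, the render_config function, and the IOS-style output are all my own assumptions for illustration. The point is only that you describe the switch, and a function works out the commands.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPort:
    """One access port as we want it to look (hypothetical model)."""
    name: str
    vlan: int
    description: str = ""


@dataclass
class SwitchIntent:
    """The declared state of a switch: what it should be, not how to type it."""
    hostname: str
    ports: list[AccessPort] = field(default_factory=list)


def render_config(intent: SwitchIntent) -> str:
    """Translate declared intent into IOS-style CLI lines."""
    lines = [f"hostname {intent.hostname}"]
    for port in intent.ports:
        lines.append(f"interface {port.name}")
        if port.description:
            lines.append(f" description {port.description}")
        lines.append(" switchport mode access")
        lines.append(f" switchport access vlan {port.vlan}")
    return "\n".join(lines)


# The declaration: the only thing a human edits.
switch17 = SwitchIntent(
    hostname="switch17",
    ports=[AccessPort("GigabitEthernet1/0/1", 20, "front desk")],
)

# The config is an output of the model and can be regenerated at any time.
config_text = render_config(switch17)
print(config_text)
```

If switch17 dies, nothing here needs restoring from the dead box. You run render_config against the same declaration and get the same config back.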

The quiet admission

Here’s the part that earned this post its title.

If you’re backing up your switch configs, you are admitting that the device is the only place certain pieces of knowledge actually live. Maybe most of it lives in your documentation, but there is something on that box that you cannot reconstruct without it. Otherwise, you wouldn’t need the backup.

Which means your “source of truth,” whatever you call it (NetBox, a SharePoint site, a Visio diagram from before the Slack migration, your inventory in SolarWinds), is not really the source of truth. It’s a description. The running config is the truth, and your documentation is just whatever you remembered to write down about it, with however much drift has crept in since the last time somebody updated it.

That’s the inversion. In a healthy model, configs are a consequence of the source of truth. In most of our networks, the running config is the source of truth, and everything else is a partial, lagging copy.

Jargon Box: Configuration Drift

When the running state of a device gradually diverges from what your documentation says it should be. Caused by emergency changes that nobody documented, “temporary” fixes that became permanent, and that one engineer who left in 2021 and never told anyone about the static route he added at 2 a.m.
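If you have any record of what a device is supposed to look like, drift is something you can measure rather than guess at. Below is a rough sketch using Python's standard difflib module. The find_drift function and both config snippets are made up for illustration, and a real comparison would need to ignore things like timestamps and ordering differences.

```python
import difflib


def find_drift(intended: str, running: str) -> list[str]:
    """Return lines that differ between the intended and running configs.

    Lines prefixed "- " are in the intended config but missing on the device;
    lines prefixed "+ " are on the device but not in the intended config.
    """
    diff = difflib.ndiff(intended.splitlines(), running.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop both.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# What the documentation says the box should have (hypothetical).
intended = "hostname core1\nip route 0.0.0.0 0.0.0.0 10.0.0.1"

# What is actually running, including the 2 a.m. static route nobody wrote down.
running = intended + "\nip route 192.168.50.0 255.255.255.0 10.0.0.9"

for line in find_drift(intended, running):
    print(line)
```

An empty result means the device matches the record. Anything else is a conversation you need to have with whoever made the change.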

Why midmarket landed here

I want to be fair to us. The economics of going fully declarative have never quite worked out for a 30-switch network. Buying a real network automation platform for that scale is overkill. Building your own with Ansible and a NetBox install is a meaningful chunk of somebody’s time, and that somebody is also patching servers, arguing with the firewall vendor, and figuring out why the conference room TV won’t connect to Zoom.

The CLI is sticky because it’s how all of us learned. It’s how we troubleshoot at 2 a.m. when the change window is closing. It is genuinely faster, for one box, to just type the commands than to update a model and let something else generate the config.

I am not saying midmarket networks are doing it wrong. I am saying we should be honest about what our tooling is telling us. If your primary recovery plan is “restore from the config backup,” then the config backup is your source of truth, no matter what your network diagram claims.

What you can actually steal from this

You’re not going to go full Netris. Nobody is selling you the midmarket version of that, at least not at a price your CFO is going to approve this year. But you can move the center of gravity, even on a small network:

  1. Stand up a real source of truth. NetBox is free, well-documented, and good enough. There is no reason in 2026 to be tracking your network in a spreadsheet. If you already have NetBox but only half the network is in it, finish the job.
  2. Document intent, not just config. Why does this VLAN exist? What is this ACL supposed to be blocking? Whose ask was it? That is the part the running config can’t tell you. Five years from now, when somebody asks “can we delete this,” the running config will say “here is the rule,” but only your intent record can say “yes, that customer left in 2023.”
  3. Treat config changes as derived from a documented decision. The change ticket should reference the intent. Right now in most shops, the change ticket is the intent, and once it’s closed out, the reasoning is buried in a ticketing system nobody searches.
  4. Keep backing up configs. Just reframe what they’re for. They are belt and suspenders. They are not your primary rebuild path. Your primary rebuild path should be: rebuild from documented intent, then sanity check against the backup to catch anything you forgot to document.
  5. Run this test. If your core switch died tonight and you only had your documentation, no backup file, could you stand up a replacement that does the right things? If the answer is no, that is a documentation gap. The backup is hiding it from you. Find the gap and fix it.
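A small version of items 2 and 5 can be automated. The sketch below assumes intent records are kept as simple structured entries (the IntentRecord fields are my invention, not a NetBox schema) and that VLANs appear in the config as IOS-style "vlan N" lines. It lists every VLAN the device knows about that your documentation does not.

```python
import re
from dataclasses import dataclass


@dataclass
class IntentRecord:
    """Why a VLAN exists and who asked for it (hypothetical fields)."""
    vlan_id: int
    purpose: str
    requested_by: str


def undocumented_vlans(running_config: str, records: list[IntentRecord]) -> list[int]:
    """List VLAN IDs present in the config that have no intent record."""
    # Match lines that are exactly "vlan <number>", as in IOS-style configs.
    found = re.findall(r"^vlan (\d+)[ \t]*$", running_config, re.MULTILINE)
    configured = {int(vlan_id) for vlan_id in found}
    documented = {record.vlan_id for record in records}
    return sorted(configured - documented)


# Example running config and intent records, both invented for illustration.
sample_config = """\
vlan 20
 name STAFF
vlan 30
 name CAMERAS
vlan 99
 name TEMP
"""

records = [
    IntentRecord(20, "Staff workstations", "IT"),
    IntentRecord(30, "Security cameras and NVR", "Facilities"),
]

gaps = undocumented_vlans(sample_config, records)
print(gaps)
```

Every ID in that list is a piece of the network that exists only in the running config. That is the gap the backup has been hiding.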

None of that requires a platform purchase. It requires admitting that the way most of us run networks today has the relationship between documentation and configuration exactly backwards.

Final Thoughts

Alex’s offhand answer at NFD40 wasn’t really about Netris. It exposed a paradigm. The question isn’t whether you should back up your configs. The question is, what does it mean that you have to?

I am not arguing that every mid-sized network needs to go declarative. Most of us won’t, and that is fine. But the gap between “the running config is my source of truth” and “my source of truth generates the running config” is the same gap that, on a much bigger scale, separates the way Netris’s customers run a 50,000-GPU cluster from the way most of us have been running networks for 25 years.

The thing worth stealing from the AI networking conversation isn’t the rail-optimized fabrics or the DPUs or the InfiniBand. It’s the discipline. If you walk away from this post and do exactly one thing, let it be this: the next time you make a change to your network, write down why before you write the config. That one habit, repeated for a year, will put you in a better position than any platform purchase will.

The backup is just a sanity check. It should not be the plan.


Disclosure: I attended Networking Field Day 40 as a delegate. Travel, accommodations, and meals during the event were covered by Tech Field Day. I was not asked to write about Netris or any other presenter, and my opinions here are entirely my own.
