Your AI Network Isn’t an AI Network (And That’s Fine)

You’ve been in this meeting. The vendor SE is two slides into a deck about “AI-ready infrastructure.” There’s a four-box diagram on the screen with terms like “scale-across,” “XPU collectives,” and “back of the back end.” You’re nodding along, half-listening, trying to figure out which of these slides actually applies to the data center you support and which ones are auditioning for a Hyperscaler IPO Roadshow.

I sat through Arista’s presentation at Network Field Day and thought about this dynamic the entire time. Not because Arista did anything wrong. They didn’t. The presentation was sharp, the speakers were credible, and Brendan Gibb opened with one of the cleaner framings I’ve heard for AI networking: there are four AI fabrics, and they each solve a different problem.

It’s a useful framework. It’s also a useful framework for figuring out which AI networking conversations you can safely tune out. Most of them.

The Four Fabrics, Translated

Brendan walked through them in order, so I’ll do the same. In plain English, with the marketing varnish stripped off:

Front-end. This is the network that connects your AI systems to everything else. Users hitting an inference endpoint. Storage feeding model weights. The WAN. It looks and acts like a regular data center network because that’s basically what it is.

Scale-out. This is the back-end fabric tying GPUs together for training. Massive bandwidth, lossless transport, every GPU talking to every other GPU at 400 or 800 gigabits per second. This is the “AI fabric” everyone pictures when someone says “AI fabric.”

Scale-across. When your training cluster won’t physically fit in one building because you ran out of power, space, or both, you stretch it across multiple data centers. Custom long-distance optics, deep buffers, and mandatory encryption between sites.

Scale-up. The newest one, and Arista calls it “the back of the back end.” It’s the fabric that knits GPUs together inside a single chassis or rack so they share memory like one giant accelerator. Today, this is what NVIDIA’s NVLink and NVSwitch do. Arista (and a lot of other vendors) want Ethernet to do it instead.

That’s the framework. Four fabrics. Four problems. Four sets of products.

A few of the terms that come along for the ride, translated the same way:

Lossless transport. Networks designed to never drop a packet, because dropped packets in AI training mean restarting from a checkpoint and burning hours of expensive GPU time.

XPU. Generic term for any AI accelerator: GPUs, TPUs, custom silicon, whatever the vendor wants to sell you next.

Leaf-spine. Flat, two-tier network design where every leaf switch connects to every spine switch. The thing most data center networks have been since approximately forever.
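
Since leaf-spine is the one term on that list you might actually do arithmetic against, here's a minimal sketch of the oversubscription math. The port counts and speeds are illustrative assumptions, not numbers from the presentation:

```python
# Back-of-envelope leaf-spine oversubscription check.
# Port counts and speeds below are illustrative assumptions, not Arista figures.

def oversubscription(server_ports: int, server_gbps: int,
                     uplink_ports: int, uplink_gbps: int) -> float:
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    downlink = server_ports * server_gbps   # toward the servers
    uplink = uplink_ports * uplink_gbps     # toward the spines
    return downlink / uplink

# A common enterprise leaf: 48 x 25G server ports, 6 x 100G uplinks.
print(f"{oversubscription(48, 25, 6, 100):.1f}:1")   # 2.0:1

# The non-blocking math an AI scale-out pitch assumes: 48 x 25G down, 8 x 400G up.
print(f"{oversubscription(48, 25, 8, 400):.2f}:1")   # 0.38:1
```

Anything at or below 1:1 is non-blocking. The lossless scale-out fabrics above are generally built to stay there, which is a big part of why they cost what they cost.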

Three of These Probably Don’t Apply To You

I want to be careful here. Arista’s pitch is genuinely good. They have legitimate technology across all four fabrics, they’ve earned a leadership position in this space, and the engineering work they’re doing is real. None of what follows is a knock on them.

But.

If you run IT for a midsized business, three of the four boxes on that slide don’t describe a network you’re going to build. Probably ever.

You aren’t building scale-out. You aren’t training foundation models. If you’re doing AI in your business, you’re almost certainly buying inference as a service, calling someone’s API, or running a small fine-tuned model on a couple of GPUs in a rack. There’s no GPU fabric here to design. The closest you’ll come is making sure the two GPU servers your data scientist begged for don’t get stuck on the wrong side of a 1G uplink.

You aren’t building scale-across. This fabric exists because Meta and friends ran out of power in single buildings and started splitting training jobs across cities, states, and continents. If you have one data center, or one colo cage, or two cages that talk to each other over a perfectly normal MPLS or SD-WAN circuit, congratulations. You have already won this fight by not needing to fight it.

If a vendor is pitching you a scale-across architecture and you have one data center, you are not the customer. You are the warm-up act.

You aren’t building scale-up. Scale-up is fascinating. It also lives entirely inside a server chassis. You will get this from your server vendor as a feature on a spec sheet, not as a network you design or operate. When the AI server you bought in 2028 ships with Ethernet-based scale-up between its eight GPUs, you’ll be glad it’s not proprietary, but you won’t be cabling it.

The One That Actually Matters

That leaves the front-end. And that one, you might actually need to think about.

Here’s what front-end “AI traffic” actually looks like when it walks into a normal data center. A handful of GPU servers running local inference for an internal app. A vector database sitting next to a SQL Server, both feeding a chatbot somebody in marketing built. Backups and storage replication that suddenly have to compete with model-loading bursts every time a user asks a question. The occasional 10G or 25G server NIC where there used to be a 1G one.

It’s east-west traffic. Between application servers, an inference endpoint or two, and storage. It looks like the data center network you already run, because it is the data center network you already run.

The “AI tax” on this network mostly comes down to three things: predictable latency between the app tier and the inference endpoint, enough bandwidth for model loading from storage so the GPUs aren’t starving, and a QoS policy that doesn’t get weird about inference traffic when the rest of your environment isn’t tuned for it.
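
To put a rough number on the bandwidth piece, here's a back-of-envelope sketch. The model size, link speeds, and efficiency factor are illustrative assumptions, not figures from Arista's presentation:

```python
# Rough time to pull model weights from storage over different server uplinks.
# Model size, link speeds, and efficiency factor are illustrative assumptions.

def load_seconds(model_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Seconds to move model_gb gigabytes over a link_gbps link,
    assuming we only get `efficiency` of line rate in practice."""
    return (model_gb * 8) / (link_gbps * efficiency)  # GB -> Gb, then divide by Gbps

model_gb = 28  # roughly a 14B-parameter model in FP16, weights only

for gbps in (1, 10, 25, 100):
    print(f"{gbps:>3}G link: {load_seconds(model_gb, gbps):6.1f} s")

#   1G link:  280.0 s  -- nearly five minutes of an idle GPU per cold load
#  10G link:   28.0 s
#  25G link:   11.2 s
# 100G link:    2.8 s
```

None of that calls for a new fabric. It calls for knowing which uplink the GPU boxes actually landed on.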

You don’t need a new architecture. You might need to look at your existing one with fresh eyes, though. There’s a difference, and a lot of vendor pitches in 2026 are going to try to blur it.

What’s Actually Worth Taking Away From This Conversation

The hyperscaler conversation isn’t useless to the rest of us. There are pieces of it that will trickle down into normal enterprise gear in the next two to five years, and being aware of them now is worth something.

Optics power matters more than it used to. Arista made it clear that a large chunk of AI networking’s power budget is in the optics. When your colo starts billing you directly for power consumption (and many already do), this is going to leak into how you spec switches. It’s worth tracking even if you’re not buying liquid-cooled optics any time soon.

Streaming telemetry is real, and it’s coming. The shift from SNMP polling to state streaming is happening underneath all of this. Arista’s CloudVision pitch is built on it. Cisco, Juniper, and others are moving in the same direction. When your next refresh cycle comes up, this is a question worth asking your vendor about, because the operational difference between polling and streaming when something breaks at 2 am is genuinely significant.

Two tiers as the norm got validated. Half of Arista’s pitch is helping their hyperscaler customers stay in a two-tier leaf-spine instead of being forced to three. They’re framing it as innovation. Most of you never had three tiers in the first place. The AI-driven engineering pressure to keep networks flat is just confirming what midsized IT has been doing forever. If you’re running three tiers and not entirely sure why, the AI conversation gives you cover to revisit it.

Single-image operating systems matter more than the pitch lets on. Arista’s “EOS runs on every box” story is one of the genuinely differentiated parts of their portfolio, and it matters more for small ops teams than for hyperscalers. When you have three network engineers and a backlog, having one OS to learn, one config syntax to remember, and one troubleshooting muscle memory is worth real money.

Final Thoughts

Most “AI networking” pitches you’re going to sit through this year aren’t aimed at you. That’s not a problem. That’s just where the industry is. The hyperscalers are buying every GPU, every optic, and every switch the supply chain can produce, and the marketing follows the money.

The right frame for the conversation isn’t “Are we AI-ready?” It’s “What do my actual workloads need?” If the answer involves a couple of GPU servers, an inference endpoint, and a vector DB, your existing east-west fabric probably handles it. If it doesn’t, you have a sizing or QoS problem, not an architecture problem.

And the next time a vendor puts up a slide with four boxes labeled scale-out, scale-across, scale-up, and front-end, you can confidently point at the front-end box and say, “I have one of those. So does everyone else in this room.” Then ask them what their pitch looks like for the other 95 percent of customers who don’t need the other three.

The good news about AI networking is that for most of us, we already have one. We just call it “the network.”
