AIOps Fatigue Is Real, And It’s Your Vendor’s Fault, Not Yours

An image of Otto Pilot from Airplane! in a NOC

Mike Hoffman has been doing this longer than most of us. His first network troubleshooting tool was an oscilloscope on thick coax, shooting a pulse down the cable toward the 50-ohm terminator and looking for the little blips from vampire taps. His next tool was the first-generation Network General LAN Doctor: a briefcase with an RS-232 cable, where he read binary blink patterns to figure out what was going on.

That was decades ago. And here’s the thing that should bother all of us. The workflow hasn’t really changed.

Oh sure, the tools got prettier. We moved from blinking lights to dashboards with dark mode, and AI badges slapped on the marketing page. But the core loop is the same. Something breaks. Customers complain. It gets escalated. Someone pulls a senior engineer into a war room. That engineer bounces between six different tools, tears apart packet captures, and eventually, through a combination of experience and educated guessing, figures out what went wrong.

Mike said this during NetAI’s presentation at Network Field Day, and it landed because everyone in the room knew he was right. AIOps was supposed to fix this. For a lot of us, it hasn’t. And if you bought in 18 months ago and you’re still doing the same chair-swivel troubleshooting you were doing before, I want you to hear this clearly: it’s not because you did it wrong.

The Promise vs. The Reality

You know the pitch. Single pane of glass. No more bouncing between point tools. Automated root cause analysis. Your NOC engineers will be more productive. Your MTTR will plummet. You’ll finally get ahead of problems instead of reacting to them.

A lot of shops bought that pitch. And now, a year or 18 months later, they’re still waiting for the payoff.

Network World ran a piece in mid-April covering the 2026 IDC AI in Networking Special Report, and the findings are grim. Organizations that were at “selective use” of AI for networking 18 months ago are still at selective use. The ones at “substantial use” are still at substantial use. Nobody moved. The industry spent a year and a half buying, deploying, and configuring AIOps platforms, and the needle barely budged.

NetAI’s co-founder and CEO, Dr. Deepak Kakadia, put it more bluntly during the Network Field Day session. He pointed out that even the vendors themselves, in their own demo videos on YouTube, will say something like “if the agents come up with two different answers, it’s up to the network engineer to figure out the right path.” Read that again. The tool gives you two possible answers and then shrugs. That’s not automation. That’s a fancy search bar with a confidence problem.

If your AIOps investment still requires a senior engineer to validate every answer, you didn't buy automation. You bought a copilot for people who were already good at the job. Or worse, you bought Otto Pilot from Airplane! for someone who isn't good at the job.

Why Most AIOps Tools Are Guessing (And Why That Matters)

Here’s where it gets interesting, and where NetAI’s argument starts to make a lot of sense even if you’re not in the market for their product specifically.

Most AIOps platforms today are built on large language models. LLMs. The same foundational technology behind ChatGPT and its cousins. And LLMs are genuinely incredible at language. They can summarize, translate, generate text, and answer questions about documents. But here’s the problem. Networks aren’t documents. Networks aren’t language. Networks are graphs.

Deepak made this point repeatedly, and it stuck with me. LLMs were invented to model human behavior. What will someone buy? What will they click on? What do they mean when they type this sentence? You can never perfectly model a human because humans are unpredictable. But networks? Networks were built by humans. They follow specifications. They have known protocol behaviors, defined neighbor relationships, and predictable failure modes. You don’t need a probabilistic language model to understand them. You need something that understands structure.

When you force network events into an LLM, you're converting alarms and telemetry into text, embedding that text into vector space, and then trying to find the closest match. You're adding multiple layers of abstraction between what actually happened in the network and what the model thinks happened. Every layer introduces another chance for error. And at the end of that chain, the best you get is a guess. A really well-informed guess, sure. But still a guess.
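To make that concrete, here is a deliberately tiny sketch of the shape of that pipeline. This is not any vendor's actual implementation: it uses a toy bag-of-words vector where a real system would use a learned embedding model, and the "historical incidents" are invented. But the structure of the problem is the same: alarm text goes in, the closest match in vector space comes out.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': a word-count vector. A real pipeline uses a
    learned model, but either way the network's structure is gone by
    this point. Only the words survive."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical past incidents and their fixes, for illustration only.
known_incidents = {
    "interface down on core router": "check physical link",
    "ospf adjacency lost on edge router": "check ospf config",
    "bgp peer flap on border router": "check bgp timers",
}

def guess_root_cause(alarm_text):
    """Return the closest historical match plus a similarity score.
    Note what's missing: any knowledge of which devices are actually
    connected. The output is a similarity, not a cause."""
    scored = [(cosine(embed(alarm_text), embed(k)), k) for k in known_incidents]
    score, best = max(scored)
    return known_incidents[best], score

fix, confidence = guess_root_cause("ospf adjacency lost on core router")
print(fix, round(confidence, 2))
```

Notice that the toy happily matches the edge-router incident even though the alarm names a core router, because those sentences share the most words. The answer is the nearest neighbor in text space, not a determination about this network, and the "confidence" is just geometry.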

Deepak’s experience at Sun Microsystems tells the rest of the story. He was on the fly-and-fix team. When local engineers couldn’t solve a network problem, he got the call. He’d show up to a war room where everyone had a piece of the puzzle, but nobody knew what they were looking at. Everybody guessing. LLM-based AIOps doesn’t fix that dynamic. It just gives the war room a chatbot to argue with.

What “Deterministic” Actually Means For Your NOC

This is where NetAI enters the picture, and I want to be clear that I’m presenting them as an example of a different approach, not writing a product review.

NetAI doesn’t use LLMs for root cause analysis. They use graph neural networks, or GNNs. The short version is this: instead of converting network events into text and hoping the language model finds a pattern, a GNN models the actual structural relationships between devices, interfaces, and protocol layers. It knows the topology. It knows the neighbor relationships at every layer, from physical links up through OSPF adjacencies, BGP peers, MPLS LSPs, and overlay tunnels. When an alarm fires, the GNN doesn’t have to be asked the right question. It already knows what’s connected to what and can trace the causal chain.
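The structural idea is easier to see in code than in prose. What follows is my own bare-bones illustration, not NetAI's GNN: it is plain graph traversal over a made-up dependency map, with none of the learning. But it shows why knowing what depends on what changes the nature of the answer.

```python
# Made-up dependency map for illustration: each symptom points at the
# thing it depends on, one hop up the causal chain.
depends_on = {
    "ny-lsa-update": "ny-ospf-adjacency",
    "ny-ospf-adjacency": "la-ny-link",
    "la-ny-link": "la-eth0",  # the physical interface in LA
}

def trace_root(alarm, graph=depends_on):
    """Follow dependencies until we reach something with no parent.
    Because the relationships are known structure, the answer is
    deterministic: same topology plus same alarm gives the same root,
    every time. No prompting, no nearest-neighbor guessing."""
    seen = set()
    while alarm in graph and alarm not in seen:
        seen.add(alarm)
        alarm = graph[alarm]
    return alarm

# Three different symptom alarms all trace back to one root cause.
roots = {trace_root(a) for a in ("ny-lsa-update", "ny-ospf-adjacency", "la-ny-link")}
print(roots)  # {'la-eth0'}
```

A real system has to learn and maintain that dependency graph across every protocol layer rather than hardcode it, which is the hard part. But the payoff is exactly this: the walk from symptom to cause is a traversal, not a guess.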

During the live demo at Network Field Day, Irfan Lateef (their sales engineer) pulled up a simulated tier-one service provider topology. He logged into a router in Los Angeles, shut down an Ethernet interface to a New York router, and let it rip. Within minutes, the system ingested about 14 alarms across link state changes, OSPF adjacency flaps, and LSA updates. It correlated all of them back to a single root cause: the interface shutdown on the LA router. It showed the evidence timeline, the causal chain of how the failure propagated, the blast radius of affected devices, and then it triggered an automated remediation script that logged into the router and re-enabled the interface.

No prompting. No “let me ask a follow-up question.” No war room. Just: here’s the root cause, here’s how we know, here’s the fix, and here’s the script to do it.

The practical implication for your NOC is that the vast majority of alarm volume is correlated noise, not root causes. In their demo environment, 12,000 alarms condensed down to about 2,500 actual root causes. The other 9,500 were downstream symptoms. If your team is spending time triaging those correlated alarms instead of fixing root causes, you’re paying senior engineer salaries for ticket triage.
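That condensation effect is worth sketching, because it is the part your NOC feels day to day. Again, this is a toy with an invented alarm stream and dependency map, not the demo environment: the point is only that once symptoms can be grouped under their root, the triage queue shrinks to the number of actual causes.

```python
from collections import defaultdict

# Invented dependency map: two links fail because one interface was
# shut down, and each link failure spawns its own downstream alarms.
parent = {
    "lsa-update-a": "ospf-flap-a", "ospf-flap-a": "link-1-down",
    "lsa-update-b": "ospf-flap-b", "ospf-flap-b": "link-2-down",
    "link-1-down": "eth0-shutdown", "link-2-down": "eth0-shutdown",
}

def root_of(alarm):
    # Walk up the chain until there is no parent left.
    while alarm in parent:
        alarm = parent[alarm]
    return alarm

alarms = ["lsa-update-a", "ospf-flap-a", "link-1-down",
          "lsa-update-b", "ospf-flap-b", "link-2-down"]

# Bucket every alarm under its root cause.
by_root = defaultdict(list)
for a in alarms:
    by_root[root_of(a)].append(a)

# Six alarms in the queue; one actionable ticket.
print({root: len(symptoms) for root, symptoms in by_root.items()})
```

Six alarms collapse into one ticket here; in the demo's numbers, 12,000 collapsed into roughly 2,500. Same shape, different scale.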

The Questions You Should Be Asking Before You Buy Again

Whether or not NetAI is the right fit for your environment, the underlying argument here changes how you should evaluate any AIOps platform. If you got burned on round one, or if you’re evaluating for the first time, here are the questions I’d be asking every vendor:

“When you say root cause, do you mean deterministic or probabilistic?” If they can’t answer this clearly, or if they dance around it, that tells you everything. There’s a massive difference between “we identified the root cause” and “we think these three things might be the root cause, go check.”

“Does your system require my engineer to validate the answer, or does it show its causal reasoning?” This is the difference between a copilot and an automation layer. If the tool can show you the causal chain, the evidence timeline, and the blast radius, your junior NOC engineer can act on it. If the tool just gives a suggestion and says “good luck,” you still need the senior engineer in the loop.

“What happens during a burst of 500 alarms in a 60-second window?” This is where the architecture matters. A GNN-based system handles bursts natively because it understands the structural relationships and can correlate spatially and temporally at the same time. An LLM-based system needs sequential prompting. When your network is melting down is exactly when you need answers fastest, and that’s exactly when the architectural difference shows up.

These aren’t NetAI-specific questions. They’re vendor-agnostic questions that separate real root cause analysis from dressed-up correlation engines.

Final Thoughts

The workflow hasn’t changed in decades. Slapping an LLM on top of your existing observability stack was never going to change that, because the problem was never “we don’t have enough data” or “we need a chatbot.” The problem was that nobody could tell you, definitively, what the root cause was.

The real question for our industry isn’t “should we use AI in the NOC?” Of course we should. The question is whether the AI actually understands how networks work (the topology, the protocol relationships, the structural dependencies) or whether it’s just pattern-matching on text that happens to be about networking.

If you bought AIOps and feel burned, you’re not alone. The IDC data confirms it. The fatigue is real, and it’s justified. But the answer isn’t giving up on AI in network operations. The answer is demanding that the next tool you evaluate can show its work. Deterministic reasoning, not confident-sounding guesses. Causal chains, not probability scores. And if a vendor can’t explain the difference, that’s your answer right there.


I attended Network Field Day 40 as a delegate. Travel and various expenses were covered by the event. I was not compensated for this post and was under no obligation to write about any vendor. For more information about Network Field Day, visit TechFieldDay.com.
