A few days ago, I built a system that could figure out which AI agent caused problems when teams failed at tasks. It worked okay - about as well as the academic research on the same problem. But something kept bothering me about the whole approach.
We were spending all this effort figuring out who to blame after things went wrong. What if we could prevent the problems instead?
I had a simple thought: what if the agents who talk the most are the ones breaking things? And what if removing them actually makes teams work better?
testing a counter-intuitive idea
Most people building AI systems think the opposite. When something goes wrong, they add more checking. More verification agents. More oversight. More safety measures.
But I wondered if all that extra chatter was making things worse, not better.
I went back to the same dataset I'd used before - 126 real conversations where AI agent teams failed at tasks. Instead of just finding the troublemaker, I started looking at who talked the most.
The pattern was immediate. In conversation after conversation, the agent who caused the failure was also the one who talked the most.
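To make that check concrete, here's a minimal sketch of how I think about it, assuming each failed case is a record with a "conversation" (a list of turns, each tagged with the speaking agent) and a labeled "culprit". The field names are illustrative, not the dataset's actual schema.

```python
from collections import Counter

def chattiest_agent(conversation):
    """Return the agent with the most turns in a conversation.

    Assumes `conversation` is a list of dicts with an "agent" key;
    this schema is illustrative, not the dataset's exact format.
    """
    counts = Counter(turn["agent"] for turn in conversation)
    return counts.most_common(1)[0][0]

def chattiest_is_culprit_rate(failed_cases):
    """Fraction of failed conversations where the most talkative agent
    is also the one labeled as causing the failure."""
    hits = sum(
        1 for case in failed_cases
        if chattiest_agent(case["conversation"]) == case["culprit"]
    )
    return hits / len(failed_cases)
```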
the simple test
I ran a straightforward experiment. For each failed conversation, I asked: what if we removed the chattiest agent? Would the team have done better?
Using a set of basic rules to predict that counterfactual, I estimated that removing the chattiest agent would have turned the failure into a success 48% of the time.
That number surprised me. My original, far more complex system only identified the troublemaker correctly 49% of the time. A simple rule - "remove whoever talks the most" - performed almost as well.
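Here's a rough sketch of the shape of that experiment, reusing chattiest_agent from the snippet above. The would_succeed_without function is a placeholder for the "basic rules"; the post doesn't spell those rules out, so it's left as a pluggable predicate rather than a real implementation.

```python
def predicted_rescue_rate(failed_cases, would_succeed_without):
    """Estimate how often removing the chattiest agent would have
    turned a failure into a success.

    `would_succeed_without(case, agent)` stands in for the "basic
    rules": any heuristic that predicts the team's outcome with
    `agent` removed from the conversation.
    """
    rescued = sum(
        1 for case in failed_cases
        if would_succeed_without(case, chattiest_agent(case["conversation"]))
    )
    return rescued / len(failed_cases)
```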
But I needed to see this in action, not just in theory.
watching it work
I took one failed conversation and rebuilt the scenario. The original team had four agents: Excel_Expert, Computer_terminal, BusinessLogic_Expert, and DataVerification_Expert. They'd failed at a data analysis task.
Excel_Expert had been the chattiest agent in the original failure. So I removed them and gave the same task to the remaining three agents.
The difference was striking. Without the chatty agent, the conversation was focused:
Computer_terminal: "Execute the provided code block to analyze the spreadsheet."
BusinessLogic_Expert: "Run the code to load the spreadsheet, extract addresses, and count the clients."
DataVerification_Expert: "The code looks correct. Make sure the column names match."
Three clear responses. No endless back-and-forth. No confusion. Just agents working together to solve the problem.
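The replay step itself is tiny. The sketch below assumes a run_team(agents, task) helper that wraps whatever multi-agent framework you're using; that helper, and the "agents" and "task" fields on each case, are assumptions for illustration, not part of the released code.

```python
def replay_without_chattiest(case, run_team):
    """Re-run a failed task with the most talkative agent removed.

    `run_team(agents, task)` is a hypothetical wrapper around your
    multi-agent framework; `case` is assumed to carry the original
    agent roster and task prompt.
    """
    noisy = chattiest_agent(case["conversation"])
    remaining = [agent for agent in case["agents"] if agent != noisy]
    return run_team(remaining, case["task"])
```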
the surprising pattern
Looking across all the data, I found that the oversight-style agents - the coordinators and verifiers whose job is to direct and check everyone else's work - cause the most problems.
The numbers were clear:
Tool agents: 0.29 failures per agent
Analyst agents: 0.56 failures per agent
Coordinator agents: 0.71 failures per agent
The agents supposed to make things safer were making things worse. And the simple workers who just did their jobs quietly almost never caused failures.
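For anyone who wants to reproduce a breakdown like that, here's one way it could be computed, assuming a role_of(agent) function that maps agent names to buckets like "tool", "analyst", or "coordinator". That mapping is an assumption of this sketch; the post doesn't list how agents were bucketed.

```python
from collections import Counter

def failures_per_agent_by_role(failed_cases, role_of):
    """Failures attributed per agent appearance, grouped by role.

    `role_of(agent)` maps an agent name to a role bucket such as
    "tool", "analyst", or "coordinator"; the bucketing itself is an
    assumption made for this sketch.
    """
    blamed = Counter(role_of(case["culprit"]) for case in failed_cases)
    appearances = Counter()
    for case in failed_cases:
        for agent in set(case["agents"]):
            appearances[role_of(agent)] += 1
    return {role: blamed[role] / appearances[role] for role in appearances}
```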
why this happens
Think about any group project you've been part of. Usually, the person who talks the most and tries to coordinate everyone else creates the most confusion. The person who quietly does their assigned work rarely causes problems.
The same thing happens with AI agents. The ones trying to verify and coordinate everyone else's work end up making more mistakes than the workers they're supposed to be checking.
what this means
Most companies are building AI systems backwards. They think more oversight means better results. My data shows the opposite.
If you want reliable AI teams, use fewer managers and more workers. Let agents do specific tasks without trying to oversee each other. Simple systems often work better than complex ones.
This goes against everything people believe about building safe AI. But the evidence is clear across 126 real conversations where teams failed.
the limits
I tested this on one specific dataset about AI agent failures. The pattern might not hold for every type of task or every way of building agent teams.
And 48% success at preventing problems, while good, still means this approach fails more than half the time. There's clearly more to understand about why AI teams break down.
But the core insight seems solid: chattiness correlates with problems. Agents who talk too much tend to cause more failures than agents who focus on their work.
what's next
I'm releasing the code so other people can test this on their own data. The approach is simple enough that anyone building AI agent systems can try it.
The bigger question is whether we need to rethink how we design these systems. Maybe the answer to AI safety isn't more checking. Maybe it's less interference.
Sometimes the obvious solution - remove whoever's causing the most noise - turns out to be the right one.
You can find all the code and data at github.com/dineshdm1/glassroot-graph. The core finding replicates across multiple ways of testing it. Teams work better when you remove whoever talks the most.
That's simple enough to implement immediately. And counter-intuitive enough to change how we think about building AI systems that actually work.
Best,
Dinesh DM
