Why Your AI Agent Gets Dumber Mid-Session

That Familiar Feeling When Your AI Agent Starts to Drift

There is a specific and deeply frustrating way an AI coding agent goes bad. It does not crash. It does not throw an error message. It just gets duller — quietly, gradually, and without any obvious warning sign. Halfway through a long working session, it forgets a constraint you established at the very beginning. It asks you something you already answered. It starts giving shorter, vaguer replies to the same kinds of requests it handled confidently an hour earlier. The quality sags, and you can feel it, even though nothing has technically broken.

If you have experienced this, you are not imagining it. And if your first instinct was to blame your connected MCP servers, you are in good company. That is exactly where the blame landed in a real debugging session that turned out to be more instructive than expected.

The MCP Assumption Feels Obvious — Until You Measure

Model Context Protocol, or MCP, has become a popular way to extend AI agents with external tools and data sources. Connect a few servers and your agent can search the web, query databases, interact with APIs, and more. But with that power comes a reasonable concern: every connected MCP server loads tool definitions into the context window, and context windows are not infinite. When an agent starts degrading, the mental model practically writes itself. Too many tools loaded, not enough room left to think, obviously it is drifting.

That was the working hypothesis before anyone checked the actual numbers. The plan was to start disconnecting MCP servers one by one to free up space. Before any server was cut, though, someone made the better call: measure first.

The measurement did not show what anyone expected.

What Is Actually Filling Your Context Window

Many modern AI agents can produce a breakdown of how context window tokens are currently allocated, sorted by category. Instead of guessing, it is worth looking at where the tokens are actually going. The proportions matter more than the raw numbers, since absolute figures will vary by model and window size, but the shape of the problem transfers across setups.

In a session that had clearly started drifting, the breakdown looked roughly like this:

Conversation history — the accumulated back-and-forth of the entire session — was the single largest consumer, accounting for around a fifth of the entire context window on its own, and growing with every exchange.
Fixed startup overhead — including the system prompt, tool framework configuration, and any loaded memory files — represented a meaningful portion, but it was stable and one-time. It was not the variable causing the drift.
Connected MCP tool definitions, the thing that had been blamed without evidence, occupied a relatively small slice of the window, far too small to account for the drift on its own.

The culprit sitting at the bottom of the suspect list turned out to be the plain, unmanaged accumulation of conversation history. The thing nobody was watching was the thing eating the window.
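To make the shape of that breakdown concrete, here is a minimal sketch of the same comparison in Python. Every number in it is made up for illustration: the window size, the category names, and the token counts are assumptions chosen only to mirror the proportions described above, not measurements from the session. The breakdown helper is likewise hypothetical and does not correspond to any particular agent's command.

```python
# Illustrative figures only. These are NOT measurements from the session above.
CONTEXT_WINDOW = 200_000  # assumed window size, varies by model

# Hypothetical token counts per category, shaped like the breakdown described above.
usage = {
    "mcp_tool_definitions": 4_000,
    "fixed_startup_overhead": 25_000,
    "conversation_history": 40_000,
}


def breakdown(usage, window):
    """Return (category, tokens, share_of_window) rows, largest consumer first."""
    rows = [(name, tokens, tokens / window) for name, tokens in usage.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, tokens, share in breakdown(usage, CONTEXT_WINDOW):
    print(f"{name:<24} {tokens:>8,} {share:6.1%}")
```

Sorting by size is the whole point: the category at the top of the list is where to look first, whatever the intuition said beforehand.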

Why the MCP Assumption Was Half Right

To be precise about this: MCP servers are not free. Every connected server loads its tool definitions into the context at session start, and if you stack up a large number of complex tools, those definitions do add up. The concern is not baseless. However, the key distinction is that MCP overhead is largely fixed and front-loaded. It costs tokens at the beginning and then stays roughly constant throughout the session.

Conversation history, by contrast, is cumulative. Every turn adds more tokens: every answer, every code snippet, every clarification, every re-prompt stays in the window and is carried into the next turn. In a long working session, the conversation history can dwarf everything else in the context window, including every connected MCP server combined.

This distinction matters enormously for how you diagnose and address the problem. If MCP tool definitions were the main culprit, disconnecting servers would be the right move. But if conversation history is the real driver, disconnecting servers will barely make a dent. You need a different intervention entirely.
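The difference between a fixed cost and a cumulative one can be sketched in a few lines of Python. This is a deliberately simplified model, assuming every turn adds the same number of tokens and that nothing is compacted along the way. The figures and function names are hypothetical and exist only to show how quickly a per-turn cost overtakes a one-time one.

```python
def history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Tokens held by conversation history after a number of turns (simplified: constant per turn)."""
    return turns * tokens_per_turn


def first_turn_history_exceeds(fixed_tokens: int, tokens_per_turn: int) -> int:
    """First turn at which accumulated history is strictly larger than a fixed overhead.

    Assumes tokens_per_turn is a positive integer.
    """
    return fixed_tokens // tokens_per_turn + 1


# Illustrative assumptions, not measurements.
MCP_DEFINITION_TOKENS = 4_000  # paid once at session start, then constant
TOKENS_PER_TURN = 1_500        # prompt plus reply, assumed constant

for turn in (1, 5, 20, 60):
    history = history_tokens(turn, TOKENS_PER_TURN)
    print(f"turn {turn:>2}: mcp={MCP_DEFINITION_TOKENS:,} history={history:,}")

print("history overtakes MCP overhead at turn",
      first_turn_history_exceeds(MCP_DEFINITION_TOKENS, TOKENS_PER_TURN))
```

Under these assumptions the fixed cost is overtaken within the first few turns, and the gap only widens from there. Real sessions are lumpier, since a single pasted file can cost more than many ordinary turns, but the direction is the same.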

Practical Steps to Manage Context Window Exhaustion

Once you understand where the tokens are actually going, the solutions become clearer and more targeted.

Start a new session intentionally. Rather than waiting for quality to degrade, establish checkpoints in long working sessions where you summarize progress and start fresh. Many agents support memory files or summary handoffs specifically for this purpose.
Compact your conversation history. Some agents offer explicit commands to summarize or compact earlier conversation turns, replacing verbose back-and-forth with a condensed summary that preserves key decisions without consuming as many tokens.
Be selective about what you paste in. Long code blocks, large file contents, and verbose error logs all count against your context budget. Paste only what the agent needs to see right now, not everything that might theoretically be useful.
Audit MCP connections realistically. Do disconnect tools you are genuinely not using in a given session — not because the savings are enormous, but because good hygiene matters and every token helps at the margins.
Use your agent's context breakdown tools. If your agent can show you a token usage breakdown by category, look at it before you start guessing. The actual distribution is almost always more instructive than intuition.
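Two of the steps above, compacting history and being selective about pastes, can be sketched roughly as follows. This is an illustration of the idea, not how any particular agent implements it. The Turn type, the compact and naive_summary functions, and the four-characters-per-token estimate are all assumptions made for the example. A real agent would use its own tokenizer and would normally ask the model itself to write the summary.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    """Very rough size estimate: about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def total_tokens(history) -> int:
    return sum(estimate_tokens(turn.text) for turn in history)


def naive_summary(turns) -> str:
    """Placeholder summarizer: keeps the first 60 characters of each turn's first line."""
    lines = [f"- {t.role}: {t.text.strip().split(chr(10), 1)[0][:60]}" for t in turns]
    return "Summary of earlier turns:\n" + "\n".join(lines)


def compact(history, keep_last: int, summarize=naive_summary):
    """Replace all but the last keep_last turns with a single summary turn."""
    if len(history) <= keep_last:
        return list(history)  # nothing old enough to compact
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    return [Turn("system", summarize(older))] + list(recent)


# Hypothetical session: ten long turns alternating between user and agent.
history = [Turn("user" if i % 2 == 0 else "assistant", "x" * 400) for i in range(10)]
compacted = compact(history, keep_last=2)

print("before:", len(history), "turns,", total_tokens(history), "estimated tokens")
print("after: ", len(compacted), "turns,", total_tokens(compacted), "estimated tokens")
```

The same estimate_tokens helper works as a quick check before pasting a log or a file: if the estimate is a large fraction of the window, trim the paste to the part the agent needs right now.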

Measure First, Blame Second

The broader lesson here extends well beyond AI agents and context windows. When a complex system starts behaving unexpectedly, the instinct is to reach for a familiar scapegoat: the newest component, the most exotic-sounding feature, the thing you read a warning about last week. Blaming it is usually faster than measuring what is actually happening, and usually less accurate.

MCP servers were new, they had a plausible mechanism for causing problems, and they made a satisfying story. But the data told a different story. Conversation history, the oldest and most boring part of any chat-based agent interaction, was quietly consuming the window one turn at a time.

If your AI coding agent is getting dumber mid-session, do not start disconnecting things at random. Pull up the context breakdown, read the actual numbers, and let the data tell you where to look. The real culprit is probably hiding in plain sight, right there in the conversation you have both been having all along.