Why Too Many MCP Servers Make Your AI Worse (The MCP Context Tax)

Stas Morawski

CTO & Co-founder, Unabyss

24 March 2026 · 4 min read

MCP made it easy to connect your AI to everything — GitHub, Slack, Jira, Notion, Gmail, all at once. Then people noticed their agents getting slower, dumber, and more expensive the more they connected. There's a name for what's happening: the context tax. Here's what it is and how to stop paying so much of it.

Why do too many MCP servers degrade your AI?

Because in many current MCP clients, each connected server's tool definitions are made available to the model up front — before it sees your message. This isn't inherent to MCP itself; it depends on client behavior, server design, the number of tools, schema verbosity, and caching. Those definitions (tool names, descriptions, parameter schemas) cost tokens, and in common setups they add up fast.

The numbers are stark. A single large connector like the GitHub MCP server can load tens of thousands of tokens on its own. In one documented case (not universal MCP behavior), connecting just three servers — GitHub, Slack, and Sentry, around 40 tools total — consumed roughly 143,000 of a 200,000-token context window: about 72% spent before a single user query. Each tool definition typically runs 250–1,400 tokens, and you're paying for all of them, every request.

That's the context tax: the standing overhead you pay just to have tools available, whether or not you use them.
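To make the arithmetic concrete, here is a minimal Python sketch that uses only the figures quoted above: a 200,000-token window, 143,000 tokens of reported overhead, and 250 to 1,400 tokens per tool definition. The function names are illustrative and not part of any MCP client.

```python
WINDOW_TOKENS = 200_000      # context window size from the documented case
REPORTED_OVERHEAD = 143_000  # tool-definition tokens reported in that case


def window_share(overhead_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return overhead_tokens / window_tokens


def estimate_overhead(tool_count: int, low: int = 250, high: int = 1_400) -> tuple[int, int]:
    """Rough (min, max) overhead for tool_count definitions, using the
    per-definition range quoted in the article."""
    return tool_count * low, tool_count * high


share = window_share(REPORTED_OVERHEAD, WINDOW_TOKENS)
remaining = WINDOW_TOKENS - REPORTED_OVERHEAD
print(f"Overhead share: {share:.1%}, tokens left for work: {remaining:,}")

low, high = estimate_overhead(40)
print(f"Range-based estimate for 40 tools: {low:,} to {high:,} tokens")
```

Note that the range-based estimate for 40 tools tops out at 56,000 tokens, well below the 143,000 reported in that case, so the per-definition range is best read as a rough guide. Measuring your own setup is more reliable than estimating.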

What is the context tax?

It is the token cost of tool definitions that sit in your context window regardless of what you actually do. Unlike conversation history, which grows as you talk, tool overhead is a roughly fixed cost that is present from the very first request and stays for as long as those servers are connected.

It hurts in three ways at once:

Less room for work. Tokens spent on tool definitions are tokens unavailable for your documents, your conversation, and the model's reasoning. Burn 70% on tools and the model has a fraction of its window left for the actual task.
Worse reasoning. Research on the "lost in the middle" effect shows that models use information less reliably when it is buried in the middle of a long context. A window stuffed with tool schemas crowds out the signal, so the model tends to get worse both at selecting the right tool and at the task itself.
Higher cost and latency. Input tokens are billed and take time to process. Tens of thousands of overhead tokens on every request compound into real money and slower responses.
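The cost side of this can be sketched with simple arithmetic. The example below is a rough illustration, assuming the full overhead is billed as input on every request and ignoring any caching discount. The price of 3.00 USD per million input tokens and the request counts are hypothetical placeholders, not figures from any provider.

```python
def overhead_cost_usd(
    overhead_tokens: int,
    requests: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Spend attributable to tool definitions alone.

    Assumes the full overhead is sent and billed on every request,
    with no caching discount applied.
    """
    total_tokens = overhead_tokens * requests
    return total_tokens * usd_per_million_input_tokens / 1_000_000


# Hypothetical workload: 1,000 requests at an assumed 3.00 USD per million tokens.
heavy = overhead_cost_usd(50_000, 1_000, 3.0)  # many servers connected
lean = overhead_cost_usd(10_000, 1_000, 3.0)   # trimmed toolset
print(f"Heavy setup: ${heavy:.2f}, lean setup: ${lean:.2f}, saved: ${heavy - lean:.2f}")
```

Swap in your own measured overhead and your provider's actual input price. If your client caches tool definitions, the real figure will be lower than this upper bound.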

How do tool definitions eat the window?

Because MCP servers tend to mirror an entire API surface, exposing many granular tools that each carry a verbose schema. A server might offer 40, 90, or even hundreds of tools, and in many clients the model is told about all of them up front so it knows what is available. The design itself encourages the bloat: more tools mean more schemas, and more schemas mean more fixed overhead.

And it's often repeated each turn. Even with caching, many setups reintroduce a good chunk of the tool payload as the conversation proceeds. So the tax isn't only paid once at startup — it can recur, quietly draining budget throughout the session. The more comprehensive your connected toolset, the heavier every single interaction becomes.
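To get a feel for how a single definition adds up, the sketch below serializes one tool definition and estimates its token footprint. The `create_issue` tool is hypothetical, and the figure of roughly four characters per token is a crude assumption. Real tokenizers vary, so use your model's own token counter for anything beyond a rough estimate.

```python
import json

# Assumption: roughly 4 characters per token. Treat results as an
# order-of-magnitude estimate, not a billing figure.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool: dict) -> int:
    """Estimate the token footprint of one serialized tool definition."""
    serialized = json.dumps(tool, separators=(",", ":"))
    return max(1, len(serialized) // CHARS_PER_TOKEN)


def total_overhead(tools: list[dict]) -> int:
    """Sum the estimated footprint of every tool a client would load."""
    return sum(estimate_tokens(tool) for tool in tools)


# Hypothetical tool definition with a name, a description, and an input schema.
create_issue = {
    "name": "create_issue",
    "description": "Create a new issue in a repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository in owner/name form."},
            "title": {"type": "string", "description": "Issue title."},
            "body": {"type": "string", "description": "Issue body in Markdown."},
        },
        "required": ["repo", "title"],
    },
}

# A server that mirrors a whole API repeats this cost for every endpoint.
one_tool = estimate_tokens(create_issue)
ninety_tools = total_overhead([create_issue] * 90)
print(f"1 tool: ~{one_tool} tokens, 90 similar tools: ~{ninety_tools} tokens")
```

Even a compact definition like this one is multiplied by the tool count, which is why schema verbosity and the number of tools matter together.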

How do you reduce the context tax?

The fix is precision, not abstinence. Concrete tactics that help:

Audit your tools. Count how many tools each server exposes and how often you actually use them. Most setups carry dead weight — connected servers whose tools you rarely call.
Connect fewer servers per context. Load only the servers a given task needs instead of wiring everything into one agent. Many clients let you enable tools selectively.
Prefer task-level tools over sprawling API surfaces. Servers that expose a few composable, high-level tools cost far less than ones mirroring every endpoint.
Don't pipe in raw data sources when you need structured context. This is the one most people miss.
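The audit step above can be sketched as a small script. This is a minimal example, assuming you can export a per-tool token estimate and a call count from your own logs. The `ToolStats` record, the sample servers, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    server: str  # server that exposes the tool
    name: str    # tool name
    tokens: int  # estimated size of the tool definition
    calls: int   # times the tool was called during the audit period


def audit(tools: list[ToolStats], min_calls: int = 1) -> dict[str, dict[str, int]]:
    """Per server: tool count, total tokens, and tokens spent on tools
    called fewer than min_calls times."""
    report: dict[str, dict[str, int]] = {}
    for tool in tools:
        entry = report.setdefault(
            tool.server, {"tools": 0, "tokens": 0, "unused_tokens": 0}
        )
        entry["tools"] += 1
        entry["tokens"] += tool.tokens
        if tool.calls < min_calls:
            entry["unused_tokens"] += tool.tokens
    return report


def dead_weight_servers(report: dict[str, dict[str, int]]) -> list[str]:
    """Servers where every tool went unused: candidates to disconnect."""
    return sorted(
        server for server, entry in report.items()
        if entry["unused_tokens"] == entry["tokens"]
    )


# Hypothetical audit data.
inventory = [
    ToolStats("github", "create_issue", 600, 14),
    ToolStats("github", "list_workflows", 900, 0),
    ToolStats("slack", "post_message", 400, 0),
    ToolStats("slack", "list_channels", 500, 0),
]

report = audit(inventory)
for server, entry in report.items():
    print(f"{server}: {entry['unused_tokens']} of {entry['tokens']} tokens unused")
print("Candidates to disconnect:", dead_weight_servers(report))
```

In this sample data every Slack tool went unused, so that server is flagged for removal, while GitHub is kept but still carries one unused definition worth disabling if your client supports selective tools.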

That last point connects to a deeper mistake. A lot of MCP bloat comes from wiring up raw data connectors — Gmail, Drive, a database — hoping the AI will assemble an understanding of your situation from them. But that floods the window with tool schemas and raw retrieval, and still doesn't give the model a clean picture of you (it just gives it more to search). If what you actually want is for the AI to understand your context, delivering that as structured context through a single context layer is far lighter than connecting a dozen data sources and making the model reconstruct it. One server serving organized context, instead of twenty servers dumping schemas and raw data into the window.

To be clear, a context layer doesn't replace tool optimization — if you genuinely need 90 GitHub actions, you still have to manage those schemas. But for the common case of "I just want my AI to know my situation," the answer isn't more connectors. It's less raw plumbing and more structured context.

→ How structured context is delivered: How to Deliver Personal Context to AI Tools

→ The connector/MCP/context-layer distinction: Connectors vs MCP vs a Context Layer

→ Serve structured context through one MCP endpoint with Unabyss