When Not to Build an MCP Server

The momentum in the MCP ecosystem in mid-2026 says that every backend should be wrapped as an MCP server. Conference talks open with "what if your agent could call your existing API?" Vendors pitch "MCP-enable your platform". Internal teams are being asked to expose their services as tools so the company's central agent can orchestrate workflows across them.

Most of this is healthy. MCP is a real standard, the pattern is genuinely useful, and the protocol's adoption is solving an interop problem the AI tooling space had been failing at for two years. But "useful for many things" is not the same as "useful for everything", and the cases where MCP is the wrong answer are real, identifiable, and worth being explicit about.

This post is the contrarian one. It is the post that exists in this series because we have seen teams burn months building MCP servers around capabilities that should not have had agents in the loop at all. The cost was not just engineering time — it was features that performed worse than the version they replaced, customer-facing incidents that would not have happened with the old direct integration, and a hangover of architectural complexity that the team is now stuck with.

The closest neighbor in the series is the transport post. That post talks teams out of HTTP when stdio is the right fit. This post talks teams out of MCP altogether when neither transport is the right fit.

The general rule

Before the specific cases, the heuristic to keep in mind:

MCP is a useful pattern when the value of having an LLM in the decision loop exceeds the cost of having an LLM in the decision loop.

The cost is real. Latency from model inference. Variance from non-determinism. Tokens from prompts and tool descriptions. Failure modes from misuse. Engineering investment in the server and its operational shell. None of these are zero. And there are workloads where they are clearly larger than the value the agent adds, in which case the right answer is to leave the agent out and keep the existing direct integration, the existing UI, the existing automation.

The cases below are the ones where this calculation reliably tilts toward "do not".

Latency-critical paths

Any operation where the user-perceived latency budget is tighter than the agent's per-turn latency.

The numbers, roughly, in mid-2026:

Frontier-cloud LLM, full agent loop with one tool call: 2-5 seconds per turn.
Smaller cloud LLM (Haiku-class): 1-2 seconds per turn.
On-device 3B model (from the Flutter capstone post): 0.5-2 seconds, depending on hardware.

If your operation needs to complete in under 500ms, the agent is not the right caller. Examples:

Autocomplete suggestions in a text field. The user is typing. The window is 100-200ms. An agent in this loop is a non-starter.
Real-time game state updates. A multiplayer game with a 60ms tick interval has no room for an agent.
Programmatic API calls from a tight loop. A batch job processing a million records cannot afford to wait two seconds per record for an agent's blessing.
Order-matching engines, payment authorization, anything in the hot path of a transaction. Latency is the product; agents are not in the picture.

The pattern: if the operation's latency budget leaves no room for a human to stop and deliberate, it leaves no room for an agent to deliberate either. The agent's value is in its judgment, and judgment costs latency. Operations that need to happen faster than judgment can decide should be wired direct.
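The budget comparison can be written down as a quick check. This is a minimal sketch, assuming the rough per-turn ranges quoted above; the tier names and both function names are made up for illustration, and the check is deliberately conservative in that a tier only qualifies if even its slow turns fit the budget.

```python
# Rough per-turn latency ranges in milliseconds, copied from the figures above.
# The tier names are hypothetical labels for this sketch.
PER_TURN_LATENCY_MS = {
    "frontier_cloud": (2000, 5000),
    "small_cloud": (1000, 2000),
    "on_device_3b": (500, 2000),
}


def agent_fits_budget(budget_ms: int, tier: str) -> bool:
    """Return True only if the slow end of this tier's range fits the budget."""
    _best_case_ms, worst_case_ms = PER_TURN_LATENCY_MS[tier]
    return worst_case_ms <= budget_ms


def tiers_within_budget(budget_ms: int) -> list[str]:
    """List the tiers that could sit in a path with this latency budget."""
    return [tier for tier in PER_TURN_LATENCY_MS if agent_fits_budget(budget_ms, tier)]


print(tiers_within_budget(200))   # an autocomplete-sized budget
print(tiers_within_budget(5000))  # a budget where a person is happy to wait
```

With an autocomplete-sized budget the list comes back empty, which is the whole argument of this section in one line.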

Where MCP can still play: adjacent tooling around these systems. An MCP tool that configures the matching engine, queries its state, or runs reports against its history is fine. An MCP tool in the matching loop is not.

Operations that legally require a human

A class that comes up more than people expect, especially in regulated industries.

Examples:

Medical diagnosis or prescription decisions. Many jurisdictions require a licensed clinician to make these calls personally. An LLM-mediated path is, depending on the jurisdiction, either explicitly prohibited or in a legal gray area murky enough that "do not" is the right default.
Final approval on financial decisions over a threshold. A loan approval, a wire transfer above a threshold, a fraud-related account decision. Regulators in most jurisdictions require explicit human attestation.
Hiring and termination decisions. Some jurisdictions have laws about automated employment decisions; many companies have internal policies that require human sign-off independent of law.
Legal advice. Bar associations in most jurisdictions are explicit that giving legal advice requires being a licensed attorney; an AI agent doing this through tools is at minimum a gray area.

The technical pattern: the workflow can have an MCP server that prepares the decision — gathers information, drafts the rationale, surfaces the relevant precedents. The MCP server cannot have a tool that executes the decision. The execution is wired to a separate UI that requires the human's authenticated action.

This often means the MCP server's tool names look like prepareDraftFor... or proposeDecision, never executeDecision. The handoff to the human sits at the boundary where the law (or the policy) requires it.

The trap: a developer builds the prepare-and-propose tools, finds the handoff inconvenient, adds an "execute" tool to the MCP server "for testing", forgets to remove it, and ships a system that bypasses the legal requirement. Architectures that lean on developer discipline to enforce regulatory boundaries are the ones that fail audits. The boundary needs to be at the system level, not the convention level.
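One way to move the boundary from convention toward the system is to make the tool registry itself refuse anything that executes. The sketch below is illustrative only: AgentToolRegistry, Effect, and BoundaryViolation are hypothetical names, not part of any MCP SDK, and the handlers are stubs.

```python
from enum import Enum
from typing import Callable


class Effect(Enum):
    PREPARE = "prepare"  # gathers, drafts, proposes; commits nothing
    EXECUTE = "execute"  # commits a decision with legal or policy weight


class BoundaryViolation(Exception):
    """Raised when something tries to expose an executing tool to the agent."""


class AgentToolRegistry:
    """Hypothetical registry for the tools an MCP server exposes."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}

    def register(self, name: str, effect: Effect, handler: Callable[..., object]) -> None:
        # The check lives in the registry, so a "for testing" execute tool
        # fails at startup instead of waiting for someone to remove it.
        if effect is not Effect.PREPARE:
            raise BoundaryViolation(f"{name!r} executes a decision; it cannot be an agent tool")
        self._tools[name] = handler

    def names(self) -> list[str]:
        return sorted(self._tools)


def propose_decision(case_id: str) -> dict:
    # Returns a draft for a human to review; nothing is committed here.
    return {"case_id": case_id, "status": "proposed"}


def execute_decision(case_id: str) -> dict:
    # Belongs behind the separate authenticated UI, not in the registry.
    return {"case_id": case_id, "status": "executed"}


registry = AgentToolRegistry()
registry.register("proposeDecision", Effect.PREPARE, propose_decision)

try:
    registry.register("executeDecision", Effect.EXECUTE, execute_decision)
except BoundaryViolation as err:
    print(err)
```

A declared label is still partly a convention, since a developer can mislabel a handler. A stronger version of the same idea keeps the execute path in a service the MCP process holds no credentials for, so the agent cannot reach it even if a tool is registered by mistake.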

Workloads where model variance is unacceptable

The category most teams underestimate.

LLMs are stochastic. The same prompt can produce different outputs on different runs. For most tool-calling workloads this variance is acceptable — minor wording differences in a search query, different orderings of paragraphs in a summary, occasional rephrasings that the user does not care about.

For some workloads, variance is unacceptable, and the system should not have an agent in it.

Examples:

Idempotency-critical operations. "Charge this customer $50" is a tool call where the agent calling it twice (because the first call seemed to fail, or because of a retry, or because of any of the many quiet-failure patterns from the evaluation post) charges the customer twice. The fix at the MCP layer is idempotency tokens, but those are easy to mishandle when the agent is the one synthesizing them. Wire idempotency-critical operations through systems that handle idempotency at the integration layer, not the agent layer.
Operations whose output is concatenated into a larger document with strict formatting requirements. If the output of formatBusinessAddress has to match a regex precisely, and 0.5% of the time the model returns a slightly malformed version, your downstream system breaks 0.5% of the time. Use a deterministic formatter.
Audit-critical chains. Operations whose outputs feed into financial reporting, regulatory filings, scientific datasets. The variance compounds across calls, and reproducibility is part of the product. Determinism is required; LLMs are the wrong substrate.
Cryptographic primitives, transaction-signing, identity-attesting operations. These need to be exactly correct, every time, derived from inputs deterministically. Do not put an agent in front of them.
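For the idempotency case in the list above, handling it at the integration layer can mean deriving the token from the business facts of the charge rather than accepting one the caller invents. A minimal sketch, assuming a hypothetical invoice identifier is available to key on; ChargeLedger is an in-memory stand-in, not a real payment API.

```python
import hashlib


def idempotency_key(customer_id: str, invoice_id: str, amount_cents: int) -> str:
    """Derive the key from the business facts, so retries of the same charge collide."""
    material = f"{customer_id}|{invoice_id}|{amount_cents}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class ChargeLedger:
    """In-memory stand-in for a payment integration that deduplicates by key."""

    def __init__(self) -> None:
        self._seen: dict[str, dict] = {}

    def charge(self, customer_id: str, invoice_id: str, amount_cents: int) -> dict:
        key = idempotency_key(customer_id, invoice_id, amount_cents)
        if key in self._seen:
            # A repeated call returns the original receipt instead of charging again.
            return {**self._seen[key], "replayed": True}
        receipt = {"customer_id": customer_id, "amount_cents": amount_cents, "replayed": False}
        self._seen[key] = receipt
        return receipt

    def total_charged(self) -> int:
        return sum(receipt["amount_cents"] for receipt in self._seen.values())


ledger = ChargeLedger()
first = ledger.charge("cust_1", "inv_42", 5000)
retry = ledger.charge("cust_1", "inv_42", 5000)  # e.g. a retry after an apparent failure
print(first["replayed"], retry["replayed"], ledger.total_charged())
```

Because the key is computed from the inputs, whoever calls charge twice for the same invoice gets the same receipt back, and nothing depends on the caller remembering or generating a token.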

A useful test: if I ran this operation twice with the same input and got slightly different outputs, would my system be wrong? If yes, the agent should not be in the loop.
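For the formatting example, a deterministic formatter passes that test by construction. The sketch below assumes a made-up target format ("NAME, STREET, CITY, REGION POSTCODE" with a two-letter region and five-digit postcode); the real regex and fields would come from whatever the downstream system requires.

```python
import re

# Hypothetical target format for this sketch: "NAME, STREET, CITY, REGION POSTCODE".
ADDRESS_PATTERN = re.compile(r"^[^,]+, [^,]+, [^,]+, [A-Z]{2} \d{5}$")


def format_business_address(name: str, street: str, city: str, region: str, postcode: str) -> str:
    """Build the address with plain string operations: same input, same output."""
    parts = [name.strip(), street.strip(), city.strip()]
    formatted = ", ".join(parts) + f", {region.strip().upper()} {postcode.strip()}"
    if not ADDRESS_PATTERN.fullmatch(formatted):
        # Fail loudly rather than pass a malformed value downstream.
        raise ValueError(f"address does not match required format: {formatted!r}")
    return formatted


first_run = format_business_address("Acme Ltd", "1 Main St", "Springfield", "il", "62701")
second_run = format_business_address("Acme Ltd", "1 Main St", "Springfield", "il", "62701")
print(first_run == second_run)
```

Malformed input raises instead of producing something that almost matches, so the failure shows up at the formatter rather than 0.5% of the time in the downstream system.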

Operations the user already does in two clicks

A subset of the latency case, framed differently.

If the workflow is: "user clicks a button, the system does the thing in two seconds, the user sees the result" — that workflow does not benefit from MCP. Adding the agent layer adds:

A round trip through the model.
A round trip through the host.
A new failure mode if the agent picks the wrong tool or sends the wrong args.
A new training surface for users who now have to figure out what to type to make the agent click the button.

The button is faster, more reliable, and more discoverable than the agent. The agent's value is in cases where the user does not know how to articulate the workflow as a sequence of clicks, or where the workflow spans multiple systems that the buttons do not cross. If the user already has the button, leave the button alone.

The instinct to MCP-enable is strongest at companies whose product surface is mature. The mature product is also the one where MCP adds the least, because the product has already optimized the common workflows. New users want the button. The agent helps with the long tail of the rare workflows. Build MCP for the rare workflows; leave the common ones to the existing UI.

Cases where the data is not where the agent thinks it is

A subtler failure mode. The agent assumes its tool calls return reliable, complete, current information. If the underlying system has caching layers, eventual consistency, sharding artifacts, or any other property that means the tool's output is approximate, the agent will reason from the approximation as if it were truth.

Concrete examples:

Read-replica latency. A tool that queries a read replica may return stale data; the agent does not know it is stale and acts as if it is current. For some workflows that is fine; for "is this customer still subscribed?" it is a bug.
Eventually consistent stores. A getOrderStatus tool against a service whose order status replicates with a one-minute lag will sometimes report a status that is already out of date. The agent does not see the lag; the user does.
Pagination with hidden cutoffs. A searchDocuments tool that returns the top N matches, where N is an undisclosed limit. The agent will treat the result as the complete set; the user will get answers that are missing data.

For these workloads, the right move is not to put MCP in front, but to fix the underlying data semantics first. The MCP layer cannot rescue an unreliable backend; it inherits all the unreliability and adds variance on top. Direct UIs at least let the user see "showing 100 of an unknown total"; the agent, by default, hides those caveats.
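Part of fixing the data semantics is making the caveats part of the result instead of leaving them implicit. A minimal sketch of what that could look like for the pagination and staleness examples; SearchResult, search_documents, and is_stale are hypothetical names, and the list of matches stands in for a real backend query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SearchResult:
    items: list[str]
    limit: int        # the cutoff that was applied, stated rather than hidden
    truncated: bool   # True when more matches exist than were returned
    as_of: datetime   # when the underlying data was last known to be current


def search_documents(matches: list[str], limit: int, as_of: datetime) -> SearchResult:
    """Hypothetical wrapper: `matches` stands in for the backend's full hit list."""
    return SearchResult(
        items=matches[:limit],
        limit=limit,
        truncated=len(matches) > limit,
        as_of=as_of,
    )


def is_stale(result: SearchResult, now: datetime, max_age: timedelta) -> bool:
    """True when the data is older than the caller is willing to act on."""
    return now - result.as_of > max_age


snapshot_time = datetime(2026, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
result = search_documents([f"doc{i}" for i in range(150)], limit=100, as_of=snapshot_time)
now = snapshot_time + timedelta(seconds=90)
print(len(result.items), result.truncated, is_stale(result, now, timedelta(seconds=60)))
```

A result shaped like this serves a direct UI and a tool caller equally, and it only helps once the backend can actually report its cutoff and freshness, which is the prerequisite this section is arguing for.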

When the agent's reasoning is not the bottleneck

A test that has saved teams from over-engineering: what does the agent actually add to this workflow that a script could not?

If the answer is "it lets the user phrase the request in natural language" — that is real value, but it is small. A natural-language frontend over a fixed workflow is sometimes worth the engineering, sometimes not.

If the answer is "it picks the right tool from a catalog of options based on context" — that is meaningful value, the kind MCP was built for.

If the answer is "I am not sure, but it seems modern" — that is the case where MCP is being built for the wrong reason. The team is shipping the agent because agents are in fashion, not because the workflow benefits.

The teams that consistently ship MCP servers that get used are the ones that can answer the question crisply: "the agent helps because [specific reasoning the user could not have done themselves quickly]". When the answer is generic, the artifact tends to be generic, and the user finds the original UI more reliable. The MCP server then sits unused, with all the operational cost of a real product and none of the value.

What to do instead, in each case

For completeness, the alternative path for each rejected case:

Latency-critical: keep the direct integration. Consider giving the agent a read-only adjacency tool (one that queries state without affecting the hot path) so the agent can reason about the system without being in it.
Human-required: build the prepare-and-propose tools as MCP, build the execute path as a separate authenticated UI. The agent prepares; the human commits.
Variance-intolerant: deterministic formatters, idempotent integrations, signed-and-verified operations. The agent can call into them and use the result the deterministic logic produces, but it does not produce the result itself.
Two-click workflows: leave them alone. Add MCP for the multi-system, multi-step, judgment-required workflows the buttons do not cover.
Unreliable data backends: fix the data layer first. The MCP server is the easy part; without a reliable backend, the agent's confident answers will compound the existing problems.

The thread through all of these: MCP works best when the agent's judgment is the bottleneck and the rest of the system is reliable, deterministic-where-it-needs-to-be, and well-bounded. When those conditions are not met, building MCP is paying for the agent's complexity without buying the agent's value.

A small list of things MCP is genuinely good at

For balance — the cases where MCP earns its keep, distilled:

Cross-system orchestration that requires judgment. "Plan a marketing campaign that pulls customer segments from CRM, drafts copy in the style guide, and schedules emails through the marketing platform." Three systems, one workflow, judgment at every step. Agent territory.
Long-tail support automation. "I cannot find the export feature for type-X reports filtered by date range." A direct UI is hard to make discoverable for every variant of every question; an agent that can reach through MCP to the right system can be.
Knowledge work over heterogeneous corpora. The RAG-MCP post covers this. The agent's job is to select the right retrieval strategy, refine, synthesize. The MCP server's job is to expose the corpora; the agent does the work a direct UI cannot.
Adaptive, exploratory workflows. Debugging a failing test (per the Flutter MCP host post). Investigating an incident from observability data. Composing a fix for a bug. The path is different every time; an agent can adapt; an agent over MCP tools is the right shape.

These are the cases worth investing the engineering for. The honest framing for everything else: if the workflow is fast, deterministic, regulated, two-clicks-already, or running over unreliable data, MCP is the wrong tool. Use the right tool, ship the working thing, and save MCP for where it earns the cost.

Where this fits

This post sits at the end of the operational arc — after the pillar, the security post, the transport post, and the versioning, observability, and evaluation posts. By the time a team has read those, they have a sense of how much real engineering an MCP server takes. This post is the prompt to ask "given that cost, is this the right answer here?"

The bounded-context post is the right next read for a team whose answer is "yes, but more carefully" — the architectural discipline that keeps the engineering investment proportionate to the value.

The most useful conversation about MCP at most companies in 2026 is the one that ends with "we are not going to build that". Not because MCP is bad, but because the workflow does not need it. The teams that ship great MCP servers are the ones that pick their MCP servers carefully — only the workflows where the agent's judgment is the bottleneck, only the cases where the cost is justified, only the systems that can carry the operational weight. Saying no to the rest is the discipline that lets the yeses succeed.