The first version of mcp-blog-publisher had a tool called publishDraft. The second version had a tool called publishArticle. The change took six lines of code on the server side. The change broke every conversation in flight on the client side, silently, for three days, until someone noticed Claude had started apologizing for being unable to publish articles.
This is the post that did not exist when we needed it. There is plenty of writing on building a first MCP server. There is almost nothing on what happens when you have one in the wild and decide the third tool's name was wrong.
Three layers of versioning are worth being deliberate about, and most teams collapse them into one and get burned. The protocol version (the spec the server implements). The server's own tool surface (names, schemas, descriptions). And the underlying product the tools wrap (your API contract). Each one moves on its own clock, and a clean policy for one does not buy you anything for the other two.
This is a follow-on to the overview. It assumes you have a server in production, or close enough to it that the following thought has started to weigh on you:
"what happens when I change this?"
The three clocks
Protocol version. The MCP spec itself moves. As of mid-2026, 2024-11-05 is the most-deployed version, with newer revisions in flight. The host and server negotiate during initialize. Picking which versions your server claims to support is a real decision, and the failure modes — host on 2025-03, server on 2024-11, partial feature mismatch — are the kind that look like "the tool just stopped working" and take a debugger to identify. The spec moves slowly enough that this is rarely the immediate problem. It is the problem that bites in year two.
Tool surface. The names of your tools, the shape of their input schemas, the wording of their descriptions, the format of their outputs. This moves fast. Every time you tune a tool description because the agent kept misusing it, every time you add an optional parameter, every time you remove a field from the result because the agent was always ignoring it — that is a tool-surface version change. Most of those changes are safe in isolation; some of them are not, and the unsafe ones are not always obvious in advance.
Backend product version. What your tools actually do. The article-publishing pipeline, the database, the third-party API. This is the layer your engineering team has been versioning forever, with whatever practices were in place before MCP entered the picture. The relevant question is: what is the contract between your tool surface and your product API, and how do you keep it stable as both move?
Mix these up and changes propagate badly. A protocol-version change that should have been invisible to tools breaks tool calls because you treated tool descriptions and protocol features as the same surface. A backend change that needed a tool-surface change goes out without one because nobody was watching that boundary. A tool-surface rename that needed a backend deprecation goes out without it because the team thought they were just renaming a thing.
Before any version policy, get the layers separate in your head. Then version each one with the discipline it deserves.
The taxonomy of tool-surface changes
Not every change to a tool is a version change. The cleanest split:
Safe additive changes — almost certainly fine.
- Adding a new tool that did not exist before.
- Adding a new optional parameter to an existing tool, with a sensible default.
- Adding a new field to the output of an existing tool, in a position where additional fields are expected.
- Improving a description in a way that does not change the intended use of the tool.
Behavior-shifting changes — fine if you mean them.
- Changing a description in a way that changes which calls the agent will make. Most description tweaks fall here. The description is the prompt the model is reasoning over; changing it is a behavior change, even if the parameters and return shape stay the same.
- Changing default values for optional parameters.
- Tightening validation (rejecting inputs that were previously accepted silently).
Breaking changes — every one of these needs a deprecation path.
- Renaming a tool.
- Removing a tool.
- Renaming a parameter.
- Removing a parameter, or making an optional one required.
- Changing the type of a parameter or output field.
- Changing the meaning of a tool while keeping its name (the worst kind, because it looks like a description tweak and it is not).
The mental model: anything an agent's running session might depend on is breaking. The agent has the previous tool surface in its context window. It will keep emitting calls in the previous shape until it sees the new shape. A new shape that is incompatible with the old one is a broken session.
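The additive-versus-breaking split can be checked mechanically for input schemas. A rough sketch — not a full JSON Schema differ, just the property names, types, and `required` list, which covers the cases in the taxonomy above:

```python
# Rough sketch: classify a change between two tool input schemas as
# "additive" or "breaking". Only inspects property names, declared
# types, and the required list -- not a full JSON Schema differ.

def classify_schema_change(old: dict, new: dict) -> str:
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    for name, spec in old_props.items():
        if name not in new_props:
            return "breaking"      # parameter removed or renamed
        if new_props[name].get("type") != spec.get("type"):
            return "breaking"      # parameter type changed
    if new_required - old_required:
        return "breaking"          # optional made required, or new required param
    return "additive"              # new optional params (or no change) are safe

old = {"properties": {"slug": {"type": "string"}}, "required": ["slug"]}
adds_optional = {"properties": {"slug": {"type": "string"},
                                "publishAt": {"type": "string"}},
                 "required": ["slug"]}
retypes = {"properties": {"slug": {"type": "number"}}, "required": ["slug"]}

print(classify_schema_change(old, adds_optional))  # additive
print(classify_schema_change(old, retypes))        # breaking
```

A check like this makes a decent CI gate: fail the build when a schema diff classifies as breaking and the tool name has not changed.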
The "client cache" problem
The hidden gotcha that catches every team once.
When a host connects to an MCP server, it lists the tools and puts them in the model's context. Most hosts cache that listing for the duration of the session. If your server's tool surface changes mid-session — a new version ships, the server restarts, the host does not refetch — the model is now reasoning over a stale tool surface. It will emit calls that match the old shape; the server, now on the new shape, will reject them; the host will surface "tool not found" errors back to the agent, which will then apologize and try again with what it thinks is the correct shape, which is also wrong, and the conversation degrades.
Three things help.
Version in the tool description. Embedding (v1.2) in the tool description text is ugly and effective. The model can see what version of the tool it is reasoning over. If you ever need to debug "why is the agent calling this tool with the wrong shape," the version string in the description tells you which generation of the description is in flight.
Refetch on `tools/list_changed` notifications. The protocol supports a notification the server can send when its tool list changes. Hosts that respect it will refetch. Hosts that do not respect it will keep the stale list. Send the notification anyway; it costs nothing and fixes the well-behaved hosts.
Plan deploys around session boundaries when you can. If your traffic pattern includes natural lulls — overnight, weekends, after a known meeting — deploy then. Sessions in flight will pick up the new surface on next reconnect, which for stdio servers is the next time the host spawns the process and for HTTP servers is the next session-establishment.
None of this is a substitute for not making breaking changes, but the cache problem is real and worth designing around.
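The `tools/list_changed` notification itself is just a JSON-RPC message. A minimal sketch of the wire format — the method name follows the MCP spec's naming, and a notification carries no `id`, which is exactly why sending it costs nothing (there is never a reply to wait for):

```python
import json

# Sketch of the wire message a server sends when its tool list changes.
# Per JSON-RPC 2.0, omitting "id" makes this a notification: hosts that
# respect it refetch the tool list; hosts that ignore it simply drop it.

def tools_list_changed_notification() -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "notifications/tools/list_changed",
    })

msg = json.loads(tools_list_changed_notification())
print(msg["method"])   # notifications/tools/list_changed
```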
A worked rename: `publishDraft` → `publishArticle`
The publishDraft → publishArticle rename is real. Here is what it should have looked like, the second time we did it, instead of the first.
Step zero, before any code. Decide whether the rename is worth a breaking change. The first one was; the new name was clearer, the old name was a leftover from a half-baked early design, and the cost of dragging the wrong name forward outweighed the migration cost. Almost no rename is purely cosmetic. If you are convinced one is, hold off until you have something else to bundle it with. Renames in flight are the most common cause of "the agent does the wrong thing and nobody knows why."
Step one. Add the new tool. Keep the old one. Both publishDraft and publishArticle are now registered. They share an underlying handler. The new one has the description you want; the old one has its description updated to a deprecation notice:
publishDraft (DEPRECATED — use publishArticle instead).
This tool is being phased out. New calls should use publishArticle, which has the
same behavior and an updated name. Calls to publishDraft will continue to work
through 2026-06-01 and will be removed thereafter.

The deprecation notice in the description is the model-facing signal. The model will start preferring the new tool on its own, given a few turns. The fact that both tools route to the same handler means that whatever the model picks, the work happens.
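Step one can be sketched in a few lines. The dict registry and `publish_article` below are stand-ins for whatever your MCP SDK's tool table and real pipeline call look like; the point is the shape — two names, one handler:

```python
# Sketch of step one: old and new tool names both registered, sharing
# one handler. TOOLS is a stand-in for your SDK's tool table, and
# publish_article() is a placeholder for the real publishing pipeline.

DEPRECATION_NOTICE = (
    "DEPRECATED -- use publishArticle instead. This tool is being phased "
    "out; calls will continue to work through 2026-06-01."
)

def publish_article(args: dict) -> dict:
    return {"status": "published", "slug": args["slug"]}   # placeholder

TOOLS = {
    "publishArticle": {"description": "Publish an article. (v1.2)",
                       "handler": publish_article},
    "publishDraft":   {"description": DEPRECATION_NOTICE,
                       "handler": publish_article},        # same handler
}

def call_tool(name: str, args: dict) -> dict:
    if name not in TOOLS:
        raise KeyError(f"tool not found: {name}")
    return TOOLS[name]["handler"](args)

# Whichever name the model picks, the work happens:
print(call_tool("publishDraft", {"slug": "hello"}))
print(call_tool("publishArticle", {"slug": "hello"}))
```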
Step two. Wait. This is the hardest step for the team and the most important. A real deprecation window is at minimum a few weeks of production traffic; for high-stakes tools, it is months. Both versions exist; tooling exists to count how often each is called; the cutover does not happen until the call count on the deprecated tool is near zero.
Step three. Remove the old tool. Once usage has fallen, remove publishDraft. The new release notes call out the removal. The next time a host that was caching the old surface tries to call it, the call will fail with a clear "tool not found" — at that point the cache is genuinely stale and the failure is informative.
The first time we did this rename, we did step one and step three on the same afternoon. That is the entire story.
Schema-change patterns
Renaming and removing tools are the visible cases. Schema changes inside an existing tool are more frequent and more silently dangerous.
Adding optional parameters. Almost always safe. Old callers ignore the new parameter; new callers set it. Make sure the default behavior is identical to what the tool did before the parameter existed.
Removing parameters. Always breaking. Even if the parameter has been documented as deprecated for months, an agent's cached tool description may still include it. The safe path: keep the parameter accepted (silently ignored) for a deprecation window, with the description noting it is ignored, and only remove it after the same usage-tracking exercise.
Renaming parameters. Treated as removal-plus-addition. Accept both names during the deprecation window. Pick one as the canonical, document the other as deprecated, eventually drop the deprecated one.
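The accept-both-names window can be implemented as a small normalization pass before validation. The `draftId` → `articleId` rename here is hypothetical, used only to make the shape concrete:

```python
# Sketch: during a parameter-rename deprecation window, map the
# deprecated name onto the canonical one before validating.
# "draftId" -> "articleId" is a hypothetical rename for illustration.

PARAM_ALIASES = {"draftId": "articleId"}   # deprecated -> canonical

def normalize_args(args: dict) -> dict:
    out = dict(args)
    for deprecated, canonical in PARAM_ALIASES.items():
        if deprecated in out and canonical not in out:
            out[canonical] = out.pop(deprecated)   # last week's shape still works
    return out

print(normalize_args({"draftId": "a1"}))     # {'articleId': 'a1'}
print(normalize_args({"articleId": "a1"}))   # unchanged
```

Note the guard: if a confused caller sends both names, the canonical one wins and nothing is silently clobbered.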
Changing types. The most dangerous. `count: number` becoming `count: { value: number, unit: string }` is a change the model cannot recover from without seeing the new schema. Always do this as a new tool (or a new parameter on the existing tool) and let the old type fade out.
Output shape changes. The reverse of input changes — the model is reading these, the client is parsing these. Adding fields to outputs is safe; renaming or removing them is not. The same deprecation pattern applies.
A useful question to keep asking: what would happen if an agent's tool description was a week stale and it called this tool with last week's shape? If the answer is "the call works because the parameter is ignored," safe. If the answer is "the call rejects with a validation error," breaking, and it needs the slow deprecation path.
Versioning the server itself
A separate question from versioning the tool surface: how does your server announce its own version, and what should it do across versions?
The MCP spec includes a `serverInfo.version` field in the initialization response. Set it. Use SemVer. A competent host will log it, and a competent operator will trace incidents back to it.
What major/minor/patch mean for an MCP server, applied to the taxonomy above:
- Patch — bug fixes that do not change the tool surface. Behavior shifts so small that no agent could reasonably notice.
- Minor — additive changes. New tools, new optional parameters, improved descriptions that do not change intended use.
- Major — anything in the breaking-changes column. Renames, removals, type changes, behavior-shifting changes that matter.
Treat description rewrites that materially change which calls the agent will make as at least a minor version bump. Even if the diff looks small. The model is reasoning over the description. A description change is a behavior change for the system as a whole.
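The taxonomy maps onto SemVer cleanly enough to encode as a lookup. One caveat baked into the sketch: "behavior-shifting" is pinned to minor here, but shifts that matter belong in major, per the list above:

```python
# The change taxonomy mapped onto SemVer bumps. "behavior-shifting"
# defaults to minor here; shifts an agent would actually notice belong
# in "breaking" and therefore major.

def bump(version: str, change: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    kind = {"bugfix": "patch",
            "additive": "minor",
            "behavior-shifting": "minor",
            "breaking": "major"}[change]
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump("1.4.2", "additive"))   # 1.5.0
print(bump("1.4.2", "breaking"))   # 2.0.0
```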
For the protocol-version axis, your server's protocolVersion is what you negotiated with the host on initialize. That is the host's call. Your responsibility is to refuse if the negotiated version does not include features your handlers depend on, and to handle the older versions you claim to support. Do not claim more than you support. A server that promises 2024-11-05 and then emits structures the spec did not allow until 2025-03 will misbehave in subtle, host-specific ways.
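The negotiation guard is small. A sketch under the spec's date-based version strings — the supported list is an assumption about which revisions your handlers actually implement, and per the spec the server echoes a requested version it supports, or counter-offers its latest and lets the client decide:

```python
# Sketch of the protocol-version guard: advertise only the spec
# revisions the handlers actually implement. SUPPORTED is an
# assumption; list only what you have tested against.

SUPPORTED = ["2025-03-26", "2024-11-05"]   # newest first

def negotiate(client_requested: str) -> str:
    # If we support what the client asked for, echo it back.
    # Otherwise answer with our latest and let the client decide
    # whether to proceed or disconnect.
    if client_requested in SUPPORTED:
        return client_requested
    return SUPPORTED[0]

print(negotiate("2024-11-05"))   # echoed back
print(negotiate("2026-01-01"))   # counter-offer: our latest
```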
What to put in the README
A version section in the README of an MCP server is a small thing that pays back disproportionately. The shape we land on:
- Current server version.
- Supported protocol versions.
- Tool catalog with per-tool stability status: stable, experimental, deprecated (until: 2026-06-01).
- A changelog with the same taxonomy as above — additive, behavior-shifting, breaking — applied to each entry.
- A migration guide for the most recent breaking change, if any.
The audience for this is not the model — the model only sees what the host shows it. The audience is the operator. The team running the host. The integrator at the customer side. The future-you, six months from now, debugging a regression and trying to remember what changed.
What to log
Tied to the observability post: every tool call should log, at minimum, the tool name, the server version, the protocol version, and the timestamp. When you ship a new version, those logs are how you tell whether traffic is migrating to the new tools, whether deprecated tools are still being hit, and whether old hosts are still negotiating old protocol versions.
The single most useful piece of telemetry across versioning: a per-tool, per-version call count over time. A graph of publishDraft (deprecated) and publishArticle (new) over the deprecation window tells you when removal is safe. Without it, you are guessing.
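In-process, that metric is nothing more than a counter keyed on (tool, server version). A sketch — in production this would be a metrics emission per call rather than a `Counter`, but the shape of the question it answers is the same:

```python
from collections import Counter

# Sketch of the one metric that makes deprecation timelines real:
# a (tool, server_version) call counter. In production, emit this to
# your metrics system per call instead of keeping it in-process.

calls: Counter = Counter()

def record_call(tool: str, server_version: str) -> None:
    calls[(tool, server_version)] += 1

# Simulated traffic over a deprecation window:
for _ in range(97):
    record_call("publishArticle", "2.1.0")
for _ in range(3):
    record_call("publishDraft", "2.1.0")

deprecated_share = calls[("publishDraft", "2.1.0")] / sum(calls.values())
print(f"{deprecated_share:.0%}")   # 3% -- not yet zero, not yet safe to remove
```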
Why this is more important for MCP than for regular APIs
Most engineering teams have versioned a REST API at some point. The instincts transfer, but two things about MCP make versioning harder than the equivalent REST exercise.
Your callers are LLMs, not engineers. A regular API has callers you can email — "we are deprecating /v1/articles, please migrate by Q3." An MCP server's callers are agents, mediated by hosts, configured by users you may not even know. There is no migration email. The tool description is the only channel. If the description does not communicate the change, the change does not communicate.
Behavior depends on the description. A REST API's behavior depends on its code. An MCP server's effective behavior depends on its code and its description, because the description controls when and how the agent calls it. Tightening a description to discourage misuse is a behavior change in the system, not a documentation tweak. Treating description edits as "no-op" version changes is the most common mistake we see.
The discipline that comes from those two facts: every version change is also a documentation change is also a model-behavior change. They are not separable. The teams that ship versioned MCP servers well are the ones that internalize that the description is part of the running system.
Where this fits
The overview is the prerequisite. The tool-design post is the closest neighbor — most versioning pain is description pain in disguise. The observability post is the operational counterpart; without per-version telemetry, deprecation timelines are vibes. The security post overlaps where versioning is also a security concern (a deprecated tool that still works is still part of your blast radius).
For teams running MCP servers internally, the bounded-context post is the architectural frame that keeps version churn local — one server's deprecation should not be every server's deprecation.
Renames will happen. Schemas will change. The tool that looked perfect on day one will look wrong on day forty. Plan for that on day zero. The teams that hold a stable contract with their agents are the teams whose agents quietly keep working through every version change, and that quiet is what shipped MCP looks like when it is going right.