The MCP Server as a Product, Not a Script

A pattern repeats across teams new to MCP. Someone writes a server in an afternoon, demos it to the team, and ships it to a couple of internal users. It works. Word spreads. Other teams ask for access. A customer mentions it would be useful in their workflow. Three months later, the server has fifty users and is "in production", a label the original author would never have applied to it on day one.

The transition from "an interesting prototype on someone's laptop" to "a thing people depend on" is rarely deliberate. It happens because the artifact gets used, and the patterns of being used are the patterns of being a product. By the time anyone notices, the prototype has all the responsibilities of a product and none of the affordances. No SemVer story. No deprecation policy. No support contract. No runbook. No status page. Just a Node process on a VPS that everyone is now quietly relying on.

This post covers the operational discipline that turns an MCP server from a script into a product. It is not exciting. It is the work that distinguishes systems that survive their first incident from systems that do not. Every item in this post is something you will eventually need; the question is whether you build it before the first need or after.

The closest neighbors in this series are the versioning post (the SemVer half) and the observability post (the telemetry half). This post is about the things around them — the operational shell.

What "product" means here

A definition before the discipline: a product is something a user (internal or external) can build on with confidence. They can predict what it will do, predict when it will change, predict who to call when it does not work, predict what they are entitled to in terms of uptime and support.

A script is something where none of those predictions are reliable. It might work, it might not. It might change tomorrow without warning. The author might be on vacation when it breaks.

Most MCP servers in the wild are scripts being treated as products by their users. The user does not know which they are using. The author does not always know either. The drift from one to the other is the source of most operational pain.

The work below is what closes that gap.

SemVer as the contract surface

The versioning post covered the mechanics; this is about the commitment.

When you publish a version number, you are making a promise. SemVer's promise is structured:

Major version change: I have made breaking changes. You may need to do work to upgrade.
Minor version change: I have added things. You should not need to do anything.
Patch version change: I have fixed bugs. You should pick this up at your earliest convenience.

A product version-bumps consistently with that contract. A script version-bumps when the author feels like it. Users can tell the difference: they will trust the product and not the script, and that trust is exactly what they are asking about when they ask "is this thing safe to depend on?"
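The three promises map directly onto which number moves. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes; `Bump` and `nextVersion` are illustrative names, not a library API. The SemVer specification treats `0.y.z` as initial development where anything may change, so a project still on major version zero is promising less than this arithmetic suggests.

```typescript
// Illustrative names only: `Bump` and `nextVersion` are not from a library.
type Bump = "major" | "minor" | "patch";

// Returns the next version for a plain MAJOR.MINOR.PATCH string.
// Pre-release and build suffixes (e.g. "1.2.0-beta.1") are rejected.
function nextVersion(current: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${current}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // breaking change: minor and patch reset
    case "minor":
      return `${major}.${minor + 1}.0`; // additive change: patch resets
    case "patch":
      return `${major}.${minor}.${patch + 1}`; // fixes only
  }
}

console.log(nextVersion("0.4.2", "patch")); // 0.4.3
console.log(nextVersion("0.4.2", "minor")); // 0.5.0
console.log(nextVersion("1.7.3", "major")); // 2.0.0
```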

The discipline:

Every release has a version number, set in serverInfo.version and in your release notes and in your README's "current version" line. All three match.
Major versions trigger deprecation cycles. As the versioning post describes, a breaking change does not happen in a single release; it happens in a deprecation window with the old behavior still working.
No "rolling main". If your users are pulling from main and the contract is "whatever main happens to do today", you do not have a product, regardless of how mature the code is.
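Checking that "all three match" is easy to automate before each release. A sketch, assuming the changelog's newest entry is its first `## vX.Y.Z` heading and the README carries a line of the form `Current version: X.Y.Z`; both patterns and the `findVersionMismatches` name are hypothetical, so adjust them to your own files.

```typescript
// Hypothetical pre-release check: inputs are the version your server reports
// in serverInfo.version plus the raw text of the changelog and README.
interface VersionSources {
  serverInfoVersion: string;
  changelog: string; // full CHANGELOG.md text, newest entry first
  readme: string; // full README.md text
}

// Returns human-readable problems; an empty array means all three agree.
function findVersionMismatches(src: VersionSources): string[] {
  const changelogMatch = /^##\s+v(\d+\.\d+\.\d+)/m.exec(src.changelog);
  const readmeMatch = /^Current version:\s*v?(\d+\.\d+\.\d+)/im.exec(src.readme);
  const problems: string[] = [];
  if (!changelogMatch) problems.push("changelog has no '## vX.Y.Z' heading");
  if (!readmeMatch) problems.push("README has no 'Current version:' line");

  const found: Record<string, string | undefined> = {
    changelog: changelogMatch?.[1],
    README: readmeMatch?.[1],
  };
  for (const [where, version] of Object.entries(found)) {
    if (version !== undefined && version !== src.serverInfoVersion) {
      problems.push(`${where} says ${version}, serverInfo.version says ${src.serverInfoVersion}`);
    }
  }
  return problems;
}

// Example: the README was not updated for the release.
console.log(
  findVersionMismatches({
    serverInfoVersion: "0.4.2",
    changelog: "## v0.4.2 (2026-05-06)\n",
    readme: "Current version: 0.4.1\n",
  }),
);
```

In CI, a non-empty result would fail the release job.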

A changelog that is actually a changelog

A README section labeled "What's new in v0.4.2" with bullets like "fixed bug" is not a changelog. It is a placeholder that suggests one used to be here.

A real changelog answers, for every release: what changed, who is affected, what should they do?

A format that has served us well:

```markdown
## v0.4.2 (2026-05-06)

### Breaking
- (none)

### Behavior changes (description and prompt updates)
- `publishArticle`: description tightened to clarify that draft names should
  not include the `.md` extension. Agents that were already passing names
  without the extension are unaffected; agents that included the extension
  may now see different results.

### Additive
- New tool `getDraftMetadata({ name })`: returns the front-matter of a draft
  without its body. Useful for previewing without loading the full text.

### Fixes
- `searchKnowledgeBase` no longer crashes when given an empty `query` string;
  now returns a validation error.

### Internal
- Switched logging library from console to `pino`. No user-visible effect.
```

Five categories. Every entry is actionable or explicitly marked "no action needed". Behavior changes get their own section because they are the most subtle (see the versioning post — description changes are behavior changes for the system).
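Three of the five categories can be partly derived by diffing the tool list the previous release advertised against the current one. A sketch, assuming you snapshot each release's `tools/list` output into a simplified shape (name, description, and which parameters are required); `ToolSnapshot` and `diffTools` are hypothetical, and Fixes and Internal entries still have to be written by hand. The rules follow the text: a removed tool or a new required parameter is breaking, a changed description is a behavior change, and a new tool or new optional parameter is additive.

```typescript
// Hypothetical, simplified view of one tool from a tools/list response.
interface ToolSnapshot {
  name: string;
  description: string;
  params: Record<string, { required: boolean }>;
}

type ChangelogSection = "Breaking" | "Behavior changes" | "Additive";

interface Finding {
  section: ChangelogSection;
  note: string;
}

// Suggests changelog entries by comparing two releases' tool lists.
function diffTools(before: ToolSnapshot[], after: ToolSnapshot[]): Finding[] {
  const findings: Finding[] = [];
  const oldByName = new Map(before.map((t): [string, ToolSnapshot] => [t.name, t]));
  const newByName = new Map(after.map((t): [string, ToolSnapshot] => [t.name, t]));

  for (const toolName of oldByName.keys()) {
    if (!newByName.has(toolName)) {
      findings.push({ section: "Breaking", note: `tool \`${toolName}\` removed` });
    }
  }
  for (const [toolName, tool] of newByName) {
    const prev = oldByName.get(toolName);
    if (!prev) {
      findings.push({ section: "Additive", note: `new tool \`${toolName}\`` });
      continue;
    }
    if (prev.description !== tool.description) {
      findings.push({ section: "Behavior changes", note: `\`${toolName}\`: description changed` });
    }
    for (const [param, spec] of Object.entries(tool.params)) {
      const old = prev.params[param];
      if (spec.required && (!old || !old.required)) {
        // A brand-new required parameter, or an optional one made required.
        findings.push({ section: "Breaking", note: `\`${toolName}\`: \`${param}\` is now required` });
      } else if (!old) {
        findings.push({ section: "Additive", note: `\`${toolName}\`: new optional \`${param}\`` });
      }
    }
  }
  return findings;
}
```

The output is a starting point for the human-written entry, not a replacement for it: the "who is affected, what should they do" part still needs judgment.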

The audience is the integrator, the operator, the future-you. Write for them.

Deprecation policy as a published thing

A deprecation policy answers: if I depend on tool X today, how much warning will I get before tool X stops working?

The policy lives in the README, prominently. It commits the team to a timeline. Examples that have worked:

Stable tools are deprecated for at least 90 days before removal. During the deprecation window, the tool continues to function with a deprecation notice in its description and a deprecation warning in any tool-list responses.
Experimental tools may change or be removed without warning. Their description must include "(experimental)" so users see the status.
Schema changes follow the rules in the versioning post — new optional parameters are non-breaking, new required parameters require a major version bump and a deprecation cycle.
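Encoding the policy keeps it from depending on someone remembering it during a release. A minimal sketch, assuming each tool carries a status in your own registry; `ToolDef`, `advertisedDescription`, and `canRemove` are hypothetical names, not part of any MCP SDK. Status markers are prepended to the description the server advertises, and removal is refused until the 90-day window has elapsed.

```typescript
// Hypothetical policy constant mirroring the published README policy.
const MIN_DEPRECATION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

interface ToolDef {
  name: string;
  description: string;
  status: "stable" | "experimental" | "deprecated";
  deprecatedOn?: Date; // must be set when status is "deprecated"
  replacement?: string; // optional pointer to the successor tool
}

// Earliest moment a deprecated tool may be removed under the policy.
function earliestRemoval(tool: ToolDef): Date {
  if (tool.status !== "deprecated" || !tool.deprecatedOn) {
    throw new Error(`${tool.name} is not deprecated`);
  }
  return new Date(tool.deprecatedOn.getTime() + MIN_DEPRECATION_DAYS * DAY_MS);
}

// The description the server actually advertises: markers are added in one
// place so they cannot be forgotten when individual tools are edited.
function advertisedDescription(tool: ToolDef): string {
  if (tool.status === "experimental") {
    return `(experimental) ${tool.description}`;
  }
  if (tool.status === "deprecated") {
    const removal = earliestRemoval(tool).toISOString().slice(0, 10);
    const hint = tool.replacement ? ` Use ${tool.replacement} instead.` : "";
    return `(deprecated, removal no earlier than ${removal}) ${tool.description}${hint}`;
  }
  return tool.description;
}

// Release-time guard: may this tool be deleted as of `now`?
function canRemove(tool: ToolDef, now: Date): boolean {
  if (tool.status === "experimental") return true; // no notice owed
  if (tool.status !== "deprecated") return false; // stable tools must be deprecated first
  return now.getTime() >= earliestRemoval(tool).getTime();
}
```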

The reason to publish this is the same reason any contract is worth publishing: it commits the team in writing, it sets the user's expectation, and it makes the conversation about "you broke this without warning" much shorter.

A deprecation policy that is spoken but not written is a policy nobody is going to honor in a crunch.

SLAs, or honest non-SLAs

For a server with internal users, "best effort" is fine, if you say so. For a server with external users, expectations need to be set.

The dimensions worth committing to:

Availability — what fraction of the time is the server expected to be reachable?
Latency — what is the typical and worst-case response time for a tool call?
Support response time — when something is broken, how long until a human looks at it?
Maintenance windows — are there scheduled periods when the server may be unavailable?
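For latency, percentiles are easier to commit to than averages: the median for "typical" and a high percentile such as p99 for "worst case", since a literal maximum is set by a single outlier. A sketch of the nearest-rank method over per-call durations in milliseconds; where the samples come from (your tracing or metrics pipeline) is assumed.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

interface LatencySummary {
  p50: number; // "typical"
  p99: number; // "worst case" for SLA purposes
  max: number; // reported, but too noisy to promise
}

function summarizeLatency(samplesMs: number[]): LatencySummary {
  return {
    p50: percentile(samplesMs, 50),
    p99: percentile(samplesMs, 99),
    max: percentile(samplesMs, 100),
  };
}

console.log(summarizeLatency([120, 80, 95, 400, 100])); // p50 100, p99 400, max 400
```

Promising p99 rather than the maximum is a choice made here; some teams publish p95 instead.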

For internal-only deployments, these can be aspirational rather than contractual. "We aim for 99% availability during business hours; off-hours are best-effort". That is honest and useful.

For external customers, especially paying ones, these become contractual. The numbers depend on your business. The discipline is to commit to numbers you can actually meet, not numbers the customer wanted to hear. An SLA that promises 99.99% on a server that has run for two months and has no redundancy is a lie waiting to be exposed; an SLA of 99% with credible reasoning is far better than 99.99% you cannot defend.
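The gap between those numbers is clearer as a downtime budget. Over a 30-day window, 99% availability permits 432 minutes (7.2 hours) of downtime, while 99.99% permits about 4.3 minutes, a budget that a single slow manual recovery could use up. The arithmetic, as a small sketch:

```typescript
// Downtime permitted by an availability target over a window of `windowDays`.
// (100 - pct) is computed first to keep floating-point error small.
function downtimeBudgetMinutes(availabilityPct: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return (windowMinutes * (100 - availabilityPct)) / 100;
}

for (const target of [99, 99.9, 99.99]) {
  const minutes = downtimeBudgetMinutes(target, 30);
  console.log(`${target}% over 30 days allows ${minutes.toFixed(1)} minutes of downtime`);
}
// 99% over 30 days allows 432.0 minutes of downtime
// 99.9% over 30 days allows 43.2 minutes of downtime
// 99.99% over 30 days allows 4.3 minutes of downtime
```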

Status pages

The unsexy artifact that pays off in every incident.

A status page is a single URL the team and the customers can both look at. It says "is the server up right now?" and ideally "what is its history of uptime?". It is updated by the team during incidents.

For internal-only servers, this can be a Slack channel and a Notion page. The discipline is the same: when something is wrong, the team posts an update; when something is resolved, the team posts a resolution; the history is preserved.

For external customers, a real status page (status.yourdomain.com, hosted on StatusPage.io or one of its alternatives, or rolled with a small static-page generator) is non-negotiable above a certain customer count. Customers who cannot tell whether the server is down or whether their integration is broken will assume the worst and open support tickets. A status page absorbs most of those.

The smallest viable status page:

Current status: operational / degraded / down.
Most recent incident: when, what, what is being done.
Historical uptime: a simple chart of the last N days.

That is enough to be a real signal. The polish (subscriptions, regional breakdowns, dependency graphs) comes later.
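The first and third items can be computed from periodic health probes; the incident entry is written by a human. A sketch, assuming probes (for example, a scripted `tools/list` call every minute) are stored oldest-first; the `Probe` shape, the five-probe window, and the 2-second degraded threshold are all hypothetical choices.

```typescript
type Status = "operational" | "degraded" | "down";

// One result from a periodic health probe.
interface Probe {
  at: Date;
  ok: boolean;
  latencyMs: number;
}

// Hypothetical thresholds; tune them to your own SLA.
const DEGRADED_LATENCY_MS = 2000;
const WINDOW = 5; // only the last five probes decide the current status

function currentStatus(probes: Probe[]): Status {
  const recent = probes.slice(-WINDOW);
  if (recent.length === 0) return "down"; // no data is treated as an outage
  const failures = recent.filter((p) => !p.ok).length;
  if (failures === recent.length) return "down";
  const slow = recent.some((p) => p.ok && p.latencyMs > DEGRADED_LATENCY_MS);
  return failures > 0 || slow ? "degraded" : "operational";
}

// Uptime per UTC day, as the percentage of probes that succeeded.
function dailyUptime(probes: Probe[]): Map<string, number> {
  const perDay = new Map<string, { ok: number; total: number }>();
  for (const p of probes) {
    const day = p.at.toISOString().slice(0, 10);
    const bucket = perDay.get(day) ?? { ok: 0, total: 0 };
    bucket.total += 1;
    if (p.ok) bucket.ok += 1;
    perDay.set(day, bucket);
  }
  const result = new Map<string, number>();
  for (const [day, b] of perDay) result.set(day, (100 * b.ok) / b.total);
  return result;
}
```

Probe-based uptime only approximates availability: it sees the moments you probed, not the ones in between.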

Runbooks

The thing that lets the second incident be twenty minutes instead of two hours.

A runbook is a structured document for an operational scenario, written in advance, that tells someone what to do. "In advance" is the key phrase. Writing a runbook during an incident is a recipe for missing something obvious.

Runbooks worth having for an MCP server:

Server is down. How to confirm. How to check the obvious causes. How to restart. Whom to call if the obvious does not work.
A tool is failing. How to localize to a single tool versus a transport-level issue. How to read the trace for a failing call. How to roll back the most recent release.
A customer reports their agent is behaving wrongly. The triage flow from the observability post — pull the trace, find the tool calls, compare to historical baseline.
An incident has just been declared. Who has the keys, who updates the status page, who talks to customers.
A deploy is rolling out. The pre-deploy checklist, the during-deploy verification, the rollback trigger if metrics drift.
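The "localize to a single tool versus a transport-level issue" step can be partly automated. A sketch, assuming you can pull recent call outcomes per tool from your traces; `localizeFailure` and the 20% threshold are hypothetical. The heuristic: one failing tool among healthy ones suggests a tool-level bug, while several failing together points at the transport or something they share. It only sees calls that reached the server, so a transport that is fully down shows up as missing traffic rather than as errors.

```typescript
// One recorded tool call outcome, e.g. from traces over the last N minutes.
interface CallRecord {
  tool: string;
  ok: boolean;
}

type Diagnosis =
  | { kind: "healthy" }
  | { kind: "single-tool"; tool: string; errorRate: number }
  | { kind: "transport-or-shared"; failingTools: string[] };

// Hypothetical threshold: a tool counts as failing above 20% errors.
const FAILING_RATE = 0.2;

function localizeFailure(calls: CallRecord[]): Diagnosis {
  const stats = new Map<string, { errors: number; total: number }>();
  for (const c of calls) {
    const s = stats.get(c.tool) ?? { errors: 0, total: 0 };
    s.total += 1;
    if (!c.ok) s.errors += 1;
    stats.set(c.tool, s);
  }

  const failing: { tool: string; rate: number }[] = [];
  for (const [tool, s] of stats) {
    const rate = s.errors / s.total;
    if (rate > FAILING_RATE) failing.push({ tool, rate });
  }

  if (failing.length === 0) return { kind: "healthy" };
  // A single-tool verdict needs at least one healthy tool to compare against.
  if (failing.length === 1 && stats.size > 1) {
    return { kind: "single-tool", tool: failing[0].tool, errorRate: failing[0].rate };
  }
  // Several tools failing together usually points below the tools:
  // transport, auth, or a dependency they all share.
  return { kind: "transport-or-shared", failingTools: failing.map((f) => f.tool).sort() };
}
```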

Each runbook is short — half a page, mostly bullets. The format that works: symptom → most-likely cause → first action → if that did not help, escalate to X.

The discipline: a runbook that has not been used recently is a runbook that has bitrotted. Run a quarterly drill where the team picks a random runbook and walks through it on a non-incident day. Update what is wrong; remove what is obsolete.
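If each runbook is kept as a small structured record in the symptom → cause → action → escalation shape, choosing the drill can be mechanical too. A sketch; `Runbook`, `pickDrill`, and the 90-day staleness threshold are hypothetical. This variant biases the pick toward runbooks nobody has exercised in about a quarter and falls back to a uniformly random pick; dropping the staleness filter gives the purely random drill described above.

```typescript
// A runbook in the symptom → most-likely cause → first action → escalation shape.
interface Runbook {
  title: string;
  symptom: string;
  likelyCause: string;
  firstAction: string;
  escalateTo: string;
  lastExercised: Date; // last real incident or drill that walked through it
}

const STALE_AFTER_DAYS = 90; // roughly one quarter; an assumed threshold

function staleRunbooks(books: Runbook[], now: Date): Runbook[] {
  const cutoff = now.getTime() - STALE_AFTER_DAYS * 24 * 60 * 60 * 1000;
  return books.filter((b) => b.lastExercised.getTime() < cutoff);
}

// Picks the next drill: a random stale runbook if any exist, otherwise any
// runbook at random. `random` is injectable so the choice can be tested.
function pickDrill(
  books: Runbook[],
  now: Date,
  random: () => number = Math.random,
): Runbook | undefined {
  const stale = staleRunbooks(books, now);
  const pool = stale.length > 0 ? stale : books;
  if (pool.length === 0) return undefined;
  return pool[Math.floor(random() * pool.length)];
}
```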

Support contracts and channels

The last operational layer. Where do users go when something is broken?

Internal servers usually have an informal channel — a Slack room, an email alias, a "@-mention me." That works at five users; it stops working at fifty. The transition signal: when "I have a question about the MCP server" starts feeling like an interruption, the support pattern needs to formalize.

The shape we have landed on for serious internal deployments:

A dedicated channel, named clearly, with the team's commitment to response time pinned at the top.
A categorization system. "Question" / "Bug report" / "Outage." Each one has different SLAs and different handling.
A weekly review of patterns. Recurring questions become FAQ entries; recurring bugs become roadmap items; recurring outages become incident reviews.
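Once requests are tagged, the per-category commitments can be checked mechanically. A sketch with hypothetical first-response targets (the real numbers are whatever the team pins in the channel); `SupportRequest`, `overdue`, and `weeklyCounts` are illustrative names. The counts feed the weekly review; spotting the patterns stays a human job.

```typescript
type RequestCategory = "Question" | "Bug report" | "Outage";

// Hypothetical first-response targets in minutes.
const FIRST_RESPONSE_TARGET_MIN: Record<RequestCategory, number> = {
  Question: 24 * 60,
  "Bug report": 4 * 60,
  Outage: 15,
};

interface SupportRequest {
  id: string;
  category: RequestCategory;
  openedAt: Date;
  firstResponseAt?: Date;
}

// Requests still waiting for a first response past their category's target.
function overdue(requests: SupportRequest[], now: Date): SupportRequest[] {
  return requests.filter((r) => {
    if (r.firstResponseAt) return false;
    const waitedMin = (now.getTime() - r.openedAt.getTime()) / 60000;
    return waitedMin > FIRST_RESPONSE_TARGET_MIN[r.category];
  });
}

// Input for the weekly review: how many requests arrived in each category.
function weeklyCounts(requests: SupportRequest[]): Record<RequestCategory, number> {
  const counts: Record<RequestCategory, number> = { Question: 0, "Bug report": 0, Outage: 0 };
  for (const r of requests) counts[r.category] += 1;
  return counts;
}
```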

For external customers, the formalization extends: ticketing systems, support tiers, escalation paths. The shape depends on the business. The MCP-specific concerns are largely the ones from the observability post — every support interaction should start with the trace ID for the relevant call, and every customer should be able to share their trace ID without exposing other customers' data.
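One way to make trace IDs safe to share is to make them random and to scope every lookup to the tenant asking. A sketch, assuming an in-memory index standing in for your tracing backend; `TraceIndex` is hypothetical, while `randomUUID` is Node's built-in from `node:crypto`. A trace ID owned by another tenant is treated exactly like one that does not exist, so a leaked or guessed ID exposes nothing.

```typescript
import { randomUUID } from "node:crypto";

// Minimal record kept per traced call; real traces carry far more.
interface TraceSummary {
  traceId: string;
  tenantId: string;
  tool: string;
  ok: boolean;
}

class TraceIndex {
  private byId = new Map<string, TraceSummary>();

  // Records a call and returns an opaque ID that reveals nothing about the
  // tenant or other calls; this is what the customer quotes to support.
  record(tenantId: string, tool: string, ok: boolean): string {
    const traceId = randomUUID();
    this.byId.set(traceId, { traceId, tenantId, tool, ok });
    return traceId;
  }

  // Lookup scoped to the requesting tenant: another tenant's trace and a
  // nonexistent trace produce the same answer.
  lookup(requestingTenant: string, traceId: string): TraceSummary | undefined {
    const t = this.byId.get(traceId);
    return t && t.tenantId === requestingTenant ? t : undefined;
  }
}
```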

The operational maturity ladder

A useful frame for talking to the team about where the server is on the product-vs-script spectrum. Five rungs:

Script. Runs on someone's machine. Goes down when the laptop sleeps. No version, no logs, no support.
Hosted prototype. Runs on a VPS. Has logs but nobody reads them. No version discipline, no SLA.
Internal product. Has version numbers, a changelog, structured logs, a Slack channel for support, a basic runbook. Treated as a real thing by the team.
External-customer product. Has a public status page, a published deprecation policy, a real SLA, a dedicated support process, runbooks for every common incident.
Critical-path infrastructure. Has redundancy, regional failover, on-call rotations, post-incident review processes, a quarterly deprecation cadence published a year in advance.

Each rung is more work than the one below it. Each one is justified by something specific, not by ambient pressure to "be more professional." Pick the rung that matches the actual usage and commit to its discipline; do not cosplay a higher rung when nobody is forcing you to.

The mistake to avoid: staying on rung 2 while users have started treating you like rung 4. That is the gap that makes incidents painful and customer relationships fragile. When the user expectation has moved up the ladder, the operational shell needs to move with it.
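The ladder can double as a self-assessment. A sketch, assuming you keep an explicit list of which capabilities exist; the capability keys are hypothetical shorthand for the rung descriptions, and the cumulative rule (a rung counts only if every lower rung is also complete) is a choice made here rather than something the ladder prescribes.

```typescript
// The rungs as cumulative checklists; item names condense the descriptions.
const RUNGS: { rung: number; label: string; requires: string[] }[] = [
  { rung: 1, label: "Script", requires: [] },
  { rung: 2, label: "Hosted prototype", requires: ["hosted", "logs"] },
  { rung: 3, label: "Internal product", requires: ["versioning", "changelog", "structuredLogs", "supportChannel", "runbook"] },
  { rung: 4, label: "External-customer product", requires: ["statusPage", "deprecationPolicy", "sla", "supportProcess"] },
  { rung: 5, label: "Critical-path infrastructure", requires: ["redundancy", "failover", "onCall", "incidentReviews"] },
];

// Highest rung whose requirements, and those of every lower rung, are met.
function currentRung(has: Set<string>): number {
  let reached = 1;
  for (const r of RUNGS) {
    if (r.requires.every((item) => has.has(item))) reached = r.rung;
    else break;
  }
  return reached;
}

// The gap the text warns about: what users' expected rung needs but is missing.
function missingFor(expectedRung: number, has: Set<string>): string[] {
  return RUNGS.filter((r) => r.rung <= expectedRung)
    .flatMap((r) => r.requires)
    .filter((item) => !has.has(item));
}

const prototype = new Set(["hosted", "logs"]);
console.log(currentRung(prototype)); // 2
console.log(missingFor(4, prototype)); // everything rungs 3 and 4 require
```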

Where this fits

The versioning post and the observability post are the technical halves of the operational story. This post is the wrapping. The security post is the foundation under all of it: none of this is meaningful if the security model is broken. The multi-tenant checklist intersects with this one: multi-tenant servers almost always sit at rung 4 or higher, because the external customers they serve will not accept anything less.

For teams whose MCP server has just crossed an internal usage threshold and is starting to feel "real," this post is the prompt to take the work of formalization seriously. The teams that do this before the first incident handle that incident gracefully. The teams that wait do not.

The MCP server gets treated like a product whether you set it up that way or not. The choice is whether to walk into that treatment with the operational shell to support it or to stumble into it the first time something breaks. The work above is unglamorous and forgettable; it is also the work that distinguishes a system people will keep using from one they will quietly stop trusting after the first scare.