Multi-Tenant MCP: The Honest Checklist

The transition from "an MCP server for me and my team" to "an MCP server with many customers on it" is a bigger architectural step than most teams treat it as. The same code, same SDK, same tools — wrapped in a sign-up form and a Stripe integration. What could go wrong?

A great deal, as it turns out, and the things that go wrong are mostly invisible until they are already wrong. Single-tenant MCP servers benefit from a class of free properties: every call is from "the user", every secret is "the user's", every resource is "the user's data". Multi-tenant servers have to earn every one of those properties through deliberate engineering. Each property a team forgets to earn is a quiet bug that does not show up in testing, does not show up in single-customer demos, and shows up in the worst way the first time a customer reports they can see another customer's data.

This post is the checklist we wish we had handed teams before their first multi-tenant MCP rollout. Each item is a property that is automatic in single-tenant deployments and explicit work in multi-tenant ones. Skipping any of them is a quiet hazard.

The post pairs with the security post and the bounded-context post. Multi-tenancy turns up the volume on every concern from those two — narrow tools matter more, audit logs matter more, server boundaries matter more. The checklist below assumes you have those foundations and is the additional layer specific to multi-tenant operation.

What "tenant" actually means

Before the checklist, a clearer statement of the term. A tenant is a unit of customer-or-account isolation. For a SaaS product, it is usually one customer organization. For a B2B platform, it might be one client company. For a developer-tools product, it might be one workspace or one project.

The defining property: data and operations belonging to one tenant should be invisible and inaccessible to any other tenant, and the system enforces this rather than relying on the agent to behave well.

That second clause is the key. In single-tenant systems, the agent is on the same side as the user — both want the user's tools to operate on the user's data. In multi-tenant systems, the agent's behavior is no longer aligned by default. An agent on tenant A, running for user A, can be tricked or instructed to attempt operations that touch tenant B. The system has to refuse those attempts at a layer below the agent's discretion.

This shifts the threat model. The security post frames MCP security as "prompt injection more than network attack." Multi-tenancy adds cross-tenant data exposure as the worst-case outcome of a successful prompt injection. Every property below is partly about that worst case.

1. Per-tenant authentication and identity

The foundation. Every request that reaches the MCP server must carry a tenant identity, and that identity must be cryptographically attributable to a real customer.

OAuth subject claim. From the OAuth post, the IdP-as-authorization-server pattern gives you a subject claim per user. For multi-tenant, you need a tenant claim too — usually the user's organization, often expressed as a tenant or org claim in the JWT, sometimes derived from the subject's email domain (with care; email domains are not authoritative).

Tenant resolution at the request boundary. The very first thing every handler does is read the verified tenant ID from the request context, before any business logic. Every. Handler. Every. Request. No defaults. No "if no tenant, assume tenant 1." A request without a tenant claim is rejected; a tenant claim that does not match a known tenant is rejected; ambiguity is rejected.

No tenant ID in tool args. This is the trap teams fall into. A tool with a tenantId parameter looks convenient — the agent can pass it, the server can use it. But now the agent is asserting which tenant the call belongs to, and the tenant boundary has moved from auth to user-controlled input. An attacker who tricks the agent into passing a different tenantId has crossed the boundary trivially. The tenant ID lives in the verified token claim only; tool args carry workspace IDs, project IDs, document IDs that are scoped within the tenant — never the tenant itself.
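The boundary rules above fit in a few lines of code. This is a minimal TypeScript sketch. It assumes the token's signature, issuer, and audience have already been verified upstream. The `org_id` claim name, the `knownTenants` set, and the `resolveTenant`/`documentLookup` helpers are hypothetical, not part of any MCP SDK.

```typescript
// Hypothetical shape of claims *after* signature/issuer/audience checks pass.
interface VerifiedClaims {
  sub?: unknown;
  org_id?: unknown;
}

interface RequestContext {
  tenantId: string;
  subject: string;
}

type Resolution =
  | { ok: true; ctx: RequestContext }
  | { ok: false; reason: "invalid_subject" | "invalid_tenant_claim" | "unknown_tenant" };

function resolveTenant(claims: VerifiedClaims, knownTenants: ReadonlySet<string>): Resolution {
  const subject = claims.sub;
  const tenant = claims.org_id;
  if (typeof subject !== "string" || subject === "") {
    return { ok: false, reason: "invalid_subject" };
  }
  // No defaults and no guessing: a missing, empty, or multi-valued
  // (e.g. an array of orgs) tenant claim is rejected as ambiguous.
  if (typeof tenant !== "string" || tenant === "") {
    return { ok: false, reason: "invalid_tenant_claim" };
  }
  // A tenant the platform does not know is rejected, never remapped to a fallback.
  if (!knownTenants.has(tenant)) {
    return { ok: false, reason: "unknown_tenant" };
  }
  return { ok: true, ctx: { tenantId: tenant, subject } };
}

// Tool args carry IDs scoped *within* the tenant. There is deliberately no
// tenantId field, so the agent cannot assert which tenant a call belongs to.
interface GetDocumentArgs {
  documentId: string;
}

function documentLookup(
  ctx: RequestContext,
  args: GetDocumentArgs,
): { tenantId: string; documentId: string } {
  // The tenant half of the lookup always comes from the verified context.
  return { tenantId: ctx.tenantId, documentId: args.documentId };
}
```

`documentLookup` never reads a tenant from its arguments. A `tenantId` smuggled into the JSON args is simply ignored, and the boundary stays at the token.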

2. Per-tenant tool subsetting

Not every tenant gets every tool. The tool catalog the agent sees should be the intersection of the tenant's plan, the tenant's enabled features, and the user's permissions within the tenant.

At `tools/list` time, filter. When the host calls tools/list, return only the tools available to this caller. A tool the caller cannot use should not appear in the list. The agent then reasons over a smaller, accurate catalog rather than seeing a tool, attempting to use it, and getting denied.

At `tools/call` time, double-check. The list filtering is for the agent's benefit. The call filtering is for security. Even if a tool was listed, every call must verify the caller still has permission for that specific tool. Times change, plans get downgraded, permissions get revoked mid-session.

Group tools by feature flag. Implementation pattern: each tool registration is conditional on a feature flag, checked against the tenant's plan. Tenants without feature.advancedAnalytics simply do not see runCohortAnalysis. The flag check happens in two places — at registration (for the listing) and inside the handler (for the call) — and a missing flag in the handler is an audited security event, not a thrown exception.
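Here is a minimal sketch of the two checks. It assumes plan features and in-tenant permissions have already been loaded for the session. `ToolDef`, `requiredFeature`, `requiredPermission`, `exportAuditLog`, and the in-memory `auditLog` are all hypothetical names. A real server would wire `listTools` into its `tools/list` handler and `checkCall` into its `tools/call` handler.

```typescript
// Hypothetical shapes; none of these names come from the MCP SDK.
interface TenantPolicy {
  tenantId: string;
  features: ReadonlySet<string>; // plan + enabled features
}

interface Caller {
  subject: string;
  permissions: ReadonlySet<string>; // the user's permissions within the tenant
}

interface ToolDef {
  name: string;
  requiredFeature?: string;
  requiredPermission?: string;
}

interface AuditEvent {
  kind: "tool_denied";
  tenantId: string;
  subject: string;
  tool: string;
}

const auditLog: AuditEvent[] = [];

const catalog: ToolDef[] = [
  { name: "searchDocuments" },
  { name: "runCohortAnalysis", requiredFeature: "advancedAnalytics" },
  { name: "exportAuditLog", requiredPermission: "admin" }, // hypothetical tool
];

// One predicate backs both paths, so listing and calling cannot drift apart.
function isAllowed(tool: ToolDef, tenant: TenantPolicy, caller: Caller): boolean {
  const featureOk =
    tool.requiredFeature === undefined || tenant.features.has(tool.requiredFeature);
  const permissionOk =
    tool.requiredPermission === undefined || caller.permissions.has(tool.requiredPermission);
  return featureOk && permissionOk;
}

// tools/list: only what this caller can use right now.
function listTools(tenant: TenantPolicy, caller: Caller): string[] {
  return catalog.filter((t) => isAllowed(t, tenant, caller)).map((t) => t.name);
}

// tools/call: re-check against the *current* policy. A refusal is recorded
// as an audit event and returned as a structured denial, not thrown.
function checkCall(
  tool: ToolDef,
  tenant: TenantPolicy,
  caller: Caller,
): { allowed: true } | { allowed: false; message: string } {
  if (isAllowed(tool, tenant, caller)) {
    return { allowed: true };
  }
  auditLog.push({ kind: "tool_denied", tenantId: tenant.tenantId, subject: caller.subject, tool: tool.name });
  return { allowed: false, message: `${tool.name} is not available for this account.` };
}
```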

The cost: the catalog the agent sees varies per session per user. Caching at the host gets harder. The trade-off is right; the alternative is a model that thinks it has tools it cannot actually use, which leads to confusing apologies and wasted turns.

3. Per-tenant resource scoping

The resources surface from the resources-and-prompts post becomes load-bearing in multi-tenant. A resource URI like kb://documents/12345 is dangerous if the document IDs are not tenant-scoped — an attacker could probe document IDs, find ones belonging to other tenants, and ask for them.

Tenant in the URI scheme, or in the URI prefix. kb://tenants/{tenantId}/documents/{docId} is the safer shape. The tenant ID in the URI is a signal to the resource resolver that this is a scoped lookup, and the resolver verifies that the requested tenant in the URI matches the verified tenant in the request context. If they do not match, the request is denied without revealing whether the resource exists.

Document IDs are tenant-namespaced, not globally unique. Tenant A's document #5 and tenant B's document #5 are different documents and the underlying storage should distinguish them either by composite key or by physical separation. A flat namespace where IDs collide across tenants is one bug away from a cross-tenant leak.

Resource lists never enumerate other tenants' resources. When the host asks for a list of resources, only the requesting tenant's resources appear. There is no debug mode that returns a global view; if you have one, an admin endpoint behind separate auth is the right place for it, not a resource the standard tool surface can reach.
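This is a sketch of a tenant-scoped resolver and lister backed by an in-memory store. The nested map, keyed first by tenant, stands in for a composite key. `TenantDocStore`, `readResource`, and `listResources` are hypothetical names. A malformed URI, a tenant mismatch, and a missing document all return the same `not_found` result, so the response is not an existence oracle.

```typescript
interface Doc {
  id: string;
  title: string;
}

// Storage keyed by tenant first: acme's "5" and globex's "5" never collide.
class TenantDocStore {
  private byTenant = new Map<string, Map<string, Doc>>();

  put(tenantId: string, doc: Doc): void {
    let docs = this.byTenant.get(tenantId);
    if (docs === undefined) {
      docs = new Map<string, Doc>();
      this.byTenant.set(tenantId, docs);
    }
    docs.set(doc.id, doc);
  }

  get(tenantId: string, docId: string): Doc | undefined {
    return this.byTenant.get(tenantId)?.get(docId);
  }

  list(tenantId: string): Doc[] {
    const docs = this.byTenant.get(tenantId);
    return docs === undefined ? [] : Array.from(docs.values());
  }
}

const DOC_URI = /^kb:\/\/tenants\/([^/]+)\/documents\/([^/]+)$/;

type ReadResult = { ok: true; doc: Doc } | { ok: false; error: "not_found" };

function readResource(store: TenantDocStore, verifiedTenantId: string, uri: string): ReadResult {
  const match = DOC_URI.exec(uri);
  // Malformed URI, tenant mismatch, and missing document get the same answer,
  // so a caller cannot probe whether another tenant's document exists.
  if (match === null || match[1] !== verifiedTenantId) {
    return { ok: false, error: "not_found" };
  }
  const doc = store.get(verifiedTenantId, match[2]);
  if (doc === undefined) {
    return { ok: false, error: "not_found" };
  }
  return { ok: true, doc };
}

// Listing enumerates only the verified tenant's resources; there is no global view.
function listResources(store: TenantDocStore, verifiedTenantId: string): string[] {
  return store
    .list(verifiedTenantId)
    .map((d) => `kb://tenants/${verifiedTenantId}/documents/${d.id}`);
}
```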

4. Per-tenant audit and observability

The observability post talked about per-call structured logs. Multi-tenant elevates this from "useful" to "required for survival."

Tenant ID in every log line. Not optional. A log without a tenant ID is a log you cannot triage when a customer asks, "What did our agent do yesterday?" The standard log schema for multi-tenant MCP includes tenant_id, caller_subject, tool_name, result_kind, the trace fields, and the timestamp.
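That schema can be written as a TypeScript type, together with a builder that refuses to emit a line without a tenant. The text says only "the trace fields." Here `trace_id` and `span_id` are assumed names borrowed from common tracing conventions, and the `result_kind` values are hypothetical.

```typescript
// result_kind values are hypothetical; trace_id/span_id are assumed names
// for "the trace fields".
type ResultKind = "ok" | "tool_error" | "denied";

interface ToolCallLog {
  tenant_id: string;
  caller_subject: string;
  tool_name: string;
  result_kind: ResultKind;
  trace_id: string;
  span_id: string;
  timestamp: string; // ISO-8601, UTC
}

function buildLogLine(fields: Omit<ToolCallLog, "timestamp">, now: Date = new Date()): string {
  // A line without a tenant cannot be triaged per customer, so refuse to emit it.
  if (fields.tenant_id.trim() === "") {
    throw new Error("refusing to log a tool call without tenant_id");
  }
  const record: ToolCallLog = { ...fields, timestamp: now.toISOString() };
  return JSON.stringify(record);
}
```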

Per-tenant audit trails accessible to the tenant. The customer should be able to see every tool call their agents made, with arguments (redacted appropriately) and results. This is not a "nice to have" for compliance-conscious customers; it is table stakes for selling to anyone in a regulated industry. Build the audit-export endpoint early.

Per-tenant rate limits on observability. A noisy tenant should not be able to fill the shared log volume to the point where other tenants' logs get sampled out. Either separate log streams per tenant (best, expensive) or separate rate limits per tenant per log stream (good enough, cheap).
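The cheaper option, per-tenant limits on a shared stream, can be built as one token bucket per tenant. In this sketch, capacity and refill rate are constructor arguments rather than recommended values. Time is passed in so the limiter is deterministic, and `PerTenantLogLimiter` is a hypothetical name. Counting drops per tenant means sampling stays visible instead of silent.

```typescript
// One token bucket per tenant in front of a shared log stream.
class PerTenantLogLimiter {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly buckets = new Map<string, { tokens: number; lastMs: number }>();
  private readonly dropped = new Map<string, number>();

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true if this tenant may emit one more log line at time nowMs.
  tryLog(tenantId: string, nowMs: number): boolean {
    const bucket = this.buckets.get(tenantId) ?? { tokens: this.capacity, lastMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - bucket.lastMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.lastMs = nowMs;
    this.buckets.set(tenantId, bucket);
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    // Count drops per tenant so sampling shows up in metrics instead of vanishing.
    this.dropped.set(tenantId, (this.dropped.get(tenantId) ?? 0) + 1);
    return false;
  }

  droppedFor(tenantId: string): number {
    return this.dropped.get(tenantId) ?? 0;
  }
}
```

A noisy tenant drains only its own bucket. Other tenants' lines keep flowing at their own rate.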

5. Cost allocation per tenant

The boring side of multi-tenant that almost every project gets wrong on the first try.

Every external call is attributable. If your tools call a third-party API that bills by usage (Stripe, OpenAI, a search provider), the cost has to be allocated to the originating tenant. Otherwise heavy users subsidize light users, your unit economics drift, and the conversation with the CFO becomes uncomfortable.

The instrumentation lives in the handler, not the integration. Your handler knows which tenant initiated the call. Your Stripe SDK does not. The tagging — "this Stripe API call cost $0.02 and is attributed to tenant X" — happens at the handler level, where the tenant context is in scope.

Synchronous vs. asynchronous billing. For low-volume operations, log per-call and aggregate nightly. For high-volume operations (a tool that fetches a hundred chunks from a vector DB), per-call logging is too expensive; bucket the cost by tenant in-process and flush on a schedule. The choice depends on the tool; default to per-call logging until volume forces the change.
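This is a sketch of in-process bucketing with a scheduled flush. It assumes costs are known at call time and are expressed in integer micro-dollars to avoid floating-point drift. `CostLedger`, the `sink` callback, and the per-chunk price in `chargeForVectorSearch` are hypothetical. The point is that the tenant comes from the handler's verified context, not from the integration SDK.

```typescript
interface CostRow {
  tenantId: string;
  provider: string;
  microUsd: number; // integer micro-dollars
}

class CostLedger {
  private readonly buckets = new Map<string, CostRow>();

  record(tenantId: string, provider: string, microUsd: number): void {
    if (!Number.isInteger(microUsd) || microUsd < 0) {
      throw new Error("cost must be a non-negative integer number of micro-dollars");
    }
    const key = JSON.stringify([tenantId, provider]); // unambiguous composite key
    const row = this.buckets.get(key);
    if (row === undefined) {
      this.buckets.set(key, { tenantId, provider, microUsd });
    } else {
      row.microUsd += microUsd;
    }
  }

  // Run on a schedule. Buckets are cleared only after the sink returns, so a
  // sink that throws leaves the totals in place for the next attempt.
  flush(sink: (rows: CostRow[]) => void): void {
    const rows = Array.from(this.buckets.values()).map((r) => ({ ...r }));
    if (rows.length === 0) return;
    sink(rows);
    this.buckets.clear();
  }
}

// The handler owns attribution because it has the verified tenant in scope.
// The price of 5 micro-dollars per chunk is a made-up number.
function chargeForVectorSearch(ledger: CostLedger, ctx: { tenantId: string }, chunks: number): void {
  ledger.record(ctx.tenantId, "vector-db", chunks * 5);
}
```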

Costs the tenant did not directly trigger. Some costs (background re-indexing, scheduled freshness checks) are not initiated by a specific tool call. They still need to be allocated. The tenant whose data is being re-indexed is the one to charge; the allocation is at the data-ingestion event, not at the re-indexing event.

6. Prompt-injection blast radius

The risk that does not exist on a single-tenant server and that defines the worst-case shape of a multi-tenant one.

The model the agent is running is shared across tenants. A clever payload in tenant A's data might cause the agent, when it next runs for tenant B (after a session boundary, after a user logout, after the host reuses a process), to behave on B's behalf in a way A directed. The vulnerability surface here is enormous and underexamined in the literature.

Session isolation is non-negotiable. Every host session is a fresh context. No cross-session memory in the model. No agent state that survives a tenant boundary. This is partly the host's job — well-behaved hosts already do this — and partly the server's job — refuse to accept session-spanning state in tool args, refuse to expose mechanisms that would let one session's behavior persist into another's.
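On the server side, one concrete way to refuse session-spanning state is to reject any argument key the tool did not declare. This has the same effect as `additionalProperties: false` in a JSON Schema input schema. `checkArgKeys` is a hypothetical helper, and the example key `priorSessionMemory` is invented for illustration.

```typescript
type ArgCheck = { ok: true } | { ok: false; unexpected: string[] };

// Accept only argument keys the tool declared. Anything else (a "memory"
// blob, a prior transcript, a session handle) is refused, not silently ignored.
function checkArgKeys(args: Record<string, unknown>, declared: readonly string[]): ArgCheck {
  const allowed = new Set(declared);
  const unexpected = Object.keys(args).filter((key) => !allowed.has(key));
  if (unexpected.length > 0) {
    return { ok: false, unexpected };
  }
  return { ok: true };
}

// Input schema for a hypothetical searchDocuments tool, closed to extra keys.
const searchDocumentsSchema = {
  type: "object",
  properties: { query: { type: "string" }, limit: { type: "integer" } },
  required: ["query"],
  additionalProperties: false,
} as const;
```

Deriving the allowed keys from the schema (`Object.keys(searchDocumentsSchema.properties)`) keeps the runtime check and the published contract in sync.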

Prompt-injection detection from the [observability post](/blog/observability-for-mcp-servers), with tenant scoping. A tool argument containing injection markers is more dangerous on a multi-tenant server because the consequence is potentially cross-tenant. Detection and alerting should treat injection patterns as elevated severity in multi-tenant contexts.
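This sketch of severity escalation assumes detection is a simple pattern scan over every string in the arguments. The marker list is a small, hypothetical heuristic, not a vetted detector. The part that matters for this checklist is the final branch: the same hit is `warn` on a single-tenant server and `high` on a multi-tenant one.

```typescript
type Severity = "none" | "warn" | "high";

// A deliberately small, hypothetical marker list. Real detection needs more
// than regexes, but the escalation logic is the same.
const INJECTION_MARKERS: readonly RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /\bsystem prompt\b/i,
  /\btenant[_ ]?id\b/i, // text that tries to name a tenant is suspicious here
];

// Gather every string value, however deeply nested, from the tool arguments.
function collectStrings(value: unknown, out: string[]): void {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectStrings(item, out);
  }
}

function injectionSeverity(args: unknown, multiTenant: boolean): Severity {
  const strings: string[] = [];
  collectStrings(args, strings);
  const hit = strings.some((s) => INJECTION_MARKERS.some((re) => re.test(s)));
  if (!hit) return "none";
  // Same finding, higher severity: on a shared server the worst case is cross-tenant.
  return multiTenant ? "high" : "warn";
}
```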

Limit cross-tenant model attention. If your server is wrapping a backend that holds data from multiple tenants, never let a single tool call span tenants. A searchDocuments tool runs against the calling tenant's documents only, ever. There is no searchAllTenantsDocuments debug tool sitting in the same surface; if it exists, it is on a separate, admin-only server with its own auth.

7. Per-tenant configuration

Customers want the agent to behave their way. Some want more aggressive tools enabled; others want a stricter subset. Some want different tool descriptions ("we call it 'projects' not 'workspaces'"). Some want different defaults.

Tool descriptions can be customized per tenant. Within reason — not the parameter shape, which is part of the protocol contract — but the prose of the description can vary. The handler reads the tenant context, looks up the tenant's preferences, and assembles the description text on registration. This is also where ubiquitous-language differences can be honored — one tenant calls them "tickets," another calls them "issues."
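This sketch varies the description prose per tenant while the parameter schema stays fixed. The `{{term}}` placeholder syntax, the glossary shape, and `describeSearchTool` are hypothetical. The `name`/`description`/`inputSchema` fields mirror the shape of an MCP tool definition.

```typescript
type TenantGlossary = Record<string, string>;

const DEFAULT_TERMS: TenantGlossary = { workItem: "ticket", container: "workspace" };

const SEARCH_TEMPLATE = "Search {{workItem}}s in the current {{container}} by keyword.";

// Identical for every tenant: the parameter shape is part of the tool's contract.
const SEARCH_INPUT_SCHEMA = {
  type: "object",
  properties: { query: { type: "string" } },
  required: ["query"],
} as const;

// Replace {{term}} placeholders with the tenant's word, falling back to the default.
function renderDescription(template: string, glossary: TenantGlossary): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, term: string) => {
    return glossary[term] ?? DEFAULT_TERMS[term] ?? term;
  });
}

function describeSearchTool(glossary: TenantGlossary) {
  return {
    name: "searchDocuments",
    description: renderDescription(SEARCH_TEMPLATE, glossary),
    inputSchema: SEARCH_INPUT_SCHEMA,
  };
}
```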

Per-tenant feature toggles for unsafe operations. Destructive tools like cancelOrder may be off by default for new tenants and enabled only for tenants whose admin has explicitly turned them on. The tenant's admin makes the call, not the agent and not the user.

Defaults that vary safely. Default values for optional parameters can be tenant-specific. searchDocuments({ limit }) might default to 10 for tenant A and 30 for tenant B, based on their plan. The agent does not need to know about this; the handler reads the default from tenant config.
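This sketch covers both the destructive-tool toggle and tenant-specific defaults. `TenantConfig`, its field names, and the global fallback limit of 10 are assumptions for illustration. The config is presumed to be writable only through an admin-authenticated path.

```typescript
// Hypothetical per-tenant config, written only by the tenant's admin.
interface TenantConfig {
  destructiveToolsEnabled?: ReadonlySet<string>; // absent means every destructive tool is off
  searchDefaultLimit?: number;
}

const DESTRUCTIVE_TOOLS: ReadonlySet<string> = new Set(["cancelOrder"]);
const GLOBAL_SEARCH_LIMIT = 10;

// Non-destructive tools are governed by plan subsetting elsewhere; destructive
// ones additionally require an explicit admin opt-in.
function isToolEnabled(tool: string, config: TenantConfig): boolean {
  if (!DESTRUCTIVE_TOOLS.has(tool)) return true;
  return config.destructiveToolsEnabled?.has(tool) ?? false;
}

// An explicit argument wins; otherwise the tenant's default; otherwise the global one.
function effectiveSearchLimit(requested: number | undefined, config: TenantConfig): number {
  return requested ?? config.searchDefaultLimit ?? GLOBAL_SEARCH_LIMIT;
}
```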

The discipline: configurability is a feature for customers and a tax on the engineering team. Pick what to make configurable based on what customers actually ask for, not what is theoretically configurable.

8. Tenant lifecycle

The boring mechanics that decide whether a tenant cleanup actually cleans up.

Tenant creation. A new tenant gets a tenant ID, a default config, and an audit-log scope. No agent operations are allowed until tenant creation has succeeded. Race conditions where a request arrives during tenant provisioning are handled by rejecting cleanly, not by half-creating.

Tenant suspension. When a customer is suspended (non-payment, ToS violation, manual hold), every subsequent request from that tenant is rejected at the auth boundary. Not at the handler. The auth-time check is what guarantees no work is done; relying on handler-time checks is a defense-in-depth bonus, not the primary control.

Tenant deletion. The hardest one. Customer leaves; data must be deleted; audit logs must be retained per regulatory requirements; backups must be purged eventually. The deletion flow is its own engineering project — soft-delete first, hard-delete after a retention window, with verification that no residual data remains. Most teams underestimate this by an order of magnitude. Build it before you have your first customer who wants to leave.
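The lifecycle can be written as a small state machine plus the auth-boundary check. State names, the transition table, and `hardDeleteDue` are hypothetical. Actual retention windows come from your regulatory requirements, and real deletion still needs the verification pass described above.

```typescript
type TenantState = "provisioning" | "active" | "suspended" | "soft_deleted" | "hard_deleted";

// Allowed transitions; anything not listed is refused.
const TRANSITIONS: Record<TenantState, readonly TenantState[]> = {
  provisioning: ["active"],
  active: ["suspended", "soft_deleted"],
  suspended: ["active", "soft_deleted"],
  soft_deleted: ["hard_deleted"],
  hard_deleted: [],
};

function canTransition(from: TenantState, to: TenantState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Auth-boundary admission: only a fully provisioned, active tenant gets work done.
// Unknown, provisioning, suspended, and deleted tenants are all rejected here.
function admitRequest(state: TenantState | undefined): boolean {
  return state === "active";
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hard delete becomes due once the retention window has passed since soft delete.
function hardDeleteDue(softDeletedAtMs: number, nowMs: number, retentionDays: number): boolean {
  return nowMs - softDeletedAtMs >= retentionDays * DAY_MS;
}
```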

Data export. Symmetric with deletion. Customers want to leave with their data. The export endpoint produces a structured archive of everything the tenant owns — documents, configs, logs (within retention rules). Build this early; retrofitting it is much harder than building it.

9. The blast-radius drill

A pattern we have started running on multi-tenant MCP projects, before any customer onboarding: the blast-radius drill.

Pick a tenant. Sketch every operation a malicious or compromised agent on that tenant could attempt. For each operation, ask:

What does the auth check refuse?
What does the tool subsetting refuse?
What does the resource scoping refuse?
What does the rate limit refuse?
What gets logged?
What gets alerted?

The exercise produces a matrix. Cells where at least one layer refuses are cells where you are safe. Cells where multiple layers refuse are defense in depth. Cells where no layer refuses are hazards. Address each hazard before launch.
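The drill's output can be kept as data and checked mechanically, so it can be re-run whenever a tool is added. In this sketch each row records which layers refuse an operation. `classify` marks a row as a hazard (no layer refuses), single-layer, or defense in depth (two or more distinct layers). The layer names, the row shape, and the extra `unobserved` check are hypothetical.

```typescript
type Layer = "auth" | "toolSubsetting" | "resourceScoping" | "rateLimit";

interface DrillRow {
  operation: string;
  refusedBy: readonly Layer[];
  logged: boolean;
  alerted: boolean;
}

type Verdict = "hazard" | "single_layer" | "defense_in_depth";

function classify(row: DrillRow): Verdict {
  const layers = new Set(row.refusedBy).size; // ignore accidental duplicates
  if (layers === 0) return "hazard";
  return layers === 1 ? "single_layer" : "defense_in_depth";
}

// Operations no layer refuses: each one must be addressed before launch.
function hazards(rows: readonly DrillRow[]): string[] {
  return rows.filter((r) => classify(r) === "hazard").map((r) => r.operation);
}

// An extra check this sketch adds: operations that would not even be logged.
function unobserved(rows: readonly DrillRow[]): string[] {
  return rows.filter((r) => !r.logged).map((r) => r.operation);
}
```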

This is unglamorous work. It is also the work that decides whether the first cross-tenant incident is a near-miss or an exfiltration. The teams that ship multi-tenant MCP without this drill are the ones who do it after the incident.

What this is not

For completeness, two patterns that look like multi-tenancy and are not.

"Multiple users on a single-tenant server." A team of ten people sharing access to a server that wraps their company's tools is not multi-tenant — it is single-tenant with multiple authenticated users. Most of the rules above are softer there because the trust boundary is the company, not the user. The discipline still helps; the consequences of a slip are smaller.

"Multi-instance, single-tenant per instance." Spinning up a separate server per customer, each one on its own infrastructure with its own credentials, is also not multi-tenant in the operational sense. It avoids most of the cross-tenant risks but creates a different operational cost (N times the deploys, N times the upgrades). For high-stakes customers (financial, healthcare, government) this is sometimes the right call. For most SaaS plays, the operational cost is too high.

True multi-tenancy is the model where one running server handles many tenants, and that is the model the checklist above is for.

Where this fits

The security post is the prerequisite — multi-tenant doubles every concern there. The bounded-context post interacts: a tenant might have access to some bounded contexts and not others, which is where tool subsetting starts. The observability post is the operational arm; per-tenant logging and tracing are non-negotiable.

For the auth substrate, the OAuth 2.1 post covers the IdP-as-authorization-server pattern that makes multi-tenant identity tractable. For deploy shape, the transport post decides whether multi-tenancy is even on the table — multi-tenant stdio is a category error; multi-tenant Streamable HTTP is the only viable shape.

Multi-tenant MCP is the most ambitious shape an MCP server can take. The reward is real — a single deployment serving many customers is the unit economics of every successful SaaS — but the engineering is total. Every property that is automatic in single-tenant has to be earned. The teams that ship well are the ones who walk this checklist before their first customer signs the contract, not after.