AI is gaining traction fast, and the pace keeps accelerating. All that progress sets the stage for a wave of new tools, techniques, and paradigms. Let's explore how MCP integrates into this environment and how it relates to agents and LLMs.
This is the third post in our MCP Overview series. The first post established what MCP is at the protocol level — a standard way for an application to expose capabilities to a language model. This post takes one step back and asks the conceptual question: in a working AI tooling stack, what's actually doing what? Where does the reasoning happen? Where does the work happen? And why does the difference matter to anyone building or buying these systems?
If you've spent any time around LLMs, MCP servers, and "AI agents," you've probably noticed something uncomfortable: the words don't line up. Different teams use the same word to mean different things, and the same thing gets called by three different names depending on who's writing the article. It's hard to reason precisely about a system whose vocabulary is sliding underneath you.
So we'll start there.
---
The terminology problem nobody is fixing
The word agent is being asked to do too much. Right now it can mean any of:
- A host application that wraps an LLM and connects it to tools — Claude Desktop, the Claude mobile app, Cursor, Windsurf, Zed's AI mode, or any custom application a developer builds to talk to an LLM. This is the meaning that matters in an MCP context.
- An autonomous task runner inside such an application — for example, when Claude Code spins up a sub-agent to investigate something and report back. This is closer to the original computer-science meaning: a process pursuing a goal with some autonomy.
- Any application that has an LLM somewhere in its stack. This usage is the loosest and the least useful — "agent" here just means "thing with AI in it."
All three are floating around in the same conversations, often in the same paragraphs. So when someone says
I'm building an agent
... then it's worth asking which one they mean before nodding.
In this article — and throughout this series — we use agent in the first sense: the host application that holds the LLM and orchestrates calls to it. Wherever the second sense matters, we'll say autonomous task or sub-agent explicitly. We avoid the third sense entirely; it's too vague to support a useful sentence.
With that pinned down, let's lay out the three pieces.
---
The three layers, precisely
A working AI tooling stack has at least three components. They sit at very different levels of the system, and they have very different jobs. Understanding which is which is the difference between building something coherent and building something that almost-but-not-quite works.
The LLM
The language model is, at its core, a very sophisticated text-completion engine. You feed it a sequence of tokens; it produces the next sequence. That's the entire computation. It has no tools, no memory of yesterday, no ability to take action. It cannot decide on its own to query a database, send an email, or run a build. It can only produce text — and the text it produces is shaped by the prompt it receives.
This is true whether the model in question is Claude Opus, GPT-5, Gemini, Llama, or anything else. The architecture varies; the contract with the outside world does not. Text in, text out. Stateless across calls.
What gives an LLM its apparent intelligence is the quality of that text completion. A capable model, given a well-formed prompt, can produce a continuation that reasons through a problem, drafts a response, or — critically for our purposes — emits a structured request to call a tool. But it doesn't call the tool. It emits the request. Something else has to read that request and act on it.
That something else is the agent.
The agent (host application)
The agent is the program that wraps the LLM and gives it a way to interact with the world. It is, very concretely, a control loop:
- Take the user's input.
- Send it — along with available tools, prior conversation, system instructions — to the LLM.
- Read what the LLM produced. If it's plain text, show it to the user. If it's a tool-call request, execute the tool.
- Take the tool's result and feed it back to the LLM as the next turn.
- Repeat until the LLM produces a final answer, or the user interrupts.
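The loop above can be sketched in a few lines. This is a deliberately minimal illustration, not any particular SDK: `callModel` and `runTool` are stand-ins for a real LLM client and a real MCP connection, and the step cap is an arbitrary choice for the sketch.

```typescript
// A model turn is either plain text or a request to call a tool.
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

// Stand-ins for a real LLM client and a real MCP tool connection.
type CallModel = (transcript: string[]) => ModelTurn;
type RunTool = (tool: string, args: Record<string, unknown>) => string;

// The whole shape of an agent: send, read, maybe run a tool, repeat.
function agentLoop(userInput: string, callModel: CallModel, runTool: RunTool): string {
  const transcript = [`user: ${userInput}`];
  for (let step = 0; step < 10; step++) {          // hard step cap: the agent decides when to stop
    const turn = callModel(transcript);
    if (turn.kind === "text") return turn.text;    // final answer: exit the loop
    const result = runTool(turn.tool, turn.args);  // the agent, not the model, executes the tool
    transcript.push(`tool ${turn.tool}: ${result}`);
  }
  return "step limit reached";
}
```

Everything interesting about a production agent lives in how those stand-ins are filled in and how the loop is hardened, but the control flow really is this small.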
That's the whole shape of an agent. Claude Desktop is an agent. The Claude mobile app is an agent. Cursor's compose mode is an agent. ChatGPT with tools enabled is an agent. Any custom application someone builds (in Flutter or anything else) that wraps the Anthropic SDK or the OpenAI SDK and gives the model tools to call is an agent too.
The agent is where the behavior of the system is decided. It's where decisions get made about which model to use, how much context to send, what tools to expose, how to handle a tool failure, whether to ask the user for confirmation before doing something destructive, when to stop the loop. None of that is in the model. None of it is in the tool. It's in the agent. And its rules aren't magic: they're written in ordinary code, if/else branches, fs.readFile calls, database queries, memory management, and whatever other logic the application needs, held together by sound architecture and design.
This is also where the term "agentic" tends to land. When people say a system feels "agentic," they usually mean: the agent loop is patient, plans ahead, recovers from errors, can dispatch sub-tasks, and pursues a goal across many steps without constant supervision. None of that is a property of the LLM itself. It's a property of how the agent is engineered.
The MCP server
The MCP server is a tool provider. It exposes a set of capabilities — listDrafts, searchOrders, runTests, whatever the domain calls for — over a standardized protocol. When an agent decides to call one of those capabilities, the MCP server executes the work and returns a result.
The MCP server has no LLM inside it. It does not reason. It does not loop. It does not maintain memory across calls (except in the boring sense that it might talk to a database, the same way any service does). Each tool call is an isolated request: arguments come in, work happens, a result goes out. Conceptually, an MCP server is closer to a small, well-bounded backend service than to anything "intelligent" in the AI sense.
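In code terms, each tool handler is just a function: arguments in, result out, nothing remembered between calls. A hedged sketch, where the `searchOrders` tool and its `Order` shape are illustrative rather than from any real server:

```typescript
interface Order { id: string; status: "open" | "shipped"; total: number }

// A tool handler is a pure request/response function. Any persistence it
// needs lives in the system underneath (here, a plain array standing in
// for a database) — never inside the handler itself.
function searchOrders(db: Order[], args: { status?: "open" | "shipped" }): Order[] {
  return args.status ? db.filter((o) => o.status === args.status) : db;
}
```

Because the handler keeps no state of its own, any single call can be reasoned about, tested, and replayed in isolation — exactly the property the agent loop depends on.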
That phrasing is important, and it's where most introductory articles go astray. An MCP server is not unimportant because it doesn't reason at runtime. The reasoning that matters happens upstream — at design time, when someone decides what tools to expose, what shape they should have, how they should validate inputs, what error messages they should return so an LLM can recover gracefully, what guarantees they make about safety and idempotence. That work is hard. Doing it well is the difference between an MCP server that an LLM can actually use and one that produces failed tool calls and confused agent loops.
We'll come back to that. First, the picture.
---
The picture, with everything in its place
USER
│
▼
┌──────────────────────────────────────────────────────────┐
│ │
│ AGENT (host application) │
│ │
│ • Owns the conversation │
│ • Sends prompts to the LLM │
│ • Reads tool-call requests from the LLM │
│ • Executes them by calling MCP servers │
│ • Decides when to stop │
│ │
│ Examples: Claude Desktop, Claude mobile, Cursor, │
│ Windsurf, custom in-house host applications │
│ │
└────────────┬─────────────────────────────────┬───────────┘
│ │
prompt │ │ tool call
(text) │ │ (JSON-RPC)
▼ ▼
┌───────────────┐ ┌───────────────────┐
│ LLM │ │ MCP SERVER │
│ │ │ │
│ Text in, │ │ Exposes a set of │
│ text out. │ │ capabilities. │
│ │ │ │
│ No tools. │ │ No LLM. │
│ No memory │ │ No conversation │
│ across calls. │ │ state. │
│ No loop. │ │ No autonomy. │
│ │ │ │
│ Returns plain │ │ Returns the │
│ text or │ │ result of the │
│ structured │ │ requested │
│ tool requests │ │ operation. │
└───────────────┘ └───────────────────┘The picture answers a question that confuses almost everyone the first time they meet MCP: where is the AI? The answer is, in this stack, only one place: the LLM. Everything else is plain software. The agent is a control loop. The MCP server is a service. There is no AI inside an MCP server. There is no AI inside the host application either, except inasmuch as the host application calls out to the LLM. The intelligence in the system is concentrated in one component, and the rest is the scaffolding that lets that one component do useful work in the real world.
Once you see the stack this way, every confusing thing about MCP starts to make sense.
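It also helps to see what actually travels along the "tool call (JSON-RPC)" edge of the diagram. The message shapes below follow the MCP specification's `tools/call` method; the `searchOrders` tool and its arguments are illustrative:

```typescript
// What the agent sends to the MCP server: a JSON-RPC 2.0 request naming
// the tool and its arguments.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "searchOrders", arguments: { status: "open" } },
};

// What the server sends back: a result whose content the agent feeds to
// the LLM as the next turn. No reasoning happened here — just the work.
const response = {
  jsonrpc: "2.0",
  id: 1,
  result: { content: [{ type: "text", text: "2 open orders: #1041, #1042" }] },
};
```

Plain request, plain response, matched by `id`. The model never sees this wire format; the agent translates between the model's tool-call text and these messages.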
---
Why this misunderstanding happens
Why do people — including, often, experienced developers — assume the MCP server is the "smart" piece? A few reasons converge:
Marketing drift. "MCP server" is a new-sounding term, and "agent" is the word currently associated with intelligence in the AI conversation. Anything new and AI-adjacent gets implicitly classified as "another flavor of agent."
The protocol's flexibility hides its scope. MCP defines tools, resources, and prompts. The protocol is rich enough that you could build an MCP server that does sophisticated work — including, in principle, calling out to its own LLM. That doesn't make it part of the MCP server's job. It makes it possible to bury an agent inside what looks like a tool from the outside, which is sometimes useful but mostly a mental-model trap.
Tutorials sell capability, not architecture. Most online MCP tutorials are "here is your first server." They demonstrate that you can call a function from Claude. They rarely show what not to put in the server, or where the design effort actually goes.
The result is that people start projects with the wrong assumption — that they're going to put the cleverness into the MCP server — and the assumption survives until something breaks. Tools that try to do too much become unreliable. Tools that bury logic the LLM should be doing make the agent loop dumber, not smarter. Tools that maintain hidden state make debugging miserable.
The fix is to put each kind of work where it belongs.
---
Where the engineering work actually lives
If the MCP server doesn't reason at runtime, what is the work of designing one? This is where the original framing — "MCP is the dumb part" — does the most damage, because it implies the work is trivial. It is not. Done well, designing an MCP server is one of the most architecturally demanding pieces of an AI tooling project. Here is what that work involves:
Tool surface design. What should the agent be able to do? What set of operations covers the use cases without giving the agent the ability to do things that should require human approval? This is exactly the kind of question that bounded-context thinking from Domain-Driven Design helps answer. A good MCP server exposes one bounded responsibility — one cohesive set of operations — rather than a kitchen-sink set of tools that span half the company.
Tool granularity. A tool that's too fine-grained forces the LLM into long sequences of brittle calls. A tool that's too coarse swallows decisions the agent should be making. Finding the right level — publishDraft(filename) rather than executeSql(query), but searchOrders(filters) rather than separate tools for every possible filter combination — is judgment work, not boilerplate.
Schema design. An MCP tool's input schema is a contract, but it's also a teaching aid. The LLM reads the schema and the description to decide whether and how to call the tool. Schemas that are vague produce wrong calls. Schemas that are over-restrictive produce failed calls. Schemas that are well-named, well-described, and well-bounded produce reliable behavior.
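To make that concrete, here is what "well-named, well-described, and well-bounded" can look like for the hypothetical `publishDraft` tool from earlier — an input schema in JSON Schema form (the shape MCP tools declare), written as a TypeScript constant. The folder convention and pattern are illustrative choices, not requirements:

```typescript
// A tool input schema is both a validation contract and a teaching aid:
// the LLM reads the names, descriptions, and constraints to decide how to call.
const publishDraftSchema = {
  type: "object",
  properties: {
    filename: {
      type: "string",
      description: "Markdown file in the drafts folder, e.g. 'mcp-overview.md'. Must already exist.",
      pattern: "^[\\w-]+\\.md$",   // bounded: a filename, not an arbitrary path
    },
  },
  required: ["filename"],
  additionalProperties: false,      // reject hallucinated extra arguments loudly
} as const;
```

Each constraint does double duty: it blocks a bad call at validation time, and it tells the model in advance what a good call looks like.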
Tool descriptions. This is the single most underrated part of MCP work. A tool's description is the prompt the LLM uses to decide whether to invoke it and how. Writing descriptions that an LLM can reason about — clear preconditions, explicit consequences, examples of right and wrong usage — is its own discipline. We'll devote an entire post to it later in this series.
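As a small taste of that discipline, here is an illustrative description for the same hypothetical `publishDraft` tool — preconditions, consequences, and an explicit "when not to use this," all aimed at a reader that is a language model:

```typescript
// A description the LLM can reason about: what the tool does, what must be
// true before calling it, what happens after, and when NOT to use it.
const publishDraftDescription = [
  "Publish a draft blog post, making it publicly visible immediately.",
  "Precondition: the draft must already exist (use listDrafts to check).",
  "Consequence: the post is visible to readers at once and not easily undone.",
  "Do NOT use this to save work in progress — drafts are saved automatically.",
].join("\n");
```
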
Failure modes. When a tool fails, what does the error look like? Can the LLM recover? Does it know to ask the user for clarification, or does it hallucinate a workaround? Error messages are part of the API surface in a way they aren't in conventional backend design.
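A sketch of the difference, using the same hypothetical tool: the first error gives the LLM nothing to act on; the second names the bad argument, states the constraint, and points at a recovery path.

```typescript
// Unhelpful: the LLM can only guess, and guessing means hallucinated workarounds.
const opaque = { error: "ERR_422" };

// Recoverable: everything in this message is something the LLM can act on.
// (The misspelled filename is the model's hypothetical mistake being reported.)
const recoverable = {
  error:
    "Unknown draft 'mcp-overveiw.md'. The filename must match an existing " +
    "draft exactly. Call listDrafts to see available drafts, or ask the " +
    "user which draft they meant.",
};
```
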
Security boundaries. The MCP server is where the principle of least privilege gets enforced. Path traversal protection, scope checks, audit logging, allowlists for destructive operations — these are tool-handler concerns, not infrastructure concerns. Get them right and the agent has a safe range of motion. Get them wrong and you've exposed the host's filesystem or production database to whatever the LLM happens to produce.
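As one concrete example of a tool-handler-level check, here is a hedged sketch of path traversal protection. The drafts-folder setup is illustrative; the point is that the check happens inside the handler, before anything touches the disk:

```typescript
import * as path from "node:path";

// Least privilege at the handler: resolve the requested file against the
// allowed root and refuse anything that escapes it.
function resolveDraftPath(draftsRoot: string, filename: string): string {
  const root = path.resolve(draftsRoot);
  const resolved = path.resolve(root, filename);
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error(`Refusing path outside drafts folder: ${filename}`);
  }
  return resolved;
}
```

Whatever the LLM produces as `filename` — including `../../etc/passwd` — the handler enforces the boundary itself rather than trusting the caller.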
Observability and operations. An MCP server in production needs the same care as any other backend service — structured logs, metrics on tool latency and failure rate, alerting when something starts behaving oddly, version pinning, deployment hygiene. The fact that the consumer happens to be an LLM doesn't reduce the operational burden; in some ways it increases it, because the consumer is non-deterministic by design.
None of this is trivial. The MCP server is the place where careful engineering compounds — every tool you design well, the LLM uses well; every tool you design poorly, the LLM fights with. The best MCP servers are the result of people who understand both the domain and the way LLMs reason, working hard to make those two things meet cleanly.
That's the engineering work. It happens upstream of every tool call.
---
What MCP actually solved
If MCP servers are well-bounded backend services, you might reasonably ask: what's new? Why is this a thing?
The answer is interoperability.
Before MCP, every host application invented its own format for tools. A tool you wrote for Cursor wouldn't work in Claude Desktop. A tool you wrote against the OpenAI Assistants API wouldn't work against Anthropic's Messages API. Tools were proprietary, hosts were silos, and the same idea got reimplemented seven times for seven different host applications. Most teams gave up and only built tools for the one host they used.
After MCP, a tool you write once works in every host that speaks the protocol. Claude Desktop, Cursor, Windsurf, Zed, Continue, and any custom host built against the MCP SDK can use the same MCP server. The capabilities are decoupled from the host. The ecosystem can compound.
The closest precedent we have for this is the Language Server Protocol. Before LSP, every code editor had to write its own integration with every programming language. After LSP, the language server became the thing — write it once, and every editor that speaks LSP gets autocomplete, jump-to-definition, and diagnostics for free. This is not a loose analogy; it's structurally the same move. MCP is doing for "things an LLM agent might want to call" what LSP did for "things an editor might want to ask about code."
That's what MCP actually solved. Not "smarter agents." Not "more capable LLMs." A standard plug shape, and an ecosystem that can grow because of it.
---
So what does "agentic" mean now
With the three layers cleanly separated, the word agentic finally has a clear referent. It describes the agent loop's behavior — how patient it is, how many steps ahead it plans, whether it can dispatch sub-tasks, how it handles errors, whether it remembers things across long horizons.
This is exclusively a property of the host application and the way it engineers the loop around the LLM. It has nothing to do with the LLM in isolation, and nothing to do with the MCP servers being called.
That distinction matters when someone tells you they're "building an agentic system." It's worth asking: are you building a host application that handles long-running, error-tolerant, autonomous task execution? Or are you building MCP servers that expose capabilities to an existing host? The two projects have very little overlap. They require different skills, different infrastructure, and different operational patterns. People conflate them constantly.
An MCP server doesn't become more agentic when you add features to it. The agent loop is somewhere else.
---
Why this picture is useful when you're building
A clean mental model is not just academic. It changes what you build. Three concrete examples of decisions that get easier once the layers are separated:
"Should this logic go in the tool or the agent?" If the decision depends on context that the LLM has and the tool doesn't — recent conversation, user intent, the result of a previous call — it belongs in the agent loop. If it's pure mechanics that any caller would want done the same way, it belongs in the tool. Most teams default to "put it in the tool because the tool is closer to the work," and that's the wrong instinct. The tool should be a clean operation; the agent should be the orchestrator that decides which operations to invoke and in what sequence.
"Should this MCP server have memory?" Almost always: no. State that should persist across calls belongs in the underlying system the tool is acting on — the database, the file system, the API the tool is wrapping. The MCP server is a stateless translator between the agent and that system. Adding hidden state inside the server makes debugging miserable and breaks the assumption that any tool call can be reasoned about in isolation.
"Should I build one big MCP server or several small ones?" Several small ones, scoped to bounded contexts. One MCP server for blog publishing, one for Flutter pipeline operations, one for CRM access. This mirrors the modular-monolith principle from backend architecture: the server is small enough to understand, audit, and version independently. Scoping by concern also limits the blast radius of a security issue.
The pattern repeats: the simpler and narrower each MCP server is, the more leverage the agent gets out of it. The intelligence is upstream — in the tool design, in the descriptions, in the choice of what to expose. That's where the leverage compounds.
---
What this means commercially
If you're choosing between building MCP servers in-house or having someone build them for you, the question to ask is not "can this person operate an LLM?" but "can this person design a tool surface for a domain we care about, and operate it as a real backend service?" Those are very different questions, and the second one is the one that determines whether the agent works in production.
This is the framing we use at Amazing Resources when scoping MCP work for clients. We treat each MCP server as a small, well-bounded backend service — designed against the client's actual domain, with clear tool semantics, real schemas, real error handling, real audit logs, and real operational care. We borrow heavily from the same architectural toolkit that produces good APIs in the first place: bounded contexts from DDD, clean separation between domain and infrastructure, and an obsession with making the boundary contract precise. Those habits transfer directly. A well-designed MCP server looks, reads, and operates a lot like a well-designed REST service that has been written with one specific consumer in mind — a language model, with all the strengths and oddities that consumer brings.
We mention this because the most common failure mode in MCP work is treating the project as "an AI thing" rather than as a backend project that happens to have an LLM at the other end of the wire. The first framing produces something that demos well and breaks in production. The second produces something that survives.
---
Where to go from here
If you want to keep pulling on this thread, a few directions are worth your time:
- Read the [first post in this series](/blog/what-mcp-actually-is) if you haven't yet. It establishes the protocol-level picture — what an MCP server actually is on the wire, and why "the LLM has no concept of MCP" is a much more useful sentence than it first sounds.
- Build a tiny MCP server. The mental model in this post lands much harder once you've actually wired one up, even a trivial one that just lists files in a directory. The MCP SDKs for Node and Python make this a one-evening exercise. The next post in this series walks through it from a Node.js perspective.
- Read about the Language Server Protocol. The LSP precedent is the cleanest existing example of the protocol-as-leverage pattern that MCP is repeating. Understanding why LSP succeeded — and where it strained — is the best preparation for thinking about where MCP will go.
- Look at agent design separately. If you're interested in the host-application side of the stack — building a custom agent rather than building tools for someone else's — that's a substantial topic in its own right, and one we cover in post 7 of this series, where we build a Flutter-based host that connects an on-device LLM to MCP servers.
The vocabulary will keep drifting. New product launches will keep blurring "agent" until it means whatever the marketing team needs it to mean this quarter. But the underlying picture is stable: there is a model that produces text, a host application that loops around it, and a set of tools the host can invoke. Three layers. Each with a different job. Each demanding a different kind of engineering care.
Once that's clear, the rest of MCP — the SDK, the security work, the transport choices, the tool description craft — becomes a series of well-posed questions instead of a fog.
The fog is the hard part. The questions, once they're well-posed, are answerable.
---
Next in this series: [building an MCP server for Node.js developers](/blog/mcp-servers-nodejs) — the canonical pillar post for the rest of the cluster. Or, if you'd rather see the protocol from the bottom up first, [building MCP from scratch without the SDK](/blog/mcp-from-scratch-no-sdk) walks through the JSON-RPC layer directly.
Building an MCP server for your CRM, internal tooling, or product? We do this professionally at AmgRes →