MCP Server
The open protocol that lets AI models talk to your tools and data without custom glue code
Why It Exists
Anyone who built AI tools before MCP existed knows the pain. Every time a coding assistant needed to read files, query a database, or hit an API, that meant custom integration code. Then again for a different model provider. And again. The result was an N-by-M problem: N tools times M AI platforms, each with its own bespoke glue.
Nobody enjoyed maintaining that.
Anthropic introduced Model Context Protocol in late 2024 to kill this pattern. The idea is straightforward: define a universal protocol for how AI models talk to external tools and data sources. Write an MCP server once, and any MCP-compatible client can use it. Claude Desktop, VS Code extensions, custom apps, whatever. Think of it like USB for AI tool connections, or LSP (Language Server Protocol) but for model-to-tool communication instead of IDE-to-language-engine communication.
How It Works
MCP is a client-server architecture with three layers. Nothing exotic here.
Host Application. This is the user-facing app, like Claude Desktop, an IDE, or a custom AI product. It contains one or more MCP clients and handles the user session, model interaction, and orchestration.
MCP Client. A protocol client inside the host that holds a stateful 1:1 connection to a single MCP server. It handles capability negotiation, request routing, and response handling for that one server.
MCP Server. A lightweight service that exposes capabilities through the three MCP primitives. Servers can talk to local resources (files, databases) or remote services (APIs, cloud infra).
The three primitives have different control semantics, and this distinction matters more than people realize. Tools are model-controlled. The LLM decides when to call them based on the conversation. Resources are application-controlled. The host decides when to inject them as context. Prompts are user-controlled. They fire on explicit user actions, like slash commands. Getting this separation right is what makes MCP more than just "function calling with extra steps."
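As a sketch, here is roughly what the three primitives look like when a server advertises them. The field names (`inputSchema`, `uri`, `mimeType`, `arguments`) follow the MCP spec's list responses; the concrete `query_orders` tool, schema resource, and `summarize-incident` prompt are invented examples:

```python
import json

# Shapes a server advertises for each primitive (field names follow the MCP
# spec; the concrete tool, resource, and prompt here are invented examples).
tool = {
    "name": "query_orders",                 # model-controlled: the LLM decides when to call it
    "description": "Run a read-only SQL query against the orders table",
    "inputSchema": {                        # JSON Schema for the arguments the model must supply
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}

resource = {
    "uri": "file:///var/data/schema.sql",   # application-controlled: the host decides when to attach it
    "name": "orders-schema",
    "mimeType": "text/plain",
}

prompt = {
    "name": "summarize-incident",           # user-controlled: fired by an explicit action like a slash command
    "arguments": [{"name": "incident_id", "required": True}],
}

print(json.dumps({"tools": [tool]}, indent=2))
```

Note how the control semantics live outside the data: the same JSON shapes get invoked by the model, the host, or the user depending on which primitive they belong to.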
Architecture Deep Dive
Transport Layer. MCP provides two options. stdio transport spawns the server as a child process, communicating over stdin/stdout with JSON-RPC 2.0. This is the simplest path. No network config, no auth, sub-millisecond latency. The obvious downside: server and host must live on the same machine. Streamable HTTP transport exposes the server as an HTTP endpoint, using Server-Sent Events for server-to-client streaming. This enables remote deployment, load balancing, and multi-tenant setups, but real authentication becomes necessary.
In practice, most developers start with stdio for local development and move to HTTP when they need to share servers across teams or deploy to the cloud.
Initialization. When a client connects, both sides run a capability negotiation handshake. The client sends an initialize request with its protocol version and capabilities. The server responds with its own. This lets both sides adapt gracefully. If the server supports prompts but the client does not, that is fine. Nothing breaks.
Tool Execution Flow. When an LLM decides to use a tool, the host routes the request through the right MCP client. The client fires a tools/call JSON-RPC request to the server with the tool name and arguments. The server runs the operation and returns structured results (text, images, or embedded resources). The host can apply safety checks or ask for user confirmation before passing results back to the model. This confirmation step is important. An LLM silently deleting database rows is not acceptable.
Security Model. Remote MCP servers authenticate with OAuth 2.1 and PKCE (Proof Key for Code Exchange). The 2025 spec update added authorization server metadata discovery (RFC 8414) and dynamic client registration (RFC 7591), so servers can advertise their auth requirements to clients automatically. Fine-grained permission scoping is also possible. A database MCP server might offer read-only and read-write tool variants behind different OAuth scopes.
The ecosystem is moving fast. Official SDKs cover TypeScript and Python. Community SDKs exist for Go, Rust, Java, and C#. Claude, Cursor, Windsurf, and Cody have all adopted MCP, and the server registry has hundreds of community-built servers for databases, APIs, cloud providers, and developer tools. The real test will be whether the protocol stays stable as adoption grows. So far, so good.
Production Considerations
Run MCP servers as containerized services behind an API gateway; this is not optional for production. Rate limiting, audit logging on every tool invocation, and circuit breakers for any external service the server depends on are all essential. When something goes wrong at 3 AM, structured logging with correlation IDs that trace from user query through the MCP client, to the server, out to the external service, and back will be invaluable.
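One way to sketch that audit trail, assuming JSON lines and a host-generated correlation ID (`audit_tool_call` is a hypothetical helper, not an SDK function):

```python
import json, logging, time, uuid

# Hypothetical audit helper: one JSON log line per tool invocation, carrying a
# correlation ID so host, server, and gateway logs can be joined later.
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("mcp.audit")

def audit_tool_call(tool, args, corr_id=None):
    corr_id = corr_id or str(uuid.uuid4())   # generated at the edge, propagated downstream
    start = time.monotonic()
    # ... run the actual tool here ...
    log.info(json.dumps({
        "event": "tool_call",
        "tool": tool,
        "arg_keys": sorted(args),            # log argument names, not values (avoid leaking PII)
        "correlation_id": corr_id,
        "duration_ms": round((time.monotonic() - start) * 1000, 3),
    }))
    return corr_id
```

The same `correlation_id` would then ride along in outbound request headers, so one grep reconstructs the whole chain.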
Monitor the right things: tool call latency at p50, p95, and p99. Error rates broken down by tool. Authentication failures. These metrics show where the system is actually hurting, not where assumptions say it is.
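Computing those percentiles from raw latency samples is simple enough to inline. `percentile` below uses the nearest-rank method, and the sample data is invented; the long tail is exactly what p95/p99 catch and p50 hides:

```python
def percentile(samples, p):
    """Nearest-rank percentile over raw latency samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Invented per-call latencies in milliseconds for one tool.
latencies = [12, 15, 14, 220, 13, 16, 18, 900, 17, 15]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```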
Version server APIs from day one. Tool schemas will change, and clients need to handle both old and new versions during rollouts. Add health check endpoints for the orchestration platform. For high availability, run multiple server instances behind a load balancer. MCP's request-response model within a session is stateless enough that horizontal scaling works without much ceremony.
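One low-ceremony versioning discipline is additive-only schema evolution, sketched here with an invented `query_orders` tool: new arguments are optional, and the `required` list never grows mid-rollout, so calls built against the old schema keep validating:

```python
# Additive-only evolution: v2 adds an optional "limit" argument. The required
# list is unchanged, so clients (and models) holding the v1 schema still work.
tool_schema_v1 = {
    "type": "object",
    "properties": {"sql": {"type": "string"}},
    "required": ["sql"],
}

tool_schema_v2 = {
    "type": "object",
    "properties": {"sql": {"type": "string"},
                   "limit": {"type": "integer"}},  # new, optional
    "required": ["sql"],                           # unchanged
}

def accepts(schema, args):
    """Minimal check: every required argument is present (no type validation)."""
    return all(k in args for k in schema["required"])

old_style_call = {"sql": "SELECT 1"}
print(accepts(tool_schema_v2, old_style_call))  # old calls still validate
```

Breaking changes (renaming an argument, tightening a type) then get a new tool name or a new server version rather than a silent in-place edit.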
One more thing that the docs do not emphasize enough: keep servers small and focused. A server that exposes 5 well-defined tools is dramatically easier to operate, debug, and evolve than one that exposes 40. When unrelated tools start getting crammed into a single server, split it up. The future payoff is worth it.
Pros
- • Write one server, use it with any MCP-compatible client
- • Three clean primitives (Tools, Resources, Prompts) cover most integration patterns
- • Solid security model with OAuth 2.1 for remote servers
- • Official SDKs for TypeScript and Python, plus a growing community ecosystem
- • Kills the custom integration code you used to write for each model provider
Cons
- • Still a young protocol (launched 2024), so the ecosystem is catching up
- • Remote server deployment adds latency compared to in-process tool calls
- • Debugging distributed MCP chains gets painful fast
- • Not every AI platform supports MCP natively yet
- • Schema evolution and versioning need careful planning upfront
When to use
- • Your AI tools need to hit external data sources or APIs
- • You want a single integration that works across multiple AI clients
- • You need authenticated, scoped access to enterprise systems from AI models
- • You are building reusable tool servers that multiple AI apps can share
When NOT to use
- • Simple one-off LLM API calls that do not need tool access
- • Latency-critical apps where any middleware overhead is a dealbreaker
- • Apps locked into a single AI provider's native tool format
- • Trivial tools where the protocol overhead costs more than the implementation
Key Points
- • MCP uses a client-server architecture where the host application (Claude Desktop, an IDE, etc.) contains MCP clients that each hold a 1:1 connection with an MCP server
- • Three primitives define the protocol: Tools (model-invoked functions), Resources (application-controlled data/context), and Prompts (user-triggered templates), each with distinct control semantics
- • Two transport mechanisms exist: stdio for local servers (spawned as child processes) and Streamable HTTP for remote servers. stdio is simpler but limited to same-machine; HTTP opens the door to cloud deployment
- • Remote MCP servers authenticate with OAuth 2.1 and PKCE. The 2025 spec update added authorization server metadata discovery and dynamic client registration
- • The protocol is stateful with capability negotiation at initialization. Client and server exchange supported features before any tool calls happen, which allows graceful degradation
Common Mistakes
- ✗ Exposing overly broad tool permissions. Each tool should have the minimum necessary access scope. A 'query database' tool should not also allow schema modifications
- ✗ Skipping proper error handling in tool responses. MCP clients need structured error info to present useful messages. Raw stack traces confuse both users and LLMs
- ✗ Ignoring resource lifecycle management. Resources should have clear TTLs and update mechanisms. Stale resource data means outdated context in LLM responses
- ✗ Building monolithic servers with too many tools. LLMs struggle with tool selection when they see 50+ options. Group related tools into focused, single-responsibility servers
- ✗ Not validating tool inputs server-side. Even though the LLM generates structured arguments, always validate and sanitize inputs before executing anything