Understanding LLM Context Windows and MCP Primitives

Learn how LLM context windows behave, why token budgets matter for MCP server design, and the three primitives that shape how tools, resources, and prompts are offered to a model.

A large language model's context window is the working memory behind every conversation, and understanding how it fills up is the first step toward designing a Model Context Protocol server that is worth using.

  • Context windows are finite, shared across every tool you expose, and paid for in tokens on both the input and output side.
  • A model's first response to a prompt tends to be its most accurate, so long threads and bloated tool lists raise the risk of hallucinations.
  • MCP defines three primitives (tools, resources, and prompts), and most real-world server design effort lives inside tools.

This lesson is a preview from our Building Your First MCP Server and Client course online. Enroll in the course for detailed lessons, live instructor support, and project-based training.

Most modern models advertise context windows in the range of 128,000 to 1,000,000 tokens. That number sounds generous until you start counting everything that actually lives there: the system prompt, the user's question, any reasoning tokens the model produces while it plans, the JSON payloads returned by tool calls, and every follow-up exchange that comes after. Every extra token nudges the model closer to the edge of its memory, where accuracy drops and compacting warnings start to appear.
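That accounting can be made concrete with a rough tally. The sketch below is illustrative, not a real tokenizer: it uses the common four-characters-per-token rule of thumb, and the `estimate_tokens` and `context_usage` helpers (and the sample prompt sizes) are hypothetical. Real budgets should be measured with the model's own tokenizer.

```python
# Rough sketch of where a context window goes. The ~4-characters-per-token
# ratio is a rule of thumb for English text, not an exact tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return max(1, len(text) // 4)

def context_usage(system_prompt: str, tool_descriptions: list[str],
                  messages: list[str], window: int = 128_000) -> dict:
    """Tally the overhead that is spent before the model even answers."""
    used = (estimate_tokens(system_prompt)
            + sum(estimate_tokens(d) for d in tool_descriptions)
            + sum(estimate_tokens(m) for m in messages))
    return {"used": used, "remaining": window - used,
            "percent_used": round(100 * used / window, 2)}

# Hypothetical sizes: a medium system prompt, 25 verbose tool descriptions,
# and a single short user message.
report = context_usage(
    system_prompt="You are a helpful assistant." * 40,
    tool_descriptions=["Search the product catalog by keyword. " * 10] * 25,
    messages=["Find me a waterproof hiking jacket under $200."],
)
print(report)
```

Even before the first answer, the tool descriptions dominate the tally here, which is exactly the quiet cost the paragraph above describes.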

What Lives Inside the Context Window

A conversation begins with a system prompt and a user message. From there, the model may trigger reasoning tokens, plan a series of tool calls, absorb the returned data, reason some more, and finally produce an answer. The user then asks a follow-up and the cycle repeats. None of that comes free, and the full list of available tools is re-exposed on every turn, which means tool descriptions alone can quietly consume a meaningful portion of the budget.

Two consequences follow. First, long chat sessions are statistically more likely to hallucinate or produce gibberish than short ones, because attention degrades as useful content gets buried deeper in the window. Second, every token saved on the server side compounds, because input and output tokens are billed on every call.
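The compounding effect is easy to see with arithmetic. The sketch below uses hypothetical numbers (a 500-token versus a 5,000-token tool list, and 300 new tokens per turn) to show how re-sending the tool list and the accumulated history on every call multiplies any per-turn waste.

```python
# Illustrative sketch: because tool definitions and the full history
# accompany every request, input tokens grow with each turn.
# All token counts below are hypothetical.

def cumulative_input_tokens(turns: int, tool_list_tokens: int,
                            tokens_per_turn: int) -> int:
    """Total input tokens billed over a conversation: each turn re-sends
    the tool list plus the entire accumulated message history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn           # new user message + reply
        total += tool_list_tokens + history  # everything is re-sent
    return total

lean_server = cumulative_input_tokens(turns=10, tool_list_tokens=500,
                                      tokens_per_turn=300)
bloated_server = cumulative_input_tokens(turns=10, tool_list_tokens=5_000,
                                         tokens_per_turn=300)
print(lean_server, bloated_server)  # the bloated list costs 45,000 extra tokens
```

Over just ten turns, the larger tool list costs 45,000 extra input tokens, which is why trimming descriptions once pays off on every subsequent call.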

Why Context Budgets Shape Server Design

A well-designed MCP server treats the context window like a tight budget. That means trimming tool descriptions, stripping optional fields out of responses, and avoiding the temptation to expose capability for its own sake. The more tools you hand to a model, the more of its working memory disappears before the user has even said anything, and the usable conversation space shrinks with it.

Targeted tool calls, purposeful descriptions, and lean response shapes are what separate a useful server from one that degrades the model it is supposed to help.
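Stripping optional fields out of responses can be as simple as a cleanup pass before serialization. The field names below (`notes`, `internal_audit`, `metadata`) are hypothetical; the point is that every empty or model-irrelevant field costs tokens on every call.

```python
# Sketch of trimming a tool response before it is serialized back to the
# model. The payload shape here is a made-up example.
import json

def lean(payload: dict) -> dict:
    """Drop None values, empty strings, and empty collections recursively."""
    cleaned = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            value = lean(value)
        if value in (None, "", [], {}):
            continue
        cleaned[key] = value
    return cleaned

raw = {
    "id": "ord_123",
    "status": "shipped",
    "notes": "",             # empty: pure token overhead
    "internal_audit": None,  # never useful to the model
    "metadata": {"carrier": "UPS", "debug": {}},
}
print(json.dumps(lean(raw)))
```

The trimmed payload keeps only `id`, `status`, and the carrier, and that saving is multiplied by every turn in which the response stays in the window.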

The Three MCP Primitives

MCP exposes three building blocks that a server can offer to a client:

  • Tools are functions the server executes on demand to provide highly relevant data back to the model. They account for roughly ninety percent of the design decisions on a typical server.
  • Resources are persistent content the model can reference, useful for more advanced workflows once the fundamentals are solid.
  • Prompts are reusable instruction patterns, also more advanced and usually introduced after the tool layer is working well.

For a first server, tools are the right place to focus. They give immediate value, they are easy to iterate on, and they expose most of the tradeoffs around context management. Resources and prompts become more interesting once the tool surface is mature enough to hand off real work.
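The tool primitive can be sketched without any particular SDK: a server advertises tool names with short descriptions, and executes one on demand. The registry, the decorator, and the `get_order_status` handler below are all hypothetical illustrations, not the MCP wire protocol itself.

```python
# Minimal sketch of the tool primitive, independent of any MCP SDK.
from typing import Callable

TOOLS: dict[str, dict] = {}

def tool(name: str, description: str):
    """Register a handler under a deliberately short description."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return register

@tool("get_order_status", "Look up one order's shipping status by id.")
def get_order_status(order_id: str) -> dict:
    # A real handler would query a live system; keep the response lean.
    return {"order_id": order_id, "status": "shipped"}

def list_tools() -> list[dict]:
    """The tool list the model sees: re-exposed on every turn, so every
    word in a description is paid for repeatedly."""
    return [{"name": n, "description": t["description"]}
            for n, t in TOOLS.items()]

def call_tool(name: str, **kwargs) -> dict:
    return TOOLS[name]["handler"](**kwargs)

print(list_tools())
print(call_tool("get_order_status", order_id="ord_123"))
```

Notice that the description, not the handler body, is what the model reads on every turn, which is why the earlier sections put so much weight on keeping it short and specific.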

MCP starts to feel like magic once the underlying idea clicks, but that clarity comes from respecting the constraints. Treat the context window as shared and scarce, design tools that are specific and lightweight, and layer in resources and prompts only after the basics are trustworthy. Get that balance right and the server becomes a clean interface between a model and the data it needs, rather than a tax on its attention.
