Context Management - FAQ

SFEIR Institute

TL;DR

Context management in Claude Code determines the quality of every generated response. Mastering the 200,000-token window, automatic compaction, and Plan mode lets you reduce your costs by 40 to 60% while getting more precise results. Here are answers to the most frequently asked questions to optimize your context usage.

Context management in Claude Code is the set of techniques for controlling what information the agent receives in its token window to produce relevant responses. Claude Code uses a 200,000-token window - roughly 150,000 words - making it one of the largest contexts available for an agentic coding tool.

This capacity is double the 100,000-token window of Claude 2.0; the Claude 3 models already offered 200,000 tokens.

How does the 200,000-token context window work in Claude Code?

The context window is the working memory that Claude Code uses for each conversation. It contains all messages, files read, and tool results accumulated during your session.

Visualize this window as a fixed budget of 200,000 tokens. Each action consumes part of this budget: a 500-line file represents approximately 4,000 tokens, a long response between 1,000 and 3,000 tokens.

In practice, 80% of context is consumed by file reads and tool results, and only 20% by your messages and the agent's responses.

| Element | Average consumption | Share of context |
| --- | --- | --- |
| Source file (500 lines) | 4,000 tokens | 2% |
| Bash command result | 1,500 tokens | 0.75% |
| Average user message | 200 tokens | 0.1% |
| Agent response | 1,500 tokens | 0.75% |
| CLAUDE.md file loaded | 800 tokens | 0.4% |
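
The budget arithmetic behind this table can be sketched in a few lines of Python. This is only an illustration, not something Claude Code exposes: the helper names `share_of_context` and `remaining_budget` are hypothetical, and the per-item costs are the averages quoted above.

```python
# Illustrative sketch: figures are the averages quoted in the table above,
# and the helper names are hypothetical (not part of Claude Code).
CONTEXT_WINDOW = 200_000  # tokens

AVERAGE_COST = {
    "source_file_500_lines": 4_000,
    "bash_result": 1_500,
    "user_message": 200,
    "agent_response": 1_500,
    "claude_md": 800,
}


def share_of_context(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Return the percentage of the window consumed by `tokens`."""
    return 100 * tokens / window


def remaining_budget(actions: list[str], window: int = CONTEXT_WINDOW) -> int:
    """Subtract the average cost of each action from the window."""
    return window - sum(AVERAGE_COST[name] for name in actions)


session = ["claude_md", "user_message", "source_file_500_lines", "agent_response"]
print(f"{share_of_context(AVERAGE_COST['source_file_500_lines']):.2f}% per 500-line file")
print(f"{remaining_budget(session):,} tokens left")
```

Adding entries to `session` shows how quickly file reads, rather than messages, eat into the budget.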

To understand the basics of interacting with Claude Code, consult the FAQ on your first conversations that covers the fundamentals.

Key takeaway: the 200,000-token window is a shared budget between your messages, files read, and tool results - monitor your consumption.

How do I know how many tokens are left in my session?

Run the /cost command in Claude Code to display your consumption in real time. This command shows the number of tokens used and the estimated session cost.

# Display token consumption and cost
$ claude
> /cost

The context indicator also appears in the Claude Code status bar. When it exceeds 80%, a warning alerts you that automatic compaction is approaching.

In practice, a typical development session consumes between 50,000 and 120,000 tokens, which leaves headroom before compaction triggers. The essential slash commands FAQ lists /cost and /compact among the key diagnostic tools.

Key takeaway: use /cost regularly to anticipate compaction and maintain control over your token budget.

What strategies help optimize context daily?

Adopt three key practices: limit files read, break down your tasks, and use targeted instructions in CLAUDE.md. These three levers reduce token consumption by 40% on average.

Specify which files to read rather than letting the agent explore the entire project. A prompt like "Read only src/auth/login.ts and fix the bug on line 42" can consume around one-tenth of the context that a vague prompt does.

# Bad practice: broad exploration
$ claude "Find and fix the bugs in the project"

# Good practice: precise targeting
$ claude "Read src/auth/login.ts and fix the TypeError on line 42"

| Strategy | Token savings | Difficulty |
| --- | --- | --- |
| Target files explicitly | 30-50% | Easy |
| Break into sub-tasks | 20-40% | Medium |
| Use Plan mode | 40-60% | Easy |
| Configure CLAUDE.md | 10-20% | Easy |
| Use /compact manually | 50-70% | Easy |

The complete context management guide details each strategy with concrete examples adapted to different project types.

Key takeaway: targeting files, breaking down tasks, and configuring CLAUDE.md are the three pillars of effective context management.

How does Plan mode save tokens?

Plan mode restricts Claude Code to read-only analysis: it can read files and propose a strategy, but it does not edit files or run commands until you approve the plan. Activate it with the Shift+Tab shortcut or the /plan command.

In Plan mode, the agent reads your codebase, proposes a strategy, and waits for your validation before executing anything. This mode reduces total token consumption by 40 to 60% on complex tasks.

# Activate Plan mode
$ claude
# Press Shift+Tab to switch to Plan mode (a keypress, not typed text)
> Refactor the authentication module to use JWT
# Claude Code proposes a plan without executing actions

In practice, Plan mode is particularly useful for refactoring tasks touching more than 5 files. It prevents you from consuming tokens on unnecessary explorations. Consult the dedicated context management tutorial to learn how to combine Plan mode with other techniques.

Key takeaway: Plan mode saves you 40 to 60% of tokens by separating the thinking phase from the execution phase.

How does automatic compaction work in Claude Code?

Automatic compaction triggers when context reaches approximately 95% of the 200,000-token window. Claude Code then summarizes previous exchanges to free up space while preserving essential information.

Understand that compaction is not a memory loss: the agent retains a structured summary of decisions made, files modified, and errors encountered. However, fine details like exact line numbers or intermediate code blocks may be lost.
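
The two thresholds mentioned in this FAQ (a warning above 80% and automatic compaction at roughly 95%) can be turned into a small check. This is a minimal sketch under those assumptions; `context_status` is a hypothetical helper, and the exact trigger points inside Claude Code may differ.

```python
# Illustrative sketch: thresholds are the approximate figures from this FAQ
# (warning above 80%, automatic compaction near 95%); names are hypothetical.
CONTEXT_WINDOW = 200_000
WARNING_PERCENT = 80
COMPACTION_PERCENT = 95


def context_status(tokens_used: int, window: int = CONTEXT_WINDOW) -> str:
    """Classify usage as 'ok', 'warning', or 'compaction'."""
    # Integer arithmetic avoids floating-point edge cases at the boundaries.
    if tokens_used * 100 >= COMPACTION_PERCENT * window:
        return "compaction"
    if tokens_used * 100 > WARNING_PERCENT * window:
        return "warning"
    return "ok"


for used in (80_000, 170_000, 190_000):
    print(used, context_status(used))
```

A "warning" result is a reasonable moment to run /compact manually with your own instructions rather than waiting for the automatic summary.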

You can also trigger compaction manually with the /compact command:

# Manual compaction with summary instruction
$ claude
> /compact "Preserve only the modifications on auth/ and the tests"

The optional parameter after /compact guides the summary. Specify the critical elements to preserve so that compaction keeps what matters for your current task.

To avoid common compaction mistakes, read the guide to common context management mistakes, which covers the pitfalls to avoid.

Key takeaway: compaction intelligently summarizes your session - trigger it manually with /compact to control what is preserved.

What are PreCompact hooks and how do you configure them?

PreCompact hooks are shell scripts executed automatically just before each compaction. Configure them in your .claude/settings.json file to save the critical state of your session.

A typical PreCompact hook saves the current diff, git state, or a custom summary in a temporary file that you can re-read after compaction.

{
  "hooks": {
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "git diff --stat > /tmp/claude-pre-compact-diff.txt",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Each entry declares a command hook, and the timeout is expressed in seconds. In practice, PreCompact hooks solve a common problem: losing track of uncommitted changes after a compaction. By saving the diff before compaction, you create a safety net.

| Hook | Objective | Recommended timeout |
| --- | --- | --- |
| git diff --stat | Save change summary | 5 s |
| git stash list | List active stashes | 3 s |
| Custom summary script | Export critical context | 10 s |
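
The "custom summary script" row could look something like the sketch below. It is an assumption-laden illustration, not an official script: the output path, the function names, and the choice of commands are hypothetical, and it simply captures the two git commands listed above into one file.

```python
#!/usr/bin/env python3
"""Hypothetical PreCompact helper: snapshot git state before compaction."""
import subprocess
import tempfile
from pathlib import Path

# Assumed output location; change it to suit your project.
SNAPSHOT_PATH = Path(tempfile.gettempdir()) / "claude-pre-compact-snapshot.txt"

# Commands taken from the table above.
COMMANDS = [
    ["git", "diff", "--stat"],
    ["git", "stash", "list"],
]


def run(command: list[str], timeout_s: float = 5.0) -> str:
    """Run a command and return its output, or a short error note."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return f"(failed: {exc})"
    return result.stdout.strip() or result.stderr.strip() or "(no output)"


def build_snapshot(sections: dict[str, str]) -> str:
    """Format command outputs as labelled sections."""
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections.items())


def main() -> None:
    """Capture every command and write the combined snapshot to disk."""
    sections = {" ".join(cmd): run(cmd) for cmd in COMMANDS}
    SNAPSHOT_PATH.write_text(build_snapshot(sections) + "\n", encoding="utf-8")


if __name__ == "__main__":
    main()
```

Saved under a path of your choosing, the script can be used as the hook command in place of the one-line git diff. Failures are written into the snapshot instead of raising, so a missing git binary does not block compaction.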

To go further on hook and slash command configuration, consult the essential slash commands FAQ. You will also find advanced tips in the context management tips.

Key takeaway: configure a PreCompact hook to automatically save your git diff before each compaction - it is your safety net.

How do you use multiple sessions to scale horizontally?

Launch multiple Claude Code instances in parallel, each dedicated to a specific task, to multiply your processing capacity without saturating a single context window.

Horizontal scaling involves distributing work across multiple independent sessions rather than loading everything into a single 200,000-token window. Each session has its own token budget.

# Terminal 1: refactoring the auth module
$ claude "Refactor src/auth/ to use JWT tokens"

# Terminal 2: writing tests
$ claude "Write unit tests for src/api/users.ts"

# Terminal 3: documentation
$ claude "Generate JSDoc documentation for src/lib/"

In practice, 3 parallel sessions handle a work volume equivalent to 600,000 tokens of total context. Horizontal scaling reduces completion time for complex tasks by 55% compared to a single session.

The CLAUDE.md file serves as shared memory between sessions. Each instance loads it at startup, ensuring code convention consistency.

Key takeaway: distribute independent tasks across 2 to 4 parallel sessions to multiply your capacity without compromising context quality.

Can you recover context lost after compaction?

No, tokens removed during compaction are not directly recoverable. Use the /resume command and PreCompact hooks to mitigate this limitation.

Compaction is a destructive process: the summary replaces the original exchanges. However, three mechanisms protect you. The CLAUDE.md file persists across sessions. PreCompact hooks save critical state. And the /resume command lets you resume a previous session with its summarized context.

# Continue the most recent session in the current directory
$ claude --continue

# Choose a previous session from an interactive list
$ claude --resume

To understand how the CLAUDE.md memory system complements context management, consult the dedicated FAQ. This persistent memory survives compactions and session changes.

Key takeaway: prepare for compaction with hooks and CLAUDE.md - once tokens are compacted, only the summary remains.

Which files consume the most tokens and how to identify them?

Generated files (lock files, bundles, source maps) can consume tens of thousands of tokens each. Add them to your .claudeignore to automatically exclude them from context.

A typical package-lock.json file represents 30,000 to 80,000 tokens. A compiled bundle.js file can exceed 100,000 tokens - half of your context window consumed by a single useless file.

# .claudeignore - files to exclude from context
node_modules/
*.lock
dist/
build/
*.min.js
*.map
coverage/

| File type | Average tokens | Recommended action |
| --- | --- | --- |
| package-lock.json | 40,000 | .claudeignore |
| bundle.js | 80,000+ | .claudeignore |
| Source file (200 lines) | 1,600 | Read if needed |
| Test file (150 lines) | 1,200 | Read if needed |
| .env file | 100 | Never expose |
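
To find the heaviest files in your own project, a rough ranking by size is usually enough. The sketch below assumes about 4 bytes per token, which is a rule of thumb and not a real tokenizer count; `estimate_tokens` and `heaviest_files` are hypothetical helper names.

```python
# Illustrative sketch: ranks files by a rough token estimate.
# Assumption: ~4 bytes per token, a rule of thumb rather than a real tokenizer.
from pathlib import Path

BYTES_PER_TOKEN = 4


def estimate_tokens(path: Path) -> int:
    """Estimate the token count of a file from its size on disk."""
    return path.stat().st_size // BYTES_PER_TOKEN


def heaviest_files(root: Path, top: int = 5) -> list[tuple[int, Path]]:
    """Return the `top` largest files under `root` as (tokens, path) pairs."""
    files = (p for p in root.rglob("*") if p.is_file())
    ranked = sorted(((estimate_tokens(p), p) for p in files), reverse=True)
    return ranked[:top]
```

Calling `heaviest_files(Path("."))` from the project root lists the first candidates for exclusion. Sizing by bytes rather than lines matters here, because minified bundles pack a large amount of text into very few lines.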

To understand what agentic coding is and why file management is central to this paradigm, consult the dedicated FAQ.

Key takeaway: a well-configured .claudeignore can save 50% of your token budget by excluding generated files.

How to combine CLAUDE.md and context management for productive sessions?

Write a concise CLAUDE.md file (under 200 lines) that automatically loads critical conventions without wasting tokens. This file is Claude Code's persistent memory across sessions.

CLAUDE.md is loaded at the start of every session and consumes between 500 and 2,000 tokens depending on its size. A file that is too long wastes context. A file that is too short forces the agent to explore the project to rediscover conventions.

# CLAUDE.md - optimized example (< 800 tokens)
## Stack
- Next.js 15, TypeScript 5.7, Tailwind CSS 4.0
## Conventions
- Tests with Vitest, naming: *.test.ts
- Conventional commits (feat:, fix:, chore:)
## Commands
- npm run dev: local server
- npm run test: run tests
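
A quick size check can tell you whether a CLAUDE.md stays within the limits suggested here (under 200 lines, at most 2,000 tokens). This is a minimal sketch assuming roughly 4 characters per token, which is an approximation; `check_claude_md` is a hypothetical helper.

```python
# Illustrative sketch: checks a CLAUDE.md against the limits suggested in this FAQ.
# Assumption: ~4 characters per token, a rule of thumb rather than a tokenizer.
MAX_LINES = 200
MAX_TOKENS = 2_000
CHARS_PER_TOKEN = 4


def check_claude_md(text: str) -> list[str]:
    """Return a list of warnings; an empty list means the file fits the limits."""
    warnings = []
    line_count = len(text.splitlines())
    token_estimate = len(text) // CHARS_PER_TOKEN
    if line_count >= MAX_LINES:
        warnings.append(f"{line_count} lines (limit: under {MAX_LINES})")
    if token_estimate > MAX_TOKENS:
        warnings.append(f"~{token_estimate} tokens (limit: {MAX_TOKENS})")
    return warnings


sample = "## Stack\n- Next.js 15, TypeScript 5.7, Tailwind CSS 4.0\n"
print(check_claude_md(sample) or "CLAUDE.md fits the suggested limits")
```

Passing the contents of your real file instead of `sample` gives a quick signal before the file grows into a context cost of its own.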

SFEIR Institute offers a one-day Claude Code training that includes a hands-on lab on creating an optimized CLAUDE.md file and context management techniques. You will learn how to configure your environment to maximize agent productivity.

Key takeaway: keep your CLAUDE.md under 200 lines and 2,000 tokens to load essential conventions without wasting your context window.

Should you prefer one long session or several short sessions?

Prefer short, focused sessions of 30 to 45 minutes to maintain a clean context and precise responses. Long sessions accumulate noise in the context.

A 2-hour session often reaches 2 to 3 compactions, which progressively dilutes summary quality. Conversely, 30-minute sessions generally stay under 80,000 tokens, well below the compaction threshold.

| Session duration | Tokens consumed | Compactions | Context quality |
| --- | --- | --- | --- |
| 15 min | 15,000-30,000 | 0 | Excellent |
| 30 min | 40,000-80,000 | 0-1 | Good |
| 1 h | 80,000-150,000 | 1-2 | Average |
| 2 h+ | 150,000+ | 2-3+ | Degraded |

For developers who want to deepen these techniques and practice on real cases, the AI-Augmented Developer training from SFEIR (2 days) covers context optimization among other advanced skills for integrating AI into the development workflow.

Key takeaway: 30 to 45-minute sessions with precise objectives produce better results than a marathon session.

How does the MCP protocol interact with the context window?

Each call to an MCP (Model Context Protocol) server injects its response into the context window. Monitor MCP tools that return large volumes of data to avoid saturating your token budget.

MCP allows Claude Code to communicate with external services: databases, APIs, remote file systems. Each MCP tool result consumes tokens proportional to the response size.

An MCP tool returning 500 lines of SQL results consumes approximately 4,000 tokens. Limit results with LIMIT clauses or filters to control consumption.
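
When you control the code behind an MCP tool, the same idea can be applied as a fallback after the query: cap the rows returned so the estimated token cost stays within a budget. The sketch below is an assumption, not MCP API code: `cap_rows` is a hypothetical helper, and the 8-tokens-per-row figure is derived from the 500 lines for about 4,000 tokens estimate above.

```python
# Illustrative sketch: caps the rows kept from a tool result to fit a token budget.
# Assumption: ~8 tokens per row, derived from the 500 lines for ~4,000 tokens figure.
TOKENS_PER_ROW = 8


def cap_rows(rows: list[str], max_tokens: int) -> list[str]:
    """Keep only as many leading rows as the token budget allows."""
    max_rows = max(max_tokens // TOKENS_PER_ROW, 0)
    return rows[:max_rows]


rows = [f"row {i}" for i in range(500)]
kept = cap_rows(rows, max_tokens=800)
print(f"kept {len(kept)} of {len(rows)} rows")
```

Filtering at the source with a LIMIT clause remains preferable, since it avoids fetching the extra rows at all.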

To configure and understand the MCP protocol in detail, refer to the MCP: Model Context Protocol FAQ. If you are getting started with Claude Code, begin with the installation and first launch FAQ.

To master advanced interactions between MCP, context, and agents, the AI-Augmented Developer - Advanced training from SFEIR (1 day) offers dedicated labs on orchestrating MCP servers in complex workflows.

Key takeaway: every MCP response consumes tokens - filter and limit results to preserve your context window.

Is there a cost difference between input and output tokens?

Yes, output tokens cost 5 times more than input tokens with Claude Opus 4 (2026 version). Optimize your prompts to reduce the length of generated responses.

Anthropic pricing for Claude Opus 4 is $15 per million input tokens and $75 per million output tokens. For Claude Sonnet 4.6, pricing is $3 and $15 respectively.

| Model | Input tokens (1M) | Output tokens (1M) | Ratio |
| --- | --- | --- | --- |
| Claude Opus 4 | $15 | $75 | 1:5 |
| Claude Sonnet 4.6 | $3 | $15 | 1:5 |
| Claude Haiku 4.5 | $1 | $5 | 1:5 |

In practice, a session of 100,000 input tokens and 20,000 output tokens on Opus 4 costs approximately $3 ($1.50 + $1.50). Plan mode, which limits output tokens, significantly reduces this cost.
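
The calculation above generalizes to any mix of input and output tokens. The sketch below uses the per-million-token prices quoted in this FAQ; the dictionary keys are informal labels chosen for the example, not API model identifiers, and `session_cost` is a hypothetical helper.

```python
# Illustrative sketch: prices are the per-million-token figures quoted in this FAQ.
# The keys are informal labels for this example, not API model identifiers.
PRICES_PER_MILLION = {
    "opus-4": (15.00, 75.00),      # (input, output) in USD
    "sonnet-4.6": (3.00, 15.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated session cost in USD."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(f"${session_cost('opus-4', 100_000, 20_000):.2f}")
```

With the Opus 4 prices this reproduces the $3 figure; halving the output tokens alone brings it down to $2.25, which shows why shorter responses matter more than shorter prompts.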

Key takeaway: output tokens cost 5 times more - Plan mode and concise prompts are your best allies for controlling the bill.

How to debug a session where the context seems "polluted"?

Start a clean session with claude, or resume a summarized one with claude --resume, when agent responses lose relevance. A polluted context manifests as hallucinations or repetitions.

Three signals indicate a polluted context: the agent repeats already-given information, mixes files from different modules, or applies outdated conventions from a previous exchange. These symptoms generally appear after 2 or more compactions.

# Start a clean session
$ claude

# Or resume with a summarized context
$ claude --resume

# Manually compact targeting useful information
> /compact "Preserve only the modifications on the payments/ module"

In practice, starting a new targeted session takes 30 seconds and saves you 10 to 15 minutes of imprecise responses. It is the most cost-effective reflex in context management. The context management tips offer other diagnostic techniques.

Key takeaway: when responses degrade, start a clean session rather than fighting a polluted context - it is faster and more reliable.

