How to Optimize Context in Claude Code

Claude Code's context window is finite, and how you use it directly affects both cost and quality. A bloated CLAUDE.md, context pollution from large tasks, and re-loaded boilerplate all eat into the space Claude has to actually reason about your code.
Here's how to keep context lean and effective.
What consumes context in Claude Code?
Three main things share the context window:
- CLAUDE.md — loaded at every session start, consuming tokens alongside your conversation
- The conversation itself — everything said, plus files Claude reads
- Tool output — results from searches, file reads, command runs
The window is large but not infinite. For Claude-backed sessions, the model limit is around 200,000 tokens. Everything competes for that space, so what you load by default matters.
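To get a feel for how much of that space a file takes up, here is a rough sketch. It assumes about 4 characters per token, which is a crude rule of thumb rather than a real tokenizer, and it uses the approximate 200,000-token figure from above. The function names are hypothetical.

```python
# Rough estimate of how much of the context window a piece of text occupies.
# Assumptions: ~4 characters per token (a crude heuristic, not a real tokenizer)
# and a window of roughly 200,000 tokens.

CONTEXT_WINDOW_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # assumed average; real tokenization varies by content


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def window_share(text: str, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Return the estimated fraction of the window the text would occupy."""
    return estimate_tokens(text) / window


if __name__ == "__main__":
    sample = "x" * 8_000  # stand-in for an 8,000-character CLAUDE.md
    print(estimate_tokens(sample))        # 2000
    print(f"{window_share(sample):.1%}")  # 1.0%
```

The estimate is only good for order-of-magnitude comparisons, such as spotting a file that takes several percent of the window before the conversation has started.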
How do you keep CLAUDE.md efficient?
Target under about 200 lines as a practical heuristic (not an official hard limit). Longer files don't just cost more tokens; they can also reduce how reliably Claude follows instructions. Treat every line as precious.
Three techniques:
- Path-scoped rules. Instead of one giant file, scope instructions to load only when Claude works with matching files. Module-specific rules live in subdirectory CLAUDE.md files and stay out of context until relevant.
- Progressive disclosure. Keep the main file lean and point Claude to deeper docs only when needed. Don't inline your entire architecture guide.
- Don't use it as a linter. Deterministic tools — linters, formatters, hooks — handle enforcement faster and more reliably than instructions in CLAUDE.md. Save the file for context only Claude needs.
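One way to keep an eye on the 200-line heuristic is a small check that walks a repository and reports the length of every CLAUDE.md it finds, including the ones in subdirectories. This is a minimal sketch: the function name and the threshold constant are hypothetical, and the threshold is the rough guideline above, not an official limit.

```python
from pathlib import Path

MAX_LINES = 200  # rough heuristic from the text, not an official limit


def claude_md_report(root: Path, max_lines: int = MAX_LINES) -> list[tuple[str, int, bool]]:
    """Return (relative_path, line_count, within_budget) for each CLAUDE.md under root."""
    report = []
    for path in sorted(root.rglob("CLAUDE.md")):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        relative = path.relative_to(root).as_posix()
        report.append((relative, line_count, line_count <= max_lines))
    return report
```

Calling `claude_md_report(Path("."))` from the project root lists each file with its line count, which makes it easy to spot one that has grown past the budget and is a candidate for splitting or for moving detail into separate docs.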
How does prompt caching affect cost?
Claude Code applies prompt caching to CLAUDE.md, which changes the cost math. The first request in a session pays full input-token price for the file. Subsequent requests within a short window hit the cache and bill at a much lower cache-read rate. (Exact cache timing is managed by Claude Code and can change — don't hard-code assumptions around it.)
Any change to CLAUDE.md invalidates the cache, and the next request pays full price again. In practice, a sizeable CLAUDE.md costs full tokens roughly once per session rather than once per message. The takeaway: a stable, well-sized CLAUDE.md is cheap to keep loaded, while one you edit constantly mid-session costs more.
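The cost math can be sketched with some simple arithmetic. The rates below are hypothetical placeholders, not actual pricing, and the model is deliberately simplified: it assumes the first request and each request right after an edit pay the full rate, every other request pays the cache-read rate, and it ignores cache expiry and any cache-write surcharge.

```python
# Illustrative cost of keeping a CLAUDE.md loaded across one session.
# All rates are hypothetical placeholders (dollars per million tokens).


def session_cost(file_tokens: int, requests: int, full_rate: float,
                 cached_rate: float, edits: int = 0) -> float:
    """Return the cost in dollars of loading the file over `requests` requests.

    The first request and the request after each edit pay `full_rate`;
    all remaining requests pay `cached_rate`.
    """
    full_price_requests = min(requests, 1 + edits)
    cached_requests = requests - full_price_requests
    weighted = full_price_requests * full_rate + cached_requests * cached_rate
    return file_tokens * weighted / 1_000_000


if __name__ == "__main__":
    # 2,000-token file, 50 requests, placeholder rates of 3.00 and 0.30.
    print(f"{session_cost(2000, 50, 3.0, 0.3):.4f}")            # 0.0354
    print(f"{session_cost(2000, 50, 3.0, 0.3, edits=10):.4f}")  # 0.0894
    print(f"{session_cost(2000, 50, 3.0, 3.0):.4f}")            # 0.3000
```

With these placeholder numbers, the stable file costs roughly a ninth of the uncached case, and ten mid-session edits more than double the stable cost. The exact ratios depend entirely on the assumed rates.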
How do you avoid context pollution on big tasks?
Large tasks can fill the window with exploration that crowds out the actual work. Two tools help:
- Plan mode (/plan) scopes a design phase before implementation, so exploration doesn't bloat the working context.
- Subagents explore parts of the codebase in parallel and report back, keeping their context separate from your main session.
Compaction also runs automatically when you approach the limit, compressing older context transparently. You no longer need to monitor usage or clear sessions manually the way 2025 workflows required.
What context shouldn't you be loading manually?
Here's an optimization most people miss: a lot of what gets re-pasted into prompts and CLAUDE.md files every session isn't project context at all. It's the same standing context — who you are, your conventions, the client you're working for — loaded by hand, over and over.
That's wasted effort and wasted tokens. Context that's stable across sessions and projects shouldn't be manually re-entered each time; it should load from a structured layer that delivers exactly what's needed. Connect your context vault once, and the foundational layer arrives automatically — leaving CLAUDE.md to do what it's best at: project-specific instructions, kept lean.
Optimizing context isn't only about trimming files. It's about not hand-carrying the same information into every session in the first place.
→ How context delivery works: How to Deliver Personal Context to AI Tools
→ Set up your context vault with Unabyss