Prelude

Getting the first Claude Code bill is often a moment of genuine surprise. Not because it is outrageous, but because it is unclear where the tokens went. A month of happily chatting away adds up fast: asking Claude to read entire directories, rewriting the same file three times because prompts were vague, and letting context windows balloon to 200K tokens without a second thought.

That first bill is a wake-up call. Not because the tool is not worth the money. It absolutely is. But because a significant portion of spend goes to habits that are easy to fix.

Vague prompts that lead to back-and-forth. Reading files that are not needed. Keeping stale context alive across unrelated tasks. Using the most expensive model for every trivial question.

Over months of refinement, we developed a set of practices that cut effective costs by roughly 60% without reducing productivity. In many cases, the cost-saving practices actually improved productivity because they forced clearer thinking and better session management.

This guide is everything we have learned about spending less on Claude Code while getting more out of it.

The Problem

Claude Code is priced on token consumption. Every character you send as input and every character Claude generates as output has a cost. For individual developers on Pro or Max plans, this means working within monthly limits. For teams on API-based pricing, this means real dollar amounts on every invoice.

The challenge is that Claude Code makes it very easy to consume tokens without realising it. Reading a large file adds thousands of input tokens. A long conversation accumulates context that is re-sent with every message. Using Claude Opus for a simple file rename costs ten times more than using Claude Haiku for the same operation.

Most developers fall into one of two camps. Either they do not think about cost at all and are surprised by their usage, or they think about it too much and restrict their usage to the point where Claude Code stops being useful.

Neither extreme is correct. The goal is to be intentional about token usage without being stingy. To use the right model for each task, manage context deliberately, and structure prompts so that Claude accomplishes your goal in as few turns as possible.

The Journey

How Claude Code Billing Works

Before you can optimise costs, you need to understand how billing works. Claude Code charges based on tokens, which are roughly four characters each. There are two types.

Input tokens are everything you send to Claude. This includes your prompt, the conversation history, any files Claude has read, the contents of your CLAUDE.md, tool results, and system prompts. Input tokens are the larger cost driver for most users because context accumulates over a session.

Output tokens are everything Claude generates. This includes its responses, code it writes, and commands it suggests. Output tokens cost more per token than input tokens, but you typically generate fewer of them.

For reference, as of early 2026, the approximate API pricing is as follows.

Model               Input (per 1M tokens)   Output (per 1M tokens)
Claude Opus 4.6     $15                     $75
Claude Sonnet 4.6   $3                      $15
Claude Haiku 4.5    $0.80                   $4

The ratio matters. Opus output tokens cost nearly 19 times more than Haiku output tokens. A task that generates 5,000 output tokens costs $0.375 with Opus and $0.02 with Haiku. Over hundreds of tasks per month, these differences compound significantly.
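The arithmetic behind these comparisons is simple enough to script. A minimal sketch using the approximate prices from the table above; the `task_cost` helper is purely illustrative, not part of any Claude Code API:

```python
# Approximate API prices from the table above (USD per 1M tokens).
PRICES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "haiku":  {"input": 0.80,  "output": 4.00},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single task in dollars."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A task generating 5,000 output tokens (input ignored for the comparison):
print(task_cost("opus", 0, 5_000))   # 0.375
print(task_cost("haiku", 0, 5_000))  # 0.02
```

Running your own recent token counts through a helper like this is a quick way to see what a model switch would have saved.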

For subscription users (Pro at $20/month, Max at $100 or $200/month), you are not paying per token directly, but you have usage limits. The same optimisation strategies help you stay within those limits and avoid throttling or rate caps.

Understanding Your Usage

You cannot optimise what you do not measure. Claude Code provides several ways to understand your token consumption.

The /cost command shows your current session's token usage and estimated cost. Running this at the end of every significant session builds intuition about what different task types cost.

> /cost
Session tokens: 145,230 input, 12,450 output
Estimated cost: $3.11 (Opus)

Session summaries appear when you end a session, showing total tokens consumed and the cost breakdown. Pay attention to these. They tell you whether a session was efficient or wasteful.

Monthly usage tracking is available through your account dashboard. Review this weekly, not monthly. By the time you see a monthly bill, you have already spent the money. Weekly reviews let you spot patterns and adjust before they become expensive habits.

The single most useful metric is cost per task, not cost per session or cost per day. Track what you accomplish in each session and divide the cost by the number of meaningful tasks completed. This tells you whether you are using Claude Code efficiently.
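Tracking this takes nothing more than a session log and a division. A hypothetical sketch; the session figures are invented for illustration:

```python
def cost_per_task(session_cost_usd: float, tasks_completed: int) -> float:
    """Dollars per meaningful task: the metric worth watching."""
    if tasks_completed == 0:
        raise ValueError("no tasks completed this session")
    return session_cost_usd / tasks_completed

# Hypothetical session log: (session cost from /cost, tasks completed).
sessions = [(3.11, 4), (1.80, 6), (5.40, 3)]
for cost, tasks in sessions:
    print(f"${cost_per_task(cost, tasks):.2f} per task")
```

A rising per-task figure over a few weeks is the signal to revisit the habits in the rest of this guide.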

Model Selection Strategy

The most impactful cost optimisation is choosing the right model for each task. Most developers default to the most powerful model available and never switch. This is like driving a lorry to the corner shop.

Claude Opus is the most capable and most expensive model. Use it for tasks that require deep reasoning, complex refactoring across multiple files, architectural decisions, debugging subtle issues, and any task where getting it right the first time matters more than cost.

Claude Sonnet is the balanced middle ground. Use it for routine development work, writing new functions, creating tests, reviewing code, and any task that is moderately complex but does not require Opus-level reasoning. Sonnet handles 80% of daily development work at one-fifth the cost of Opus.

Claude Haiku is the fastest and cheapest model. Use it for simple queries, quick lookups, formatting tasks, generating boilerplate, and any task that does not require deep understanding. Haiku is excellent for questions like "what does this error mean" or "generate a TypeScript interface from this JSON."

The /model command lets you switch models mid-session.

> /model sonnet
Switched to Claude Sonnet

> /model opus
Switched to Claude Opus

A good habit is starting every session on Sonnet and only switching to Opus when hitting a task that Sonnet struggles with. This single habit can reduce costs by roughly 40%.

For a complete look at integrating model switching into your daily work, our guide on daily workflows and productivity covers this in more depth.

Context Management

Context is the hidden cost driver in Claude Code. Every message in your conversation is re-sent as input tokens with every new prompt. A conversation that starts at 5,000 tokens of context grows to 50,000 tokens after several exchanges, and keeps growing.

The most important context management tool is /clear. This command resets your conversation, starting fresh with only your CLAUDE.md and system prompt as context. Use it whenever you switch tasks.

A common mistake is keeping a single session running all day, asking Claude about authentication one minute and CSS styling the next. The authentication context is still being sent as input tokens during CSS questions. Every prompt about CSS is also paying for the authentication discussion that is no longer relevant.

Use /clear aggressively. Finished a task? Clear. Switching to a different part of the codebase? Clear. Context getting long and responses getting slow? Clear.

The rule is simple. If the previous conversation is not relevant to the next question, clear the context. The few seconds it takes to re-establish context is far cheaper than carrying irrelevant tokens through every subsequent prompt.

Effective Prompting

Vague prompts are expensive prompts. When you tell Claude "fix the authentication," it needs to explore, ask clarifying questions, try different approaches, and potentially rework its solution when you provide more details. Every exchange adds tokens.

Specific prompts are cheap prompts. When you tell Claude "in src/auth/middleware.rs, the validate_token function is not checking token expiration. Add a check that compares the exp claim against the current timestamp and returns a 401 if expired," Claude can accomplish the task in a single turn.

Here are recommended practices for cost-effective prompting.

Name specific files. Instead of "fix the bug in the login page," say "fix the null pointer in src/pages/login.tsx on line 45." Claude does not need to search for the file, which saves both time and tokens.

State the desired outcome. Instead of "make this better," say "refactor this function to use early returns instead of nested if statements." Claude does not need to guess what "better" means.

Provide relevant context up front. If Claude needs to know about your database schema to write a query, paste the relevant schema excerpt in your prompt. Do not make Claude read the schema file. You control exactly how many tokens are spent on context.

Avoid open-ended exploration. Instead of "explore the codebase and tell me what you find," say "read src/lib.rs and list the public modules." Bounded questions get bounded answers.

The difference between a three-turn conversation and a one-turn solution can be 50,000 tokens. At Opus pricing, that is roughly $1 saved on a single task. Multiply by dozens of tasks per day and the savings are substantial.

Using /compact Effectively

The /compact command is one of Claude Code's most useful cost management features. It summarises the current conversation into a condensed form, reducing the context size that is sent with subsequent prompts.

When to use /compact depends on your workflow. Two situations stand out.

First, after a long exploratory conversation involving reading files and explanations. By the time changes are ready to be made, the context is full of file contents and explanations that are no longer needed. Running /compact distils the conversation into a summary, and subsequent editing prompts carry far less context.

Second, when Claude's responses become slower. Large contexts take longer to process, so sluggish responses are a signal that context has grown too large. A quick /compact brings things back to a manageable size.

The key insight is that /compact does not lose important information. It summarises the conversation, preserving the decisions made and the current state of work. What it discards are verbatim file contents, intermediate reasoning, and other details that Claude no longer needs.

CLAUDE.md Optimisation

Your CLAUDE.md file is included in every prompt as input tokens. If your CLAUDE.md is 500 lines of detailed instructions, you are paying for those 500 lines with every single message you send. Over a day of active use, this adds up.

The Claude Code documentation recommends keeping your CLAUDE.md under ~500 lines. Aim for under 400. Every line should earn its place by meaningfully improving Claude's behaviour.

Here are the optimisation strategies that work best.

Front-load critical information. The most important instructions should be at the top. If Claude's context window is under pressure, the beginning of CLAUDE.md is more likely to be retained than the end.

Remove stale instructions. Review your CLAUDE.md monthly. Delete anything that refers to completed features, resolved issues, or outdated conventions. It is not uncommon to find instructions about a database migration that was completed six months earlier, still being sent with every prompt.

Be concise. Instead of "When writing TypeScript code, please make sure to always use strict type checking and never use the any type unless absolutely necessary because it undermines the benefits of TypeScript's type system," write "Use strict TypeScript types. Avoid any." Same instruction, one-fifth the tokens.

Use CLAUDE.md for patterns, not procedures. Long step-by-step procedures belong in custom slash commands (.claude/commands/ files), which are only loaded when invoked. CLAUDE.md should contain rules and conventions that apply to every interaction.

The automatic caching of CLAUDE.md contents is a significant cost benefit. Because the file is sent with every prompt, Claude Code caches it after the first message, and subsequent messages get a 90% discount on the CLAUDE.md input tokens. This is another reason to keep CLAUDE.md stable and avoid frequent changes during a session.

Prompt Caching

Prompt caching is one of the most significant cost-saving features in the Claude API, and Claude Code applies it automatically. When the same text appears at the beginning of consecutive requests, it is cached and subsequent uses receive a 90% discount on input token costs.

This happens automatically for your CLAUDE.md file, system prompts, and the early portions of your conversation. You do not need to configure anything. But you can structure your workflow to maximise cache hits.

Keep CLAUDE.md stable during sessions. If you edit CLAUDE.md mid-session, the cache is invalidated and you pay full price for the updated contents. Make your CLAUDE.md edits between sessions, not during them.

Start conversations with consistent context. If you frequently need Claude to understand your project structure, put that information in CLAUDE.md rather than pasting it into each prompt. Information in CLAUDE.md is cached. Information pasted into prompts is not.

Use skills for repeated prompts. If you find yourself typing the same instructions repeatedly, create a skill file. While skills themselves are not cached in the same way, the consistent structure they provide helps you avoid the token waste of re-typing instructions.

The 90% discount on cached tokens is enormous. On a typical day, prompt caching saves an estimated 40-50% on input token costs compared to what would be paid without it.
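The scale of that saving is easy to model. A rough sketch, assuming a hypothetical 4,000-token CLAUDE.md, a 50-message session, and Sonnet's input price from the pricing table; all three figures are assumptions for illustration:

```python
# Effect of the 90% cache-read discount on a stable CLAUDE.md.
SONNET_INPUT = 3.00 / 1_000_000   # dollars per input token (from the table)
CACHE_DISCOUNT = 0.90             # cached reads cost 10% of the base price

claude_md_tokens = 4_000          # hypothetical CLAUDE.md size
messages = 50                     # hypothetical messages in one session

# Without caching, the full file is billed at full price on every message.
uncached = claude_md_tokens * messages * SONNET_INPUT

# With caching: full price once, then 10% of the price on each re-read.
cached = claude_md_tokens * SONNET_INPUT
cached += claude_md_tokens * (messages - 1) * SONNET_INPUT * (1 - CACHE_DISCOUNT)

print(f"without caching: ${uncached:.3f}")
print(f"with caching:    ${cached:.4f}")
```

The CLAUDE.md cost for the session drops by almost an order of magnitude, which is why keeping the file stable mid-session matters.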

File Reading Efficiency

Every file Claude reads becomes part of the conversation context. A 1,000-line source file is roughly 10,000 tokens, so reading ten files adds 100,000 tokens to your context. At Opus pricing, that is $1.50 just for reading files.
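Using the rough four-characters-per-token rule from earlier, a file's context footprint can be estimated from its size on disk before asking Claude to read it. A sketch with a synthetic file; the `estimated_tokens` helper is an illustration, not a Claude Code feature:

```python
import os
import tempfile

def estimated_tokens(path: str, chars_per_token: int = 4) -> int:
    """Rough token estimate for a file, using the ~4 chars/token heuristic."""
    return os.path.getsize(path) // chars_per_token

# Demo with a synthetic 1,000-line file of ~40 characters per line.
with tempfile.NamedTemporaryFile("w", suffix=".rs", delete=False) as f:
    f.write(("x" * 39 + "\n") * 1_000)
print(estimated_tokens(f.name))  # 10000
os.unlink(f.name)
```

A quick estimate like this before a "read these files" prompt makes the cost of each read visible up front.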

A common wasteful habit is asking Claude to "look at the project structure" or "read the relevant files." Claude dutifully reads a dozen files, most of which are not needed for the actual task.

A better approach is to follow a strict protocol. Before asking Claude to read files, use grep and glob to identify exactly which files are relevant. Then ask Claude to read only those specific files.

> Read src/auth/middleware.rs and fix the token expiration check

Not this.

> Look through the auth module and find and fix the token bug

The first prompt reads one file. The second prompt might read five or ten files before finding the right one. The token difference is significant.

For large files, consider whether Claude needs the entire file or just a portion. If you know the bug is on line 45, tell Claude to focus on that area. Less context means fewer tokens and often better results, because Claude is not distracted by irrelevant code.

Batch Operations

Grouping related changes into a single prompt is more efficient than making them one at a time. Each separate prompt carries the full context overhead. Five separate prompts about five related changes cost roughly five times more than a single prompt that addresses all five.

Here is an example. Instead of five separate prompts asking Claude to add error handling to five different functions, write one prompt.

Add error handling to the following functions in src/api/handlers.rs:
1. create_user - handle duplicate email errors
2. update_user - handle not found errors
3. delete_user - handle foreign key constraint errors
4. list_users - handle pagination out of range
5. get_user - handle not found errors

Use the AppError type from src/errors.rs for all error returns.

Claude handles all five in a single turn, with a single context load. The savings scale with the number of related changes.
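A back-of-the-envelope model shows why batching wins: every separate prompt re-sends the shared context, while a batched prompt pays for it once. The token counts below are illustrative assumptions, not measurements:

```python
# Why batching wins: the shared context is re-sent with every prompt.
context_tokens = 20_000    # assumed: CLAUDE.md + file contents + history
per_change_prompt = 200    # assumed: tokens to describe one change
changes = 5

# Five separate prompts each carry the full context.
separate = changes * (context_tokens + per_change_prompt)

# One batched prompt carries the context once.
batched = context_tokens + changes * per_change_prompt

print(separate, batched)  # 101000 21000
```

The nearly fivefold input-token difference matches the intuition above, and the gap widens as the shared context grows.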

Planning work in batches pays off. Before starting a Claude Code session, list the changes needed. If several changes are in the same area of the codebase, group them into a single prompt. This takes a minute of planning and can save thousands of tokens.

Subagents for Research

Claude Code's subagent tool delegates tasks to a separate context window. This is powerful for cost management because the subagent's context is independent of your main conversation.

When you need Claude to research something, the subagent reads files, searches the codebase, and returns a summary to your main context. Your main context only receives the summary, not all the files the subagent read.

Consider the difference. If you ask Claude to "find all places where we handle authentication errors and summarise the patterns," Claude might read 15 files in your main context, adding 150,000 tokens.

With a subagent, those 15 files are read in a separate context. Your main context receives a 500-token summary.

Use subagents for codebase exploration, pattern analysis, dependency tracking, and any research task where you need a summary rather than the raw data.

Enterprise Cost Controls

For teams and enterprises, cost management extends beyond individual practices. The enterprise managed settings system provides organisational controls that prevent runaway costs.

Spending limits can be set per user, per team, or per project. When a limit is reached, usage is throttled or paused until the next billing cycle. This prevents any single developer or project from consuming a disproportionate share of the budget.

Usage dashboards provide visibility into who is spending what and on which projects. Review these weekly with your team leads. Identify developers whose usage is unusually high or low. High usage might indicate inefficient habits that coaching can fix. Low usage might indicate that developers are not getting enough value from the tool.

Model restrictions can limit which models are available for different contexts. You might allow Opus only for senior developers or specific project types, while defaulting everyone else to Sonnet. This ensures that the most expensive model is used only when its capabilities are genuinely needed.

Approved plugins and MCP servers affect costs indirectly. Some tools are chatty, making many API calls or returning large responses. Controlling which tools are available helps manage the token overhead they introduce. For a breakdown of which plugins deliver the best value, see our guide on the best Claude Code plugins in 2026.

The most effective enterprise cost strategy is not restriction but education. Teams that understand how token costs work and have visibility into their usage naturally optimise. Teams that are simply given limits without context tend to either ignore the tool or resent the constraints.

Real Cost Examples

To make the abstract concrete, here are typical costs for different task types. These assume API pricing with Claude Sonnet unless noted.

Quick question (e.g. "what does this error mean"): 2,000-5,000 input tokens, 500-1,000 output tokens. Cost with Sonnet is roughly $0.02. With Haiku, it would be roughly $0.006.

Single file edit (e.g. "add error handling to this function"): 10,000-20,000 input tokens (including file contents), 2,000-5,000 output tokens. Cost with Sonnet is roughly $0.10.

Multi-file refactoring (e.g. "rename this API and update all callers"): 50,000-100,000 input tokens, 10,000-20,000 output tokens. Cost with Sonnet is roughly $0.45. This is where Opus might be worth the premium if the refactoring is complex.

Full feature implementation (e.g. "add user preferences with database, API, and UI"): 100,000-200,000 input tokens, 30,000-50,000 output tokens. Cost with Sonnet is roughly $1.05. With Opus, roughly $5.25. Over a long session with multiple turns, these can double or triple.

Codebase exploration (e.g. "understand the authentication system"): 150,000-300,000 input tokens, 5,000-10,000 output tokens. Cost with Sonnet is roughly $0.79. This is where subagents provide the most value, as they keep the large context out of your main session.

A productive developer using Claude Code full-time with good habits typically uses $5-15 per day on API pricing. Without good habits, the same work might cost $20-40 per day. The optimisation strategies in this guide close that gap.

Building a Cost-Conscious Workflow

Pulling everything together, here is a recommended daily workflow.

Morning. Start a fresh session. Review the tasks for the day. Plan which tasks can be batched together. Set the model to Sonnet.

Per task. Clear the context with /clear before each new task. Use specific, detailed prompts. Name the files involved. Switch to Opus only for genuinely complex tasks, then switch back to Sonnet when done.

Mid-session. Run /compact if the context is growing large. Check /cost periodically to stay aware of usage. Use subagents for research and exploration.

End of day. Review the session cost. Note any tasks that were unusually expensive and think about why. Adjust CLAUDE.md if you notice patterns that could be addressed with better instructions.

This is not a rigid process. It is a set of habits that, once internalised, run on autopilot. The savings compound over time, and the discipline of thinking about context management actually makes you more productive.

The Lesson

Cost optimisation in Claude Code is not about using the tool less. It is about using it more deliberately. The developers who spend the least per task are not the ones who restrict their usage. They are the ones who manage context, choose models intentionally, write specific prompts, and clear sessions between unrelated tasks.

The three highest-impact practices are model selection (use Sonnet as default, Opus only when needed), context management (clear between tasks, compact when context grows), and specific prompting (name files, state outcomes, avoid open-ended exploration). Together, these three practices account for roughly 80% of achievable savings.

The remaining 20% comes from CLAUDE.md optimisation, file reading discipline, batch operations, and subagent usage. These are refinements that build on the foundation of the three core practices.

Conclusion

Looking back at the first month of Claude Code usage compared to today, the difference is striking. More work gets accomplished now with roughly half the tokens. The tool has not changed. The habits have.

The strategies in this guide are not theoretical. They are practices refined over months of daily Claude Code usage.

Start with the high-impact changes. Default to Sonnet. Clear between tasks. Write specific prompts. Those three changes alone will likely reduce your costs by 30-40%.

Then layer in the refinements. Optimise your CLAUDE.md. Use subagents for research. Batch related changes. Check your costs with /cost and build intuition about what different task types should cost.

The goal is to spend your token budget on work that matters, not on carrying stale context, using expensive models for trivial tasks, or going back and forth because a prompt was unclear. Every token should earn its place.