Optimizing AI Coding Agents: A Case Study in 65% Token Reduction
These articles are AI-generated summaries. Please check the original sources for full details.
How I Cut My AI Coding Agent’s Token Usage by 65% (Without Changing Models)
Nicola Alessi reduced Claude Code input tokens from 8,200 to 2,100 on a 200-file TypeScript project. The optimization replaced broad grep searches with precise AST-level dependency mapping, which eliminates redundant file reads.
Why This Matters
Poor context management leaves agents spending up to 80% of their token budget on ‘orientation’ rather than on actual coding tasks. Without structural context, agents rediscover the same logic every session, so developers face 2-3x higher costs, slower session starts, and usage caps that are hit prematurely.
Key Insights
- Specific documentation in CLAUDE.md focusing on ‘decisions, not descriptions’ yields a 20% token reduction (Nicola Alessi, 2026).
- Replacing grep-based searches with AST-level subgraphs reduced relevant file reads from 40 down to 5 (Nicola Alessi, 2026).
- Passive observation of tool calls and code changes effectively solves the ‘amnesia’ problem where agents forget discoveries between sessions.
- Local dependency mapping with tools like vexp (Rust-based) adds zero network overhead and keeps code private while maintaining context.
- Stale observation tracking ensures that as code evolves, linked knowledge is automatically invalidated to prevent feeding the agent outdated context.
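The difference between grepping for a keyword and walking a dependency subgraph can be illustrated with a minimal sketch. This is not vexp's implementation: the `ImportGraph` shape and the `dependencySubgraph` function are hypothetical, the graph is hard-coded rather than derived from parsed ASTs, and `src/auth/jwt.ts` and `src/routes/users.ts` are made-up paths used only for illustration.

```typescript
// Hypothetical import graph: file path -> files it imports directly.
// A real tool would derive this from parsed ASTs; here it is hard-coded.
type ImportGraph = Record<string, string[]>;

// Collect files reachable from `entry` within `maxDepth` import hops,
// breadth-first, so the agent reads only structurally related files.
function dependencySubgraph(graph: ImportGraph, entry: string, maxDepth: number): string[] {
  const seen: Record<string, boolean> = Object.create(null);
  seen[entry] = true;
  const ordered: string[] = [entry];
  let frontier: string[] = [entry];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    frontier.forEach((file) => {
      (graph[file] || []).forEach((dep) => {
        if (!seen[dep]) {
          seen[dep] = true;
          ordered.push(dep);
          next.push(dep);
        }
      });
    });
    frontier = next;
  }
  return ordered;
}

// Illustrative data only: jwt.ts and routes/users.ts are made-up paths.
const sampleGraph: ImportGraph = {
  "src/auth/middleware.ts": ["src/auth/refresh.ts", "src/auth/jwt.ts"],
  "src/auth/refresh.ts": ["src/auth/jwt.ts"],
  "src/auth/jwt.ts": [],
  "src/routes/users.ts": ["src/auth/middleware.ts"],
};

const filesToRead = dependencySubgraph(sampleGraph, "src/auth/middleware.ts", 2);
console.log(filesToRead);
```

Starting from the middleware file, the walk returns only the files it actually imports, whereas a text search for ‘auth’ would also match every file that merely mentions the word. The depth limit is a design choice that trades completeness for a smaller context.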
Working Examples
Example of a high-signal CLAUDE.md focusing on specific architectural decisions:

```markdown
## Auth
- Auth uses middleware in src/auth/middleware.ts
- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts
- DO NOT touch src/auth/legacy.ts — deprecated, will be removed Q2

## Database
- Prisma ORM, schema in prisma/schema.prisma
- All migrations must be backward-compatible
```
Installation command for the vexp CLI, which enables dependency graph mapping for agents:

```shell
npm install -g vexp-cli
```
Practical Applications
- Implementing specific architectural constraints in CLAUDE.md for TypeScript/Express projects to guide Claude Code. Pitfall: Using vague descriptions like ‘follow best practices’ which forces the agent to read the whole codebase to define ‘best.’
- Integrating the Model Context Protocol (MCP) with tools like Cursor or Windsurf to provide AST graphs. Pitfall: Letting agents grep ‘auth’ across the codebase, resulting in 40+ hits and 8,000 wasted tokens.
- Using passive memory tools to link observations to a code graph. Pitfall: Asking the agent to save notes manually, which adds nothing to its current context window and therefore results in low compliance.
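Linking observations to code and invalidating them when the code changes can be sketched as follows. This is an assumption about how such tracking might work, not a description of any specific tool: the `Observation` shape, `recordObservation`, `freshObservations`, and the simple `fingerprint` function (a stand-in for a real content hash) are all hypothetical.

```typescript
// Hypothetical record of something the agent learned about a file.
interface Observation {
  note: string;        // what was learned
  file: string;        // file the note is linked to
  fingerprint: number; // fingerprint of the file when the note was recorded
}

// Stand-in for a real content hash: a simple 32-bit rolling hash of the text.
function fingerprint(source: string): number {
  let h = 0;
  for (let i = 0; i < source.length; i++) {
    h = ((h << 5) - h + source.charCodeAt(i)) | 0;
  }
  return h;
}

// Record a note together with the fingerprint of the file as it looks now.
function recordObservation(note: string, file: string, source: string): Observation {
  return { note, file, fingerprint: fingerprint(source) };
}

// Keep only observations whose file still exists and is unchanged;
// everything else is treated as stale and withheld from the agent.
function freshObservations(
  observations: Observation[],
  currentSources: Record<string, string>
): Observation[] {
  return observations.filter((o) => {
    const current = currentSources[o.file];
    return current !== undefined && fingerprint(current) === o.fingerprint;
  });
}

// Usage: the note survives while the file is unchanged, and drops out after an edit.
const original = "export const rotate = () => 'v1';";
const saved = [recordObservation("Refresh tokens rotate on every use", "src/auth/refresh.ts", original)];

console.log(freshObservations(saved, { "src/auth/refresh.ts": original }).length);
console.log(freshObservations(saved, { "src/auth/refresh.ts": "export const rotate = () => 'v2';" }).length);
```

Because the fingerprint is captured passively at recording time, nothing depends on the agent remembering to save or refresh its notes.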