Measurable time, cost, and token savings derived from 10+ production systems. Every number backed by code.
| Technique | Savings | How it works | Applied to |
|---|---|---|---|
| Thinking Budget Control | 40-50% | Disabled thinking tokens on structured JSON calls | Job Agent, AI Orchestrator |
| Transcript Trimming | 8-25K tokens/call | Capped document reads at 20K chars, video transcripts at reasonable limits | AI Orchestrator |
| Lazy KB Loading | 200MB memory | ChromaDB loaded on first query, not at startup | AI Orchestrator |
| Cached Digest | 30-60s per request | Firestore-cached action digest, refreshed every 30 min | AI Chat Bot |
| Model Downgrade | 97% cost | Flash produces identical quality at 1/4 price, 3x faster | AI Orchestrator, Job Agent |
| Smart File Read | 99% tokens | Server-side regex search returns only matching sections, never entire files | Multi-Agent Orchestrator |
| Headline Caching | 15x fewer requests | News headlines cached 15 min between 60s trading ticks | Crypto Trading Agent |
| Zero-Cost AI Routing | 100% cost | OpenRouter free tier handles routine tasks, Claude reserved for complex work | Private AI Gateway |