Measurable time, cost, and token savings derived from 10+ production systems. Every number is backed by code.
| Technique | Savings | How it works | Applied to |
|---|---|---|---|
| Thinking Budget Control | 40-50% | Disabled thinking tokens on structured JSON calls | Job Agent, AI Orchestrator |
| Transcript Trimming | 8-25K tokens/call | Capped document reads at 20K chars, video transcripts at reasonable limits | AI Orchestrator |
| Lazy KB Loading | 200 MB memory | ChromaDB loaded on first query, not at startup | AI Orchestrator |
| Cached Digest | 30-60s per request | Firestore-cached action digest, refreshed every 30 min | AI Chat Bot |
| Model Downgrade | 97% cost | Flash produces identical quality at 1/4 price, 3x faster | AI Orchestrator, Job Agent |
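
Three of the techniques in the table (transcript trimming, lazy loading, and a cached digest with a 30-minute refresh) can be sketched with the standard library alone. This is a minimal illustration, not the production code: `trim_text`, `LazyResource`, and `TTLCache` are hypothetical names, the ChromaDB client is stood in for by an arbitrary factory callable, and the Firestore-backed cache is replaced by an in-process one.

```python
import time
from typing import Any, Callable, Optional

MAX_DOC_CHARS = 20_000  # the 20K-char document cap from the table


def trim_text(text: str, max_chars: int = MAX_DOC_CHARS) -> str:
    """Truncate text to at most max_chars characters before sending it to a model."""
    return text if len(text) <= max_chars else text[:max_chars]


class LazyResource:
    """Build an expensive resource on first access instead of at startup.

    Assumption: `factory` stands in for whatever opens the knowledge base
    (for example a ChromaDB client); it is not called until get() is used.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._value: Any = None
        self._loaded = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> Any:
        if not self._loaded:
            self._value = self._factory()
            self._loaded = True
        return self._value


class TTLCache:
    """Hold one computed value and recompute it once ttl_seconds have elapsed.

    Assumption: an in-process stand-in for the Firestore-cached digest;
    `clock` is injectable so the expiry logic can be tested without sleeping.
    """

    def __init__(
        self,
        compute: Callable[[], Any],
        ttl_seconds: float = 30 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

A caller would wrap the knowledge-base constructor in `LazyResource` and the digest builder in `TTLCache`, then call `.get()` on the request path. Neither class is thread-safe as written; a lock around the load and refresh steps would be needed under concurrent requests.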