## Overview
Prompt caching reuses previously processed prompt content, reducing API costs and improving latency. The SDK enables it automatically for all agents: caching is on by default, with no configuration needed.
## Cost Savings
- Cache read tokens: 10% of regular price (90% discount)
- Cache write tokens: 25% more than regular price
- Net effect: Savings increase with conversation length, reaching 75%+ for 10+ turn conversations
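The arithmetic behind these numbers can be sketched with a simple cost model. The 0.10× read and 1.25× write multipliers come from the figures above; the token counts are illustrative assumptions:

```python
def conversation_cost(system_tokens, turn_tokens, turns, cached=True):
    """Total prompt cost, in base-token-price units, for a multi-turn conversation.

    Each turn resends the full prefix (system prompt + history). With caching,
    previously seen tokens are cache reads at 0.10x the base price and only
    newly added tokens are cache writes at 1.25x.
    """
    READ, WRITE = 0.10, 1.25
    total = 0.0
    prev_prompt = 0  # tokens already cached from the previous turn
    for n in range(1, turns + 1):
        prompt = system_tokens + n * turn_tokens  # full prefix this turn
        if cached:
            total += READ * prev_prompt + WRITE * (prompt - prev_prompt)
        else:
            total += prompt
        prev_prompt = prompt
    return total

# Illustrative numbers: 4,000-token system prompt + tools, ~300 new tokens per turn.
base = conversation_cost(4000, 300, turns=10, cached=False)
with_cache = conversation_cost(4000, 300, turns=10)
print(f"savings over 10 turns: {1 - with_cache / base:.0%}")  # roughly 75%
```

With these assumptions the first turn costs extra (everything is a cache write), but every later turn pays 10% for the bulk of its prompt, which is why savings climb with conversation length.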
## How It Works
The SDK automatically adds cache breakpoints at three locations:

- System Prompt — your agent’s instructions (static)
- Tool Definitions — all tool schemas (rarely change)
- Conversation History — the growing message history (last content block)
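As a sketch of what these three breakpoints look like on the wire, assuming an Anthropic-style Messages API where breakpoints are marked with `cache_control` blocks (the SDK assembles this request for you; the field values here are illustrative):

```python
# Hypothetical request body; each `cache_control` entry marks one of the
# three breakpoints described above.
request = {
    "system": [
        {
            "type": "text",
            "text": "You are a helpful coding agent.",
            "cache_control": {"type": "ephemeral"},  # breakpoint 1: system prompt
        }
    ],
    "tools": [
        {
            "name": "read_file",
            "description": "Read a file from disk.",
            "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}},
            "cache_control": {"type": "ephemeral"},  # breakpoint 2: tool definitions
        }
    ],
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "List the repo files."}]},
        {"role": "assistant", "content": [{"type": "text", "text": "README.md, src/..."}]},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Now summarize them.",
                    # breakpoint 3: last content block of the history
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
    ],
}
```

Because the breakpoints sit at the end of each stable section, everything before them is a reusable prefix, and only the newest history tokens need to be written to cache on each turn.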
## When Caching Helps Most
- Long conversations: More turns = more savings
- Many tools: Large tool definitions cached once
- Batch processing: Multiple similar requests in same session
- Active sessions: Continuous conversation within the 5-minute cache window
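The "active session" condition can be stated precisely: a request reuses the cached prefix only if it arrives within 5 minutes of the previous one. A minimal sketch of that freshness check, using the 5-minute window from above:

```python
CACHE_TTL_SECONDS = 5 * 60  # cache entries expire after 5 minutes of inactivity

def cache_still_warm(last_request_at: float, now: float) -> bool:
    """True if a new request would still hit the previously cached prefix."""
    return (now - last_request_at) <= CACHE_TTL_SECONDS

# A request 4 minutes after the last one reuses the cache;
# a request 6 minutes later pays full price again.
print(cache_still_warm(0.0, 240.0))  # True
print(cache_still_warm(0.0, 360.0))  # False
```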
## Usage
Caching is enabled by default.

## Monitoring
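Cache activity is reported per request in the response's token-usage counters. A sketch of turning those counters into a hit rate, assuming Anthropic-style field names (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); adapt the names to whatever your SDK's debug output exposes:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of this request's prompt tokens that were served from cache."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + written + uncached
    return read / total if total else 0.0

# Example usage block from a mid-conversation request (illustrative numbers).
usage = {
    "input_tokens": 12,
    "cache_creation_input_tokens": 288,
    "cache_read_input_tokens": 5700,
}
print(f"cache hit rate: {cache_hit_rate(usage):.0%}")  # 95%
```

A hit rate that stays near zero across turns usually means something in the prefix (system prompt or tools) is changing between calls and invalidating the cache.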
Enable debug mode to see cache metrics.

## Limitations
- Minimum cacheable prefix: 1024 tokens
- Cache expiry: 5 minutes of inactivity
- Dynamic prompts: Changing the system prompt or tools between calls invalidates the cache
- Short conversations: Overhead may exceed savings for single-turn interactions
## Next Steps

- Streaming & History: real-time output patterns
- AgentConfig API: all configuration options