Overview

Prompt caching reuses previously processed prompt content, cutting API costs and latency on repeated requests. The SDK enables it automatically for all agents; no configuration is needed.

Cost Savings

  • Cache read tokens: 10% of regular price (90% discount)
  • Cache write tokens: 25% more than regular price
  • Net effect: Savings increase with conversation length, reaching 75%+ for 10+ turn conversations
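As a back-of-the-envelope sketch of the pricing above, assume a 5,000-token static prefix (system prompt plus tools) and 500 fresh tokens per turn. These are illustrative numbers, not SDK defaults:

```rust
/// Relative input cost of an `n`-turn conversation, in units of
/// regular-price tokens. Illustrative assumptions: a 5,000-token
/// static prefix and 500 new tokens per turn, with cache reads at
/// 10% of the regular rate and cache writes at 125%.
fn cost(turns: u32, cached: bool) -> f64 {
    let prefix = 5000.0; // system prompt + tools, cached after turn 0
    let per_turn = 500.0; // fresh tokens added each turn

    let mut total = 0.0;
    for turn in 0..turns {
        let history = per_turn * turn as f64; // prior turns, also cached
        if cached {
            if turn == 0 {
                total += prefix * 1.25; // initial cache write: 25% surcharge
            } else {
                total += (prefix + history) * 0.10; // cache read: 10% of price
            }
            total += per_turn * 1.25; // new content written into the cache
        } else {
            total += prefix + history + per_turn; // everything at full price
        }
    }
    total
}

fn main() {
    let (with, without) = (cost(10, true), cost(10, false));
    println!("10 turns cached:   {with:.0} token-units");
    println!("10 turns uncached: {without:.0} token-units");
    println!("savings: {:.0}%", 100.0 * (1.0 - with / without));
}
```

Under these assumptions a 10-turn conversation costs roughly a quarter of the uncached price, which is where the 75%+ figure comes from.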

How It Works

The SDK automatically adds cache breakpoints at three locations:
  1. System Prompt — your agent’s instructions (static)
  2. Tool Definitions — all tool schemas (rarely change)
  3. Conversation History — the growing message history (last content block)
Cache entries are valid for 5 minutes. Each cache read refreshes the timer.
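The expiry rule can be modeled in a few lines. This is an illustrative sketch only; the real cache lives server-side and the 5-minute TTL is fixed by the API:

```rust
use std::time::{Duration, Instant};

/// Minimal model of the cache-expiry rule: an entry survives 5 minutes
/// of inactivity, and every cache read pushes the deadline forward.
struct CacheEntry {
    last_used: Instant,
    ttl: Duration,
}

impl CacheEntry {
    fn new() -> Self {
        Self { last_used: Instant::now(), ttl: Duration::from_secs(5 * 60) }
    }

    /// A cache hit refreshes the 5-minute window.
    fn touch(&mut self) {
        self.last_used = Instant::now();
    }

    fn is_live(&self, now: Instant) -> bool {
        now.duration_since(self.last_used) < self.ttl
    }
}

fn main() {
    let mut entry = CacheEntry::new();
    let now = Instant::now();
    assert!(entry.is_live(now)); // just written: live
    entry.touch(); // a cache read restarts the 5-minute window
    assert!(!entry.is_live(now + Duration::from_secs(6 * 60))); // 6 idle minutes: expired
}
```

The practical consequence: as long as you keep the conversation moving at least once every 5 minutes, the prefix stays cached for the whole session.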

When Caching Helps Most

  • Long conversations: More turns = more savings
  • Many tools: Large tool definitions cached once
  • Batch processing: Multiple similar requests in same session
  • Active sessions: Continuous conversation within 5 minutes

Usage

Caching is enabled by default:
let config = AgentConfig::new("You are helpful")
    .with_tools(tools);  // Caching automatically enabled

let agent = StandardAgent::new(config, llm);
To disable (not recommended):
let config = AgentConfig::new("You are helpful")
    .with_prompt_caching(false);

Monitoring

Enable debug mode to see cache metrics:
let config = AgentConfig::new("You are helpful")
    .with_debug(true);
Output:
Cache creation tokens: 5000
Cache read tokens: 12000
Input tokens: 500
Access metrics programmatically from the LLM response:
let response = llm.send_message(...).await?;

println!("Cache creation: {}", response.usage.cache_creation_input_tokens);
println!("Cache read: {}", response.usage.cache_read_input_tokens);
println!("Regular input: {}", response.usage.input_tokens);
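Those three usage fields are enough to estimate what caching saved you. A small helper, using the pricing ratios from the Cost Savings section (illustrative, not part of the SDK):

```rust
/// Effective input cost relative to sending the same request uncached.
/// Assumes cache reads bill at 10% of the regular rate and cache
/// writes at 125%, per the Cost Savings section.
fn effective_cost_ratio(cache_creation: u64, cache_read: u64, input: u64) -> f64 {
    let charged =
        cache_creation as f64 * 1.25 + cache_read as f64 * 0.10 + input as f64;
    let uncached = (cache_creation + cache_read + input) as f64;
    charged / uncached
}

fn main() {
    // The numbers from the debug output above.
    let ratio = effective_cost_ratio(5000, 12000, 500);
    println!("you paid {:.0}% of the uncached input price", ratio * 100.0);
}
```

For the sample output above (5,000 created, 12,000 read, 500 regular), the request bills at well under half the uncached input price.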

Limitations

  • Minimum cacheable prefix: 1024 tokens
  • Cache expiry: 5 minutes of inactivity
  • Dynamic prompts: Changing the system prompt or tools between calls invalidates the cache
  • Short conversations: Overhead may exceed savings for single-turn interactions
Keep system prompts static and tool sets consistent to maximize cache hits. Avoid including timestamps or dynamic content in your system prompt.
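For example, here is why a timestamped prompt defeats the cache: the cached prefix is matched byte-for-byte, so any interpolated value produces a different prefix on every request. The helper names below are hypothetical:

```rust
/// Anti-pattern (hypothetical helper): interpolating a timestamp into
/// the system prompt. Two requests a second apart get different
/// prefixes, so the cache entry from the first can never be reused.
fn dynamic_prompt(unix_secs: u64) -> String {
    format!("You are helpful. Current time: {unix_secs}")
}

/// Cache-friendly: the prompt is byte-for-byte identical on every
/// call; pass per-request data (like the time) in the user message.
fn static_prompt() -> &'static str {
    "You are helpful."
}

fn main() {
    // Different seconds -> different prefixes -> guaranteed cache miss.
    assert_ne!(dynamic_prompt(1_700_000_000), dynamic_prompt(1_700_000_001));
    // Identical prefix on every call -> eligible for a cache hit.
    assert_eq!(static_prompt(), static_prompt());
}
```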

Next Steps

Streaming & History

Real-time output patterns

AgentConfig API

All configuration options