## Overview

Extended thinking allows the LLM to engage in deeper reasoning before responding. When enabled, the model works through problems step by step internally, and you can observe the thinking process in real time via streaming. Extended thinking is supported by both `AnthropicProvider` and `GeminiProvider`.
## Enabling Extended Thinking

Choose a thinking budget that matches the difficulty of the task:
| Problem Type | Recommended Budget (tokens) |
|---|---|
| Moderate complexity | 8,000-16,000 |
| Complex reasoning | 16,000-32,000 |
| Very difficult problems | 32,000-64,000 |
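As a minimal sketch of wiring a budget into a request, assuming a builder-style options type (the names `ChatOptions` and `with_thinking_budget` here are illustrative, not confirmed API):

```rust
// Hypothetical options struct; the real crate's names may differ.
#[derive(Debug)]
struct ChatOptions {
    thinking_budget: Option<u32>, // max tokens the model may spend thinking
}

impl ChatOptions {
    fn new() -> Self {
        Self { thinking_budget: None }
    }

    // Enable extended thinking with the given token budget.
    fn with_thinking_budget(mut self, budget: u32) -> Self {
        self.thinking_budget = Some(budget);
        self
    }
}

fn main() {
    // Moderate-complexity task: pick a budget in the 8,000-16,000 range.
    let opts = ChatOptions::new().with_thinking_budget(16_000);
    assert_eq!(opts.thinking_budget, Some(16_000));
}
```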
## Processing Thinking Output

When extended thinking is enabled, you receive an additional chunk type: `ContentBlock::Thinking`.
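A sketch of separating thinking chunks from regular output while consuming a stream. Only the `ContentBlock::Thinking` variant name comes from the docs above; the `Text` variant, payload types, and handler are assumptions:

```rust
// Hypothetical content-block enum mirroring the `ContentBlock::Thinking`
// variant named above; the `Text` variant and payloads are assumptions.
#[allow(dead_code)]
enum ContentBlock {
    Thinking(String),
    Text(String),
}

// Classify a chunk so thinking output can be displayed (or logged)
// separately from the final answer.
fn handle_chunk(block: &ContentBlock) -> (&'static str, &str) {
    match block {
        ContentBlock::Thinking(t) => ("thinking", t.as_str()),
        ContentBlock::Text(t) => ("text", t.as_str()),
    }
}

fn main() {
    let chunk = ContentBlock::Thinking("step 1: ...".to_string());
    let (kind, body) = handle_chunk(&chunk);
    assert_eq!(kind, "thinking");
    assert_eq!(body, "step 1: ...");
}
```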
## Usage Tracking

Thinking tokens are reported separately in the usage metadata. For Gemini models, thinking tokens are counted as part of the response: you're charged for both output tokens and thinking tokens.
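A quick sketch of how the billable output side adds up when thinking tokens are charged alongside output tokens (the `Usage` struct and its field names are illustrative assumptions, not the library's actual metadata type):

```rust
// Illustrative usage metadata; real field names may differ.
struct Usage {
    output_tokens: u32,
    thinking_tokens: u32,
}

// Total billable output-side tokens: thinking is charged like output.
fn billable_output_tokens(u: &Usage) -> u32 {
    u.output_tokens + u.thinking_tokens
}

fn main() {
    let u = Usage { output_tokens: 1_200, thinking_tokens: 8_000 };
    assert_eq!(billable_output_tokens(&u), 9_200);
}
```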
## Provider-Specific Behavior
### AnthropicProvider
- Uses the thinking budget directly as token count
- Supports budgets from 1 to 64,000 tokens
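A one-line sketch of keeping a requested budget inside the documented 1-64,000 range before sending it (the helper name is ours, not library API):

```rust
// Clamp a requested budget into the documented Anthropic range (1-64,000 tokens).
fn clamp_anthropic_budget(requested: u32) -> u32 {
    requested.clamp(1, 64_000)
}

fn main() {
    assert_eq!(clamp_anthropic_budget(100_000), 64_000);
    assert_eq!(clamp_anthropic_budget(0), 1);
    assert_eq!(clamp_anthropic_budget(16_000), 16_000);
}
```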
### GeminiProvider
- Gemini 3 models (`gemini-3-flash`, `gemini-3-pro`):
  - Budget automatically converted to thinking levels (`minimal`, `low`, `medium`, `high`)
  - Conversion ranges: `0-512` → `minimal`, `513-2048` → `low`, `2049-8192` → `medium`, `8193+` → `high`
- Gemini 2.5 models (`gemini-2.5-flash`, `gemini-2.5-pro`):
  - Uses numeric budget directly (0-24,576 tokens)
  - Set to `0` to disable thinking
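The Gemini 3 budget-to-level conversion above can be sketched as a single match (the function name is ours; the ranges are the ones documented in this section):

```rust
// Sketch of the documented budget-to-level conversion for Gemini 3 models.
fn budget_to_level(budget: u32) -> &'static str {
    match budget {
        0..=512 => "minimal",
        513..=2048 => "low",
        2049..=8192 => "medium",
        _ => "high", // 8193 and above
    }
}

fn main() {
    // Boundary values on each side of the documented ranges.
    assert_eq!(budget_to_level(512), "minimal");
    assert_eq!(budget_to_level(513), "low");
    assert_eq!(budget_to_level(8192), "medium");
    assert_eq!(budget_to_level(8193), "high");
}
```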
## Limitations
- Non-streaming mode: Interrupts are not detected during the thinking phase when streaming is disabled.
- Budget is a maximum: The model may use fewer tokens than the budget for simpler problems.
- Tool calls: Extended thinking applies to LLM responses, not during tool execution.
## Next Steps

- Prompt Caching: reduce costs with caching
- Streaming & History: handle thinking output