Prompt caching with Claude is now available in beta on the Anthropic API, offering significant cost and latency reductions. This feature allows developers to reuse extensive context across multiple API requests, reducing costs by up to 90% and latency by up to 85% for long prompts. Prompt caching is currently available for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
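The sketch below shows what a cached request can look like with the Anthropic Python SDK. It is a minimal sketch, assuming the prompt caching beta header and the cache_control content-block field described in the documentation; the model name, file path, and system text are illustrative.

```python
# Minimal sketch of a prompt-caching request with the Anthropic Python SDK.
# Assumes the prompt caching beta header and the `cache_control` content-block
# field from the documentation; check the docs for the current names.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_document = open("annual_report.txt").read()  # hypothetical large context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a financial analysis assistant."},
        {
            "type": "text",
            "text": long_document,
            # Cache breakpoint: everything up to and including this block is
            # written to the cache on the first request and read back from the
            # cache on subsequent requests with the same prefix.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key risk factors."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```

The first request pays the cache-write rate for the marked context; later requests that reuse the same prefix within the cache lifetime (roughly five minutes, refreshed on each use, per the documentation) pay the much lower cache-read rate.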
Use Cases
Prompt caching is beneficial in various scenarios:
- Conversational Agents: Reduces costs and latency for extended conversations with long instructions or uploaded documents (see the sketch after this list).
- Coding Assistants: Enhances autocomplete and codebase Q&A by maintaining a summarized version of the codebase.
- Large Document Processing: Incorporates long-form material, including images, without increasing response latency.
- Detailed Instruction Sets: Allows sharing extensive lists of instructions and examples to fine-tune responses.
- Agentic Search and Tool Use: Improves performance for tasks involving multiple rounds of tool calls and iterative changes.
- Long-Form Content Interaction: Enables users to interact with books, papers, documentation, and podcast transcripts by embedding entire documents into the prompt.
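As a concrete illustration of the conversational-agent and long-document use cases, the sketch below keeps static instructions and an uploaded document cached and places a second cache breakpoint on the growing conversation history. It assumes the same beta header and cache_control field as above; the ask helper, file path, and system text are illustrative.

```python
# Sketch of a multi-turn assistant that caches long static context plus the
# accumulated conversation history. Assumes the prompt caching beta header and
# `cache_control` field from the docs; names and files are illustrative.
import anthropic

client = anthropic.Anthropic()
BETA = {"anthropic-beta": "prompt-caching-2024-07-31"}

system = [
    {"type": "text", "text": "You are a support agent for Acme's API."},
    {
        "type": "text",
        "text": open("acme_api_docs.md").read(),  # hypothetical long document
        "cache_control": {"type": "ephemeral"},   # breakpoint 1: static context
    },
]

history = []

def ask(question: str) -> str:
    # Keep only one breakpoint inside the conversation: drop the marker from
    # the previously cached turn, since the API allows only a few breakpoints
    # per request.
    for turn in history:
        if isinstance(turn["content"], list):
            turn["content"][0].pop("cache_control", None)
    history.append({
        "role": "user",
        "content": [{
            "type": "text",
            "text": question,
            # Breakpoint 2: cache everything up to the latest user turn so the
            # next request reads the earlier turns from the cache instead of
            # reprocessing them.
            "cache_control": {"type": "ephemeral"},
        }],
    })
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system=system,
        messages=history,
        extra_headers=BETA,
    )
    answer = response.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("How do I authenticate?"))
print(ask("And what are the rate limits?"))
```

Caching works on exact prefix matches, so keeping the long, static content at the front of the prompt and appending new turns at the end is what makes the cache reusable from call to call.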
Performance Improvements
Early adopters have reported substantial improvements:
- Chat with a Book: 79% reduction in latency and 90% cost reduction for a 100,000-token cached prompt.
- Many-Shot Prompting: 31% reduction in latency and 86% cost reduction for a 10,000-token prompt.
- Multi-Turn Conversation: 75% reduction in latency and 53% cost reduction for a 10-turn conversation with a long system prompt.
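One way to confirm these effects in practice is to inspect the usage block returned with each response, which breaks input tokens down into regular, cache-write, and cache-read counts. A minimal sketch, assuming the cache_creation_input_tokens and cache_read_input_tokens fields from the beta documentation; report_cache_usage is an illustrative helper.

```python
def report_cache_usage(response) -> None:
    """Print how many input tokens were billed at the regular, cache-write,
    and cache-read rates for a single response. Field names follow the prompt
    caching docs; getattr guards against SDK versions that omit them."""
    usage = response.usage
    print("regular input tokens:", usage.input_tokens)
    print("cache write tokens:  ", getattr(usage, "cache_creation_input_tokens", 0))
    print("cache read tokens:   ", getattr(usage, "cache_read_input_tokens", 0))
```

On the first request the long context shows up under cache writes; on repeated requests it shows up under cache reads instead, which is where the latency and cost reductions above come from.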
Pricing
Cached prompts are priced based on the number of input tokens cached and how often that content is reused: writing to the cache costs 25% more than the base input token price, while reading from the cache costs only 10% of it.
- Claude 3.5 Sonnet: Cache write costs $3.75/MTok, cache read costs $0.30/MTok.
- Claude 3 Opus: Cache write costs $18.75/MTok, cache read costs $1.50/MTok (coming soon).
- Claude 3 Haiku: Cache write costs $0.30/MTok, cache read costs $0.03/MTok.
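To see how these rates translate into the savings cited above, the back-of-the-envelope sketch below compares cached and uncached costs for a 100,000-token prompt on Claude 3.5 Sonnet, assuming its standard $3/MTok base input price and ignoring output tokens.

```python
# Back-of-the-envelope cost comparison for a 100,000-token prompt on
# Claude 3.5 Sonnet, using the cache rates above and an assumed $3/MTok
# base input price. Output tokens are ignored for simplicity.
PROMPT_TOKENS = 100_000
BASE_INPUT  = 3.00 / 1_000_000   # $ per regular input token
CACHE_WRITE = 3.75 / 1_000_000   # $ per token written to the cache
CACHE_READ  = 0.30 / 1_000_000   # $ per token read from the cache

def cost(calls: int, cached: bool) -> float:
    if not cached:
        return calls * PROMPT_TOKENS * BASE_INPUT
    # The first call writes the cache; every later call reads it.
    return PROMPT_TOKENS * (CACHE_WRITE + (calls - 1) * CACHE_READ)

for calls in (1, 10, 100):
    print(f"{calls:>3} calls: uncached ${cost(calls, False):6.2f}, "
          f"cached ${cost(calls, True):6.2f}")
```

Because a cache read costs a tenth of a regular input token, the 25% write premium is recovered on the first reuse, and the savings approach roughly 90% as the number of cached reads grows.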
To get started with prompt caching, explore the Anthropic API documentation and pricing page.