Prompt caching with Claude is now available in beta on the Anthropic API, offering significant cost and latency reductions. This feature allows developers to reuse extensive context across multiple API requests, reducing costs by up to 90% and latency by up to 85% for long prompts. Prompt caching is currently available for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
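The sketch below shows what a cached request can look like with the Anthropic Python SDK. It is a minimal sketch, assuming the prompt caching beta header and the cache_control content-block field described in the documentation; the model name, file path, and system text are illustrative.

```python
# Minimal sketch of a prompt-caching request with the Anthropic Python SDK.
# Assumes the prompt caching beta header and the `cache_control` content-block
# field from the documentation; check the docs for the current names.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_document = open("annual_report.txt").read()  # hypothetical large context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a financial analysis assistant."},
        {
            "type": "text",
            "text": long_document,
            # Cache breakpoint: everything up to and including this block is
            # written to the cache on the first request and read back from the
            # cache on subsequent requests with the same prefix.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key risk factors."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```

The first request pays the cache-write rate for the marked context; later requests that reuse the same prefix within the cache lifetime (roughly five minutes, refreshed on each use, per the documentation) pay the much lower cache-read rate.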
Use Cases
Prompt caching is beneficial in various scenarios:
- Conversational Agents: Reduces costs and latency for extended conversations with long instructions or uploaded documents (see the sketch after this list).
- Coding Assistants: Enhances autocomplete and codebase Q&A by maintaining a summarized version of the codebase.
- Large Document Processing: Incorporates long-form material, including images, without increasing response latency.
- Detailed Instruction Sets: Allows sharing extensive lists of instructions and examples to fine-tune responses.
- Agentic Search and Tool Use: Improves performance for tasks involving multiple rounds of tool calls and iterative changes.
- Long-Form Content Interaction: Enables users to interact with books, papers, documentation, and podcast transcripts by embedding entire documents into the prompt.
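As a concrete illustration of the conversational-agent and long-document use cases, the sketch below keeps static instructions and an uploaded document cached and places a second cache breakpoint on the growing conversation history. It assumes the same beta header and cache_control field as above; the ask helper, file path, and system text are illustrative.

```python
# Sketch of a multi-turn assistant that caches long static context plus the
# accumulated conversation history. Assumes the prompt caching beta header and
# `cache_control` field from the docs; names and files are illustrative.
import anthropic

client = anthropic.Anthropic()
BETA = {"anthropic-beta": "prompt-caching-2024-07-31"}

system = [
    {"type": "text", "text": "You are a support agent for Acme's API."},
    {
        "type": "text",
        "text": open("acme_api_docs.md").read(),  # hypothetical long document
        "cache_control": {"type": "ephemeral"},   # breakpoint 1: static context
    },
]

history = []

def ask(question: str) -> str:
    # Keep only one breakpoint inside the conversation: drop the marker from
    # the previously cached turn, since the API allows only a few breakpoints
    # per request.
    for turn in history:
        if isinstance(turn["content"], list):
            turn["content"][0].pop("cache_control", None)
    history.append({
        "role": "user",
        "content": [{
            "type": "text",
            "text": question,
            # Breakpoint 2: cache everything up to the latest user turn so the
            # next request reads the earlier turns from the cache instead of
            # reprocessing them.
            "cache_control": {"type": "ephemeral"},
        }],
    })
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system=system,
        messages=history,
        extra_headers=BETA,
    )
    answer = response.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("How do I authenticate?"))
print(ask("And what are the rate limits?"))
```

Caching works on exact prefix matches, so keeping the long, static content at the front of the prompt and appending new turns at the end is what makes the cache reusable from call to call.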
Performance Improvements
Early adopters have reported substantial improvements:
- Chat with a Book: 79% reduction in latency and 90% cost reduction for a 100,000-token cached prompt.
- Many-Shot Prompting: 31% reduction in latency and 86% cost reduction for a 10,000-token prompt.
- Multi-Turn Conversation: 75% reduction in latency and 53% cost reduction for a 10-turn conversation with a long system prompt.
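One way to confirm these effects in practice is to inspect the usage block returned with each response, which breaks input tokens down into regular, cache-write, and cache-read counts. A minimal sketch, assuming the cache_creation_input_tokens and cache_read_input_tokens fields from the beta documentation; report_cache_usage is an illustrative helper.

```python
def report_cache_usage(response) -> None:
    """Print how many input tokens were billed at the regular, cache-write,
    and cache-read rates for a single response. Field names follow the prompt
    caching docs; getattr guards against SDK versions that omit them."""
    usage = response.usage
    print("regular input tokens:", usage.input_tokens)
    print("cache write tokens:  ", getattr(usage, "cache_creation_input_tokens", 0))
    print("cache read tokens:   ", getattr(usage, "cache_read_input_tokens", 0))
```

On the first request the long context shows up under cache writes; on repeated requests it shows up under cache reads instead, which is where the latency and cost reductions above come from.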
Pricing
Cached prompts are priced based on the number of input tokens cached and how often that content is reused: writing to the cache costs 25% more than the base input token price, while reading from the cache costs only 10% of it.
- Claude 3.5 Sonnet: Cache write costs $3.75/MTok, cache read costs $0.30/MTok.
- Claude 3 Opus: Cache write costs $18.75/MTok, cache read costs $1.50/MTok (coming soon).
- Claude 3 Haiku: Cache write costs $0.30/MTok, cache read costs $0.03/MTok.
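To see how these rates translate into the savings cited above, the back-of-the-envelope sketch below compares cached and uncached costs for a 100,000-token prompt on Claude 3.5 Sonnet, assuming its standard $3/MTok base input price and ignoring output tokens.

```python
# Back-of-the-envelope cost comparison for a 100,000-token prompt on
# Claude 3.5 Sonnet, using the cache rates above and an assumed $3/MTok
# base input price. Output tokens are ignored for simplicity.
PROMPT_TOKENS = 100_000
BASE_INPUT  = 3.00 / 1_000_000   # $ per regular input token
CACHE_WRITE = 3.75 / 1_000_000   # $ per token written to the cache
CACHE_READ  = 0.30 / 1_000_000   # $ per token read from the cache

def cost(calls: int, cached: bool) -> float:
    if not cached:
        return calls * PROMPT_TOKENS * BASE_INPUT
    # The first call writes the cache; every later call reads it.
    return PROMPT_TOKENS * (CACHE_WRITE + (calls - 1) * CACHE_READ)

for calls in (1, 10, 100):
    print(f"{calls:>3} calls: uncached ${cost(calls, False):6.2f}, "
          f"cached ${cost(calls, True):6.2f}")
```

Because a cache read costs a tenth of a regular input token, the 25% write premium is recovered on the first reuse, and the savings approach roughly 90% as the number of cached reads grows.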
To get started with prompt caching, explore the Anthropic API documentation and pricing page.