Anthropic has launched Claude 3.5 Sonnet, the first model in its Claude 3.5 family. According to benchmarks published by Anthropic, the model outperforms competitors such as GPT-4o while offering greater intelligence at the speed and cost of its predecessor, Claude 3 Sonnet.
Benchmarks are of limited value on their own: many focus on niche scenarios, such as answering health-related exam questions, that may matter little to the average user. Even so, Claude 3.5 Sonnet performs strongly, slightly outperforming leading models on several of the benchmarks Anthropic tested.
Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app, with higher rate limits for Claude Pro and Team plan subscribers. It is also available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing is $3 per million input tokens and $15 per million output tokens, and the model has a 200K-token context window.
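To make the per-million-token pricing concrete, here is a minimal sketch that estimates the list-price cost of a single request and checks whether it fits in the context window. The constant and function names are hypothetical, chosen for illustration. The prices and the 200K window come from the figures above. Counting output tokens against the context window is a simplifying assumption, and real bills depend on Anthropic's current price list and tokenizer.

```python
# Hypothetical constants based on the published list prices (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
# 200K-token context window; assumption: input plus output must fit inside it.
CONTEXT_WINDOW_TOKENS = 200_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each price is per million tokens, so divide the weighted sum by 1,000,000.
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Return True if prompt and response together fit in the context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A 10,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost(10_000, 1_000):.4f}")
    print(fits_context(10_000, 1_000))
```

With these numbers, a 10,000-token prompt costs $0.03 and a 1,000-token reply costs $0.015, for about $0.045 in total. Output tokens cost five times as much as input tokens, so long responses dominate the bill faster than long prompts do.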
Performance and Capabilities
Claude 3.5 Sonnet excels in:
- Graduate-level reasoning (GPQA)
- Undergraduate-level knowledge (MMLU)
- Coding proficiency (HumanEval)
It runs at twice the speed of Claude 3 Opus, making it well suited to complex tasks such as context-sensitive customer support and orchestrating multi-step workflows. In an internal agentic coding evaluation, it solved 64% of problems, compared with 38% for Claude 3 Opus. When given the relevant tools, the model can independently write, edit, and execute code. It also handles code translation, which makes it effective for updating legacy applications.
Vision Capabilities
Claude 3.5 Sonnet surpasses Claude 3 Opus on standard vision benchmarks, with the most noticeable gains in visual reasoning tasks such as interpreting charts and graphs. It can also accurately transcribe text from imperfect images, a capability that is particularly useful in retail, logistics, and financial services.
Artifacts
Artifacts, a new feature on Claude.ai, lets users generate and interact with content such as code snippets, text documents, or website designs in a dedicated window alongside the conversation. The feature moves Claude from a conversational AI toward a collaborative work environment, and Anthropic plans to extend it to support team collaboration.
Safety and Privacy
Claude 3.5 Sonnet has undergone rigorous testing and training to reduce misuse, and it remains at AI Safety Level 2 (ASL-2). External experts, including the UK Artificial Intelligence Safety Institute (UK AISI), evaluated its safety before deployment. To protect privacy, Anthropic does not train its models on user-submitted data without explicit permission.
What’s Next
Anthropic aims to keep improving the tradeoff between intelligence, speed, and cost. Upcoming releases include Claude 3.5 Haiku and Claude 3.5 Opus. Anthropic is also exploring features such as Memory, which would let Claude remember user preferences and make interactions more personalized and efficient.