Anthropic's Console now includes features that allow users to generate, test, and evaluate prompts using Claude. These enhancements aim to streamline the process of crafting high-quality prompts for AI-powered applications.
Generate Prompts
Users can describe a task (e.g., "Triage inbound customer support requests") and have Claude generate a high-quality prompt. This prompt generator is powered by Claude 3.5 Sonnet. Claude can also generate input variables for your prompts.
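The prompt generator lives in the Console UI, but the underlying idea can be sketched with the Anthropic Python SDK. The snippet below is a rough approximation rather than the Console's actual meta-prompt: the instruction text, the model snapshot name, and the `{{double_braces}}` variable convention are assumptions for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical meta-prompt: ask Claude to draft a reusable prompt template
# for a task description, marking input variables with {{double_braces}}.
task = "Triage inbound customer support requests"

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "Write a reusable prompt template for the following task. "
                "Use {{double_braces}} for any input variables the template needs.\n\n"
                f"Task: {task}"
            ),
        }
    ],
)

print(response.content[0].text)  # the generated prompt template
```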
Evaluate Prompts
The new Evaluate tab allows users to create test cases to evaluate prompts against real-world inputs. Users can modify these test cases as needed and run all of them in one click. This feature helps build confidence in prompt quality before deploying to production.
Users can now compare the outputs of two or more prompts side by side. This feature allows subject matter experts to grade responses on a 5-point scale, facilitating prompt iteration and improvement.
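As an illustration of what a graded comparison might capture, here is a minimal sketch of a record for a single test case. The field names and validation are assumptions for illustration, not the Console's actual data model.

```python
from dataclasses import dataclass

@dataclass
class GradedComparison:
    """One test case graded by a subject matter expert (hypothetical structure)."""
    test_input: str   # the real-world input fed to both prompt versions
    output_a: str     # response from prompt version A
    output_b: str     # response from prompt version B
    grade_a: int      # 1-5 rating for output A
    grade_b: int      # 1-5 rating for output B
    notes: str = ""   # optional reviewer comments

    def __post_init__(self) -> None:
        for grade in (self.grade_a, self.grade_b):
            if not 1 <= grade <= 5:
                raise ValueError("grades must be on the 5-point scale (1-5)")
```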
Test Suite Generation
Users can add test cases manually, import them from a CSV, or auto-generate them using Claude. The Evaluate feature in the Console allows prompts to be tested directly against a range of real-world inputs, eliminating the need to manage test cases across spreadsheets or code.
Refining prompts is now more efficient: users can create a new prompt version and re-run the entire test suite against it. The side-by-side output comparison and expert grading on a 5-point scale make it faster and more accessible to improve the quality of model responses.
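Outside the Console, a similar re-run loop can be sketched with the Python SDK: load test cases from a CSV and run each one against a new prompt version. The file name, the `input` column, and the prompt template below are illustrative assumptions, not part of the Console feature itself.

```python
import csv
import anthropic

client = anthropic.Anthropic()

# Hypothetical prompt version under test; {input} is filled from each test case.
PROMPT_V2 = (
    "You are a support triage assistant. Classify the request below as "
    "'billing', 'technical', or 'other', then suggest a next step.\n\n"
    "Request: {input}"
)

def run_suite(csv_path: str) -> list[dict]:
    """Run every test case in the CSV against the current prompt version."""
    results = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # assumes an 'input' column per test case
            message = client.messages.create(
                model="claude-3-5-sonnet-20240620",
                max_tokens=512,
                messages=[{"role": "user", "content": PROMPT_V2.format(input=row["input"])}],
            )
            results.append({"input": row["input"], "output": message.content[0].text})
    return results

if __name__ == "__main__":
    for result in run_suite("test_cases.csv"):
        print(result["input"], "->", result["output"][:80])
```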
Availability
Test case generation and output comparison features are available to all users on the Anthropic Console.