Claude

Anthropic pilots Claude AI agent for Chrome with new safety features

August 27, 2025 at 3:09:20 AM

TL;DR Anthropic is testing Claude for Chrome, an AI that helps with tasks in the browser like managing calendars and emails. The pilot involves 1,000 trusted users to address safety risks, especially prompt injection attacks where malicious instructions trick the AI. New defenses cut attack success rates but more work is needed. Users control permissions and confirm risky actions. Feedback will improve safety and features before wider release.

Anthropic pilots Claude AI agent for Chrome with new safety features

Anthropic has launched a pilot for Claude, an AI agent integrated directly into the Chrome browser, aiming to enhance productivity by allowing Claude to interact with web pages, click buttons, fill forms, and manage tasks like calendars and emails. This browser-based AI approach is seen as inevitable due to the volume of work done in browsers, but it introduces significant safety and security challenges that require robust safeguards.

Browser-Using AI and Safety Challenges

Browser-using AI faces risks such as prompt injection attacks, where malicious actors embed harmful instructions in websites, emails, or documents to trick the AI into performing dangerous actions like deleting files, stealing data, or making unauthorized transactions. Anthropic’s red-teaming experiments revealed a 23.6% attack success rate without safety mitigations, demonstrating the severity of these vulnerabilities.

A notable example involved a phishing email instructing Claude to delete emails without user confirmation, which Claude initially executed. However, new mitigations now allow Claude to recognize such phishing attempts and refuse to act on them.

Current Defenses and Improvements

Anthropic has implemented several layers of defense to reduce these risks:

User Permissions: Users control Claude’s access to websites and must confirm high-risk actions such as publishing or purchasing.
System Prompts: Enhanced instructions guide Claude on handling sensitive data and requests.
Site Restrictions: Claude is blocked from accessing high-risk categories like financial services, adult content, and pirated content.
Advanced Classifiers: Tools to detect suspicious instruction patterns and unusual data requests, even in legitimate contexts.

These measures have cut the attack success rate from 23.6% to 11.2% in autonomous mode, outperforming previous capabilities where Claude only viewed the screen without browser interaction.

Specialized red-teaming focused on browser-specific attacks—such as hidden malicious form fields and injections via URL text or tab titles—reduced attack success from 35.7% to 0% on targeted challenges.

Ongoing Development and Pilot Participation

Anthropic acknowledges that internal testing cannot fully replicate real-world browsing complexity or evolving attack methods. The pilot program invites 1,000 trusted Max plan users to test Claude for Chrome in authentic conditions, helping identify new vulnerabilities and improve safety classifiers and permission controls.

Participants are advised to use Claude cautiously, avoiding sensitive sites involving financial, legal, or medical information. Feedback from this pilot will guide enhancements to both Claude’s capabilities and its security measures.

Summary

Claude for Chrome represents a significant step toward integrating AI directly into web browsing, offering improved productivity by managing tasks within the browser. However, the introduction of browser-using AI necessitates rigorous safety protocols to combat prompt injection attacks and other security threats. Anthropic’s phased pilot, combined with advanced defenses and user-controlled permissions, aims to balance functionality with safety, gradually expanding access as protections improve.