ChatGPT now features a new agentic system that enables it to autonomously perform complex tasks using its own virtual computer. This system integrates capabilities from previous tools—Operator’s web interaction, deep research’s synthesis, and ChatGPT’s conversational intelligence—allowing it to navigate websites, analyze data, run code, and produce deliverables like editable slideshows and spreadsheets. Users remain in control, with the ability to grant permissions, interrupt, or take over tasks at any time.
Unified Agentic System and Capabilities
The new ChatGPT agent combines the strengths of Operator and deep research, overcoming their individual limitations by enabling both deep analysis and interactive web navigation, including secure login for personalized content. It uses multiple tools such as a visual browser, text-based browser, terminal, and API access, choosing the optimal method to efficiently complete tasks. It supports iterative, collaborative workflows where users can guide or pause the agent, and receive notifications upon task completion.
Practical Applications
ChatGPT agent enhances productivity in professional and personal contexts by automating tasks such as creating presentations from dashboards, managing meetings, updating financial spreadsheets, planning travel, and booking appointments. It achieves state-of-the-art performance on various benchmarks including:
- Humanity’s Last Exam: Achieves a new pass@1 SOTA score of 41.6, improving to 44.4 with parallel attempts.
- FrontierMath: Reaches 27.4% accuracy on the hardest math problems, outperforming previous models.
- DSBench and SpreadsheetBench: Surpasses human performance on DSBench data science tasks and outperforms existing models on SpreadsheetBench spreadsheet editing tasks.
- Investment Banking Modeling: Outperforms previous models in complex financial modeling tasks.
- BrowseComp and WebArena: Sets new records in web browsing and real-world web task completion.
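The gap between a pass@1 score and a score "with parallel attempts" can be illustrated with a small sketch. The summary above does not say how one answer is chosen from several parallel attempts, so the rule used here (keep the attempt with the highest self-reported confidence) is an assumption, and the `Attempt` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Attempt:
    answer: str
    confidence: float  # hypothetical self-reported confidence in [0, 1]


def pass_at_1(attempts: Sequence[Attempt], references: Sequence[str]) -> float:
    """Fraction of questions where the single submitted attempt is correct."""
    if not references or len(attempts) != len(references):
        raise ValueError("need one attempt per reference answer")
    correct = sum(a.answer == ref for a, ref in zip(attempts, references))
    return correct / len(references)


def select_most_confident(attempts: Sequence[Attempt]) -> Attempt:
    """Assumed selection rule: keep the attempt with the highest confidence."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return max(attempts, key=lambda a: a.confidence)


def parallel_score(
    attempts_per_question: Sequence[Sequence[Attempt]],
    references: Sequence[str],
) -> float:
    """Score after picking one answer per question from parallel attempts."""
    chosen = [select_most_confident(a) for a in attempts_per_question]
    return pass_at_1(chosen, references)


# Toy data, not benchmark results: two questions, two attempts each.
references = ["4", "paris"]
runs = [
    [Attempt("5", 0.4), Attempt("4", 0.9)],
    [Attempt("paris", 0.8), Attempt("rome", 0.3)],
]
single = pass_at_1([r[0] for r in runs], references)
combined = parallel_score(runs, references)
```

In this toy data the first attempt alone answers one of two questions correctly, while selecting across both attempts recovers both. The selection step only helps when confidence correlates with correctness.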
Usage and Integration
Users with Pro, Plus, and Team subscriptions can activate agent mode from the tools dropdown in any conversation. The agent can access user-connected apps (e.g., Gmail, GitHub) via ChatGPT connectors, enabling it to integrate with workflows and act on relevant data, while requiring explicit login for sensitive sites. Tasks can be scheduled to recur automatically.
Safety and Risk Mitigation
Given the agent’s ability to act on the web and access user data, OpenAI has implemented robust safeguards including:
- Explicit user confirmation before consequential actions.
- Active supervision (“Watch Mode”) for critical tasks.
- Proactive refusal of high-risk actions like bank transfers.
- Privacy controls allowing users to delete browsing data and log out of sessions.
- Secure browser takeover mode that keeps user inputs private.
- Strong defenses against prompt injection attacks, which could manipulate the agent via malicious web content.
- Continuous monitoring and rapid response to security threats.
OpenAI treats the agent as High capability in the biological and chemical domains under its Preparedness Framework, with comprehensive biosafety measures and ongoing collaboration with external biosecurity experts.
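Two of the safeguards in the list above, explicit confirmation before consequential actions and proactive refusal of high-risk actions, amount to a gate placed in front of every action. The following is a minimal sketch of that pattern only. It is not OpenAI's implementation, and the `Risk` levels, the `Action` type, and the `gate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    ROUTINE = "routine"              # e.g. reading a public page
    CONSEQUENTIAL = "consequential"  # e.g. submitting a purchase form
    HIGH_RISK = "high_risk"          # e.g. a bank transfer


@dataclass(frozen=True)
class Action:
    description: str
    risk: Risk  # assumed to be assigned by some upstream classifier


def gate(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Decide what happens to an action before it runs.

    High-risk actions are refused outright, consequential actions run only
    if the user confirms, and routine actions proceed without a prompt.
    """
    if action.risk is Risk.HIGH_RISK:
        return "refused"
    if action.risk is Risk.CONSEQUENTIAL and not confirm(action):
        return "cancelled"
    return "executed"


# Usage: `confirm` stands in for a prompt shown to the user.
always_yes = lambda action: True
always_no = lambda action: False

transfer = Action("send a bank transfer", Risk.HIGH_RISK)
purchase = Action("place an order", Risk.CONSEQUENTIAL)
browse = Action("open a search result", Risk.ROUTINE)
```

Checking for refusal before asking for confirmation means a user cannot approve a high-risk action by mistake, which matches the order in which the safeguards are described.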
Limitations and Future Directions
ChatGPT agent is in its early stages and may make mistakes. Slideshow generation is currently in beta, with improvements to formatting and polish planned. Spreadsheet editing is more advanced, but uploading an existing slideshow for editing is not yet supported. OpenAI plans iterative improvements to the agent’s efficiency, depth, and versatility, and to the amount of oversight it requires from users.
Availability
The ChatGPT agent is rolling out to Pro, Plus, and Team users, with Enterprise and Education access coming soon. Usage limits vary by subscription tier, with options for additional credits. The Operator research preview will be sunset soon, with deep research integrated into the new agent.
This new agentic system represents a significant advancement in AI’s ability to autonomously complete complex, real-world tasks while maintaining user control and safety.