GPT‑5.4 is OpenAI's latest frontier model, released across ChatGPT (as GPT‑5.4 Thinking), the API, and Codex and designed for professional work. GPT‑5.4 Pro is also available for users who need maximum performance on complex tasks. The model integrates recent advances in reasoning, coding, and agentic workflows: it combines the coding strengths of GPT‑5.3‑Codex with improved handling of tools, software environments, and professional tasks such as spreadsheets, presentations, and documents. It delivers accurate, efficient results with less back-and-forth.
Key Features and Improvements
ChatGPT Enhancements: GPT‑5.4 Thinking presents an upfront plan of its reasoning, so users can adjust course mid-response and reach a better-aligned final output without extra turns. It improves deep web research for highly specific queries and maintains context better on longer, complex questions, resulting in faster, higher-quality, and more relevant answers.
Codex and API Capabilities: GPT‑5.4 is the first general-purpose model with native, state-of-the-art computer-use abilities, enabling agents to operate computers and manage complex workflows across applications. It supports up to 1 million tokens of context, allowing long-horizon task planning, execution, and verification. Tool search helps the model work across large tool ecosystems more efficiently without losing intelligence. It is also the most token-efficient reasoning model to date: it uses fewer tokens than GPT‑5.2, which speeds up processing.
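The source does not describe how tool search works internally. As a rough illustration of the general idea (look up only the relevant tool definitions rather than loading every one into context), here is a minimal sketch. The `Tool` class, the `search_tools` function, the registry contents, and the keyword-overlap ranking are all hypothetical and are not part of any OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name plus a one-line description."""
    name: str
    description: str


def search_tools(query: str, registry: list[Tool], top_k: int = 2) -> list[Tool]:
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split())
    scored = []
    for tool in registry:
        # Treat underscores in tool names as word separators.
        text = set(f"{tool.name} {tool.description}".lower().replace("_", " ").split())
        overlap = len(words & text)
        if overlap:
            scored.append((overlap, tool))
    # Highest overlap first; tools with no overlap were never added.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]


# Illustrative registry; these tools are made up for the example.
REGISTRY = [
    Tool("read_spreadsheet", "load cells from a spreadsheet file"),
    Tool("send_email", "send an email message to a recipient"),
    Tool("create_slide", "add a slide to a presentation deck"),
]

matches = search_tools("update the spreadsheet cells", REGISTRY)
print([tool.name for tool in matches])  # prints ['read_spreadsheet']
```

Only the matching definition would then need to be placed in the model's context, which is one plausible way a large tool set can stay cheap in tokens.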
Performance Benchmarks
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval (wins/ties) | 83.0% | 70.9% | 70.9% |
| SWE-Bench Pro | 57.7% | 56.8% | 55.6% |
| OSWorld-Verified | 75.0% | 74.0%* | 47.3% |
| Toolathlon | 54.6% | 51.9% | 46.3% |
| BrowseComp | 82.7% | 77.3% | 65.8% |
*GPT‑5.3‑Codex’s OSWorld-Verified score improved due to a new API parameter preserving image resolution.
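To make the table easier to compare at a glance, the following sketch computes GPT-5.4's percentage-point gains over each earlier model. The scores are copied from the table; the `SCORES` dictionary and the `gains_over` helper are names introduced here for illustration.

```python
# Scores (percent) copied from the benchmark table above.
SCORES = {
    "GDPval (wins/ties)": {"GPT-5.4": 83.0, "GPT-5.3-Codex": 70.9, "GPT-5.2": 70.9},
    "SWE-Bench Pro": {"GPT-5.4": 57.7, "GPT-5.3-Codex": 56.8, "GPT-5.2": 55.6},
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.3-Codex": 74.0, "GPT-5.2": 47.3},
    "Toolathlon": {"GPT-5.4": 54.6, "GPT-5.3-Codex": 51.9, "GPT-5.2": 46.3},
    "BrowseComp": {"GPT-5.4": 82.7, "GPT-5.3-Codex": 77.3, "GPT-5.2": 65.8},
}


def gains_over(baseline: str, model: str = "GPT-5.4") -> dict[str, float]:
    """Percentage-point difference between `model` and `baseline` per benchmark."""
    return {
        benchmark: round(row[model] - row[baseline], 1)
        for benchmark, row in SCORES.items()
    }


for baseline in ("GPT-5.3-Codex", "GPT-5.2"):
    print(baseline, gains_over(baseline))
```

On these numbers the largest gain over GPT-5.2 is on OSWorld-Verified (27.7 points), while the gain over GPT-5.3-Codex on the same benchmark is 1.0 point.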
Knowledge Work
GPT‑5.4 builds on GPT‑5.2’s reasoning with more consistent and polished results across real-world professional tasks. On GDPval, which tests knowledge work across 44 occupations from top U.S. industries, GPT‑5.4 matches or exceeds industry professionals in 83.0% of comparisons versus 70.9% for GPT‑5.2. It excels in creating and editing spreadsheets, presentations, and documents:
- Spreadsheet modeling tasks scored 87.3% vs. 68.4% for GPT‑5.2.
- Human raters preferred GPT‑5.4-generated presentations 68% of the time, citing aesthetics, visual variety, and use of image generation.
GPT‑5.4 also produces fewer hallucinations and factual errors than GPT‑5.2: individual claims are 33% less likely to be false, and full responses are 18% less likely to contain errors.
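The 33% and 18% figures are relative reductions, not percentage-point drops. The sketch below shows the arithmetic; the baseline rates are hypothetical placeholders chosen for illustration, since the source does not state GPT‑5.2's absolute error rates.

```python
def reduced_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.33 means '33% less likely') to a baseline rate."""
    if not 0.0 <= baseline_rate <= 1.0 or not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("rates must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baselines, NOT figures from the source.
BASELINE_FALSE_CLAIM_RATE = 0.10
BASELINE_RESPONSE_ERROR_RATE = 0.30

claim_rate = reduced_rate(BASELINE_FALSE_CLAIM_RATE, 0.33)
response_rate = reduced_rate(BASELINE_RESPONSE_ERROR_RATE, 0.18)
print(f"{claim_rate:.3f} {response_rate:.3f}")  # prints 0.067 0.246
```

Under these assumed baselines, a 10% false-claim rate would fall to about 6.7%, and a 30% response error rate to about 24.6%.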
Computer Use and Vision
GPT‑5.4 is the first general-purpose model with native computer-use capabilities, excelling at operating computers through code (e.g., Playwright) and UI interactions via mouse and keyboard commands from screenshots. It is steerable via developer messages and configurable for safety levels.
- Achieves 75.0% success on OSWorld-Verified (desktop navigation), surpassing GPT‑5.2’s 47.3% and human performance at 72.4%.
- Leads browser use benchmarks with 67.3% success on WebArena-Verified and 92.8% on Online-Mind2Web.
- Improved visual perception: 81.2% on MMMU-Pro (visual understanding) and better document parsing, with a lower error rate on OmniDocBench.
- Supports high-fidelity image inputs up to 10.24 million pixels, enhancing localization, understanding, and click accuracy.
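For a sense of what the 10.24-million-pixel ceiling means in practice, the sketch below checks whether a screenshot fits and, if not, computes scaled-down dimensions that preserve the aspect ratio. It assumes the limit applies to total pixel count (width times height); how the API actually handles oversized images is not described in the source, and `fit_to_pixel_budget` is a helper name introduced here.

```python
import math

MAX_PIXELS = 10_240_000  # "10.24 million pixels" from the text above


def fit_to_pixel_budget(
    width: int, height: int, max_pixels: int = MAX_PIXELS
) -> tuple[int, int]:
    """Return dimensions that keep the aspect ratio and stay within max_pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height  # already within budget, no resize needed
    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


print(fit_to_pixel_budget(3840, 2160))  # a 4K screenshot (about 8.3M pixels) fits as-is
print(fit_to_pixel_budget(8000, 6000))  # a 48M-pixel image must be scaled down
```

Rounding down on both sides keeps the result under the budget at the cost of a pixel or two of aspect-ratio drift.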
Coding
GPT‑5.4 combines GPT‑5.3‑Codex’s coding strengths with enhanced knowledge work and computer-use capabilities, excelling in longer, tool-assisted tasks with less manual intervention. It matches or outperforms GPT‑5.3‑Codex on SWE-Bench Pro with lower latency.
- /fast mode delivers up to 1.5x faster token velocity without sacrificing intelligence.
- Excels at complex frontend tasks with more aesthetic and functional results.
- Introduces “Playwright (Interactive),” an experimental Codex skill for visually debugging and testing web and Electron apps during development.
Demonstration Example
A theme park simulation game was created using GPT‑5.4 with Playwright (Interactive) and image generation. The game features tile-based path placement, ride and scenery construction, guest pathfinding, queueing, and ride cycles. Metrics such as money, guest count, happiness, cleanliness, and rating respond dynamically to park layout and guest behavior. Automated Playwright browser playtests verified smooth navigation, guest reactions, and UI stability over multiple rounds of play.
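The source does not include the game's code. As an illustration of what guest pathfinding over tile-based paths can look like, here is a minimal breadth-first search sketch. The grid encoding (`#` for path tiles, `.` for empty ground), the `find_path` function, and the `PARK` layout are assumptions made for this example, not details of the demo.

```python
from collections import deque


def find_path(grid, start, goal):
    """Shortest 4-connected route over path tiles ('#'), or None if unreachable.

    `start` and `goal` are (row, column) tuples; `start` is assumed to be a path tile.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical 3x3 park: guests enter at the top-left, the ride is bottom-right.
PARK = [
    "#..",
    "###",
    "..#",
]

route = find_path(PARK, (0, 0), (2, 2))
print(route)  # prints [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]
```

Breadth-first search suits a uniform tile grid because every step costs the same; a game with weighted terrain or crowd avoidance would more likely use Dijkstra's algorithm or A*.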
GPT‑5.4 represents a significant advance in professional AI capabilities, combining improved reasoning, coding, computer use, and visual perception to deliver faster, more accurate, and contextually aware outputs across a wide range of real-world tasks.


