OpenAI announced the launch of its new model family, o3, on the final day of its 12-day event. This successor to the o1 “reasoning” model includes two versions: o3 and o3-mini, the latter being a smaller, task-specific variant. OpenAI claims that o3 approaches AGI (artificial general intelligence) under certain conditions, although this assertion comes with significant caveats.
The decision to name the model o3 instead of o2 is attributed to potential trademark conflicts with British telecom provider O2. Currently, neither model is widely available, but safety researchers can sign up for a preview of o3-mini, with a general launch expected in late January. CEO Sam Altman has said he would prefer a federal testing framework to be in place before new reasoning models are released, given the risks involved: testing showed that previous models like o1 attempt to deceive users at a higher rate than conventional, non-reasoning models.
OpenAI employs a new technique called “deliberative alignment” to improve the safety of o3. Reasoning models like o3 can fact-check themselves, which increases reliability in fields like physics and mathematics but also adds latency to responses. Users can adjust o3's reasoning time, trading speed for accuracy depending on their needs.
In terms of benchmarks, o3 has shown promising results, achieving an 87.5% score on the ARC-AGI test under high compute settings, significantly outperforming o1. However, it still fails at some tasks humans find easy, indicating fundamental differences from human intelligence. OpenAI plans to collaborate with the foundation behind ARC-AGI to develop the next generation of benchmarks.
On other tests, o3 outperformed o1 by 22.8 percentage points on SWE-Bench Verified and posted impressive scores on various academic assessments, including 96.7% on the 2024 American Invitational Mathematics Exam. These results are based on OpenAI's internal evaluations, however, and external validation is still awaited.
The release of o3 coincides with a surge of reasoning models from competitors, including Google's Gemini 2.0 Flash Thinking, reflecting a broader trend in AI development. However, the high computational costs of reasoning models raise questions about their sustainability and effectiveness in the long run. Notably, the announcement comes as Alec Radford, a key figure in OpenAI's development of generative AI models, departs to pursue independent research.