xAI Launches Grok-2 and Grok-2 Mini with Image Generation on X

August 14, 2024 at 7:22:08 AM

TL;DR: xAI has launched Grok-2 and Grok-2 Mini, language models with improved reasoning, chat, and coding capabilities. Grok-2 outperforms Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Both models are available in beta to Premium and Premium+ users on the 𝕏 platform, with real-time information integration, and will be accessible via an enterprise API. Grok-2 also posts strong results on academic benchmarks and vision-based tasks.

xAI has launched Grok-2 and Grok-2 Mini, two language models available in beta on the 𝕏 platform. Grok-2, a significant upgrade from Grok-1.5, targets chat, coding, and reasoning tasks and outperforms Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Grok-2 Mini is a smaller yet capable sibling.

Grok-2 Language Model and Chat Capabilities

Grok-2, tested under the name "sus-column-r" in the LMSYS chatbot arena, outperforms both Claude 3.5 Sonnet and GPT-4-Turbo in overall Elo score. xAI's internal evaluations focus on instruction-following and factual accuracy, and show significant improvements in reasoning with retrieved content and in tool use.
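
For readers unfamiliar with Elo scores, the standard Elo formula turns a rating gap between two models into an expected head-to-head win rate. The sketch below is illustrative only: the function name and the ratings are made up for the example and are not Grok-2's actual arena scores, and the arena's own rating computation may differ in detail.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Uses the conventional 400-point logistic scale: a 400-point lead
    corresponds to 10:1 odds in favor of the higher-rated side.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, chosen only to illustrate the formula.
model_a, model_b = 1290.0, 1270.0
p = elo_expected_score(model_a, model_b)
print(f"Expected win rate of A over B: {p:.3f}")
```

Under this model a small rating lead means only a slightly better than even win rate, so leaderboard positions that are close in Elo should be read as near ties.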

Benchmarks

Grok-2 and Grok-2 Mini have been evaluated across various academic benchmarks, demonstrating significant improvements over Grok-1.5. Key areas include:

Graduate-level science knowledge (GPQA)
General knowledge (MMLU, MMLU-Pro)
Math competition problems (MATH)
Visual math reasoning (MathVista)
Document-based question answering (DocVQA)

Benchmark	Grok-1.5	Grok-2 Mini	Grok-2	GPT-4 Turbo	Claude 3 Opus	Gemini Pro 1.5	Llama 3 405B	GPT-4o	Claude 3.5 Sonnet
GPQA	35.9%	51.0%	56.0%	48.0%	50.4%	46.2%	51.1%	53.6%	59.6%
MMLU	81.3%	86.2%	87.5%	86.5%	85.7%	85.9%	88.6%	88.7%	88.3%
MMLU-Pro	51.0%	72.0%	75.5%	63.7%	68.5%	69.0%	73.3%	72.6%	76.1%
MATH	50.6%	73.0%	76.1%	72.6%	60.1%	67.7%	73.8%	76.6%	71.1%
HumanEval	74.1%	85.7%	88.4%	87.1%	84.9%	71.9%	89.0%	90.2%	92.0%
MMMU	53.6%	63.2%	66.1%	63.1%	59.4%	62.2%	64.5%	69.1%	68.3%
MathVista	52.8%	68.1%	69.0%	58.1%	50.5%	63.9%	—	63.8%	67.7%
DocVQA	85.6%	93.2%	93.6%	87.2%	89.3%	93.1%	92.2%	92.8%	95.2%
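
To make the improvement over Grok-1.5 concrete, the gains can be computed directly from the table above. The following is a small illustrative sketch; the variable names are arbitrary, and the scores are copied from the Grok-1.5 and Grok-2 columns.

```python
# Scores (percent) copied from the Grok-1.5 and Grok-2 columns of the table.
grok_1_5 = {
    "GPQA": 35.9, "MMLU": 81.3, "MMLU-Pro": 51.0, "MATH": 50.6,
    "HumanEval": 74.1, "MMMU": 53.6, "MathVista": 52.8, "DocVQA": 85.6,
}
grok_2 = {
    "GPQA": 56.0, "MMLU": 87.5, "MMLU-Pro": 75.5, "MATH": 76.1,
    "HumanEval": 88.4, "MMMU": 66.1, "MathVista": 69.0, "DocVQA": 93.6,
}

# Gain in percentage points per benchmark, rounded to one decimal place
# to avoid floating-point noise in the subtraction.
gains = {name: round(grok_2[name] - grok_1_5[name], 1) for name in grok_1_5}

# Benchmark with the largest absolute gain.
largest = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} +{gain:.1f} pts")
print(f"Largest gain: {largest}")
```

These are differences in percentage points, not relative improvements, so they are easier to move on benchmarks where Grok-1.5 started from a lower score.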

Experience Grok with Real-Time Information on 𝕏

Grok-2 and Grok-2 Mini are available to 𝕏 Premium and Premium+ users through a redesigned interface, with text and vision understanding and real-time information from the 𝕏 platform. Grok-2 is the more capable model, while Grok-2 Mini balances speed and answer quality. xAI is also working with Black Forest Labs, experimenting with its FLUX.1 model for image generation to expand Grok’s capabilities on 𝕏.

Build with Grok Using the Enterprise API

Later this month, Grok-2 and Grok-2 Mini will be available through a new enterprise API platform, offering multi-region inference deployments, enhanced security features, rich traffic statistics, and advanced billing analytics. The management API will facilitate integration with existing in-house tools and services.

What is Next?

Grok-2 and Grok-2 Mini are being rolled out on 𝕏, with planned applications including enhanced search, deeper insights on 𝕏 posts, and improved reply functions. A preview of multimodal understanding will also be released soon. xAI says it will continue to focus on core reasoning capabilities, with development driven by a small team.