xAI Launches Grok-2 and Grok-2 Mini with Image Generation on X

August 14, 2024 at 7:22:08 AM

TL;DR xAI has launched Grok-2 and Grok-2 Mini, language models with improved reasoning, chat, and coding capabilities. Grok-2 outperforms Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Both models are available in beta on the 𝕏 platform and will be accessible via an enterprise API. Grok-2 posts strong results on academic benchmarks and vision-based tasks. Premium users on 𝕏 can access these models, which integrate real-time information.

xAI has launched Grok-2 and Grok-2 Mini, two advanced language models with state-of-the-art reasoning capabilities, available in beta on the 𝕏 platform. Grok-2, a significant upgrade from Grok-1.5, excels in chat, coding, and reasoning tasks, outperforming models like Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Grok-2 Mini is a smaller but still capable variant of the same model.

Grok-2 Language Model and Chat Capabilities

Grok-2, tested under the name "sus-column-r" in the LMSYS chatbot arena, outperforms both Claude 3.5 Sonnet and GPT-4-Turbo in overall Elo score. xAI's internal evaluations focus on instruction-following and factual accuracy, and they show significant improvements for Grok-2 in reasoning, content retrieval, and tool use.
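
For readers unfamiliar with Elo scores, the sketch below shows how a rating gap translates into an expected head-to-head win rate under the standard Elo formula. The ratings used are hypothetical round numbers, not actual leaderboard values, and the leaderboard's own rating computation may differ in detail.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model.

    A 400-point rating lead corresponds to 10:1 odds in favor of A.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings chosen only for illustration.
    for gap in (0, 25, 50, 100):
        p = expected_score(1200 + gap, 1200)
        print(f"rating lead of {gap:>3} points -> expected win rate {p:.3f}")
```

Under this model, even a modest lead in rating implies only a slightly better than even chance of winning any single comparison, so small Elo gaps should be read as small preference differences.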

Benchmarks

Grok-2 and Grok-2 Mini have been evaluated across various academic benchmarks, demonstrating significant improvements over Grok-1.5. Key areas include:

  • Graduate-level science knowledge (GPQA)
  • General knowledge (MMLU, MMLU-Pro)
  • Math competition problems (MATH)
  • Visual math reasoning (MathVista)
  • Document-based question answering (DocVQA)

| Benchmark | Grok-1.5 | Grok-2 Mini | Grok-2 | GPT-4 Turbo | Claude 3 Opus | Gemini Pro 1.5 | Llama 3 405B | GPT-4o | Claude 3.5 Sonnet |
|---|---|---|---|---|---|---|---|---|---|
| GPQA | 35.9% | 51.0% | 56.0% | 48.0% | 50.4% | 46.2% | 51.1% | 53.6% | 59.6% |
| MMLU | 81.3% | 86.2% | 87.5% | 86.5% | 85.7% | 85.9% | 88.6% | 88.7% | 88.3% |
| MMLU-Pro | 51.0% | 72.0% | 75.5% | 63.7% | 68.5% | 69.0% | 73.3% | 72.6% | 76.1% |
| MATH | 50.6% | 73.0% | 76.1% | 72.6% | 60.1% | 67.7% | 73.8% | 76.6% | 71.1% |
| HumanEval | 74.1% | 85.7% | 88.4% | 87.1% | 84.9% | 71.9% | 89.0% | 90.2% | 92.0% |
| MMMU | 53.6% | 63.2% | 66.1% | 63.1% | 59.4% | 62.2% | 64.5% | 69.1% | 68.3% |
| MathVista | 52.8% | 68.1% | 69.0% | 58.1% | 50.5% | 63.9% | - | 63.8% | 67.7% |
| DocVQA | 85.6% | 93.2% | 93.6% | 87.2% | 89.3% | 93.1% | 92.2% | 92.8% | 95.2% |
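
To make the "significant improvements over Grok-1.5" concrete, the following minimal sketch computes the percentage-point gain from Grok-1.5 to Grok-2 on each benchmark. The scores are copied from the table above; the function and variable names are illustrative only.

```python
# (Grok-1.5, Grok-2) scores in percent, copied from the benchmark table.
SCORES = {
    "GPQA": (35.9, 56.0),
    "MMLU": (81.3, 87.5),
    "MMLU-Pro": (51.0, 75.5),
    "MATH": (50.6, 76.1),
    "HumanEval": (74.1, 88.4),
    "MMMU": (53.6, 66.1),
    "MathVista": (52.8, 69.0),
    "DocVQA": (85.6, 93.6),
}


def gains(scores):
    """Return {benchmark: gain in percentage points}, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


def largest_gain(scores):
    """Return the (benchmark, gain) pair with the biggest improvement."""
    g = gains(scores)
    name = max(g, key=g.get)
    return name, g[name]


if __name__ == "__main__":
    # Print benchmarks from largest to smallest improvement.
    for name, delta in sorted(gains(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:<10} +{delta:.1f} points")
```

By this calculation the largest jumps are on MATH and MMLU-Pro (roughly 25 points each), while benchmarks where Grok-1.5 already scored above 80%, such as MMLU and DocVQA, show single-digit gains.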

Experience Grok with Real-Time Information on 𝕏

Grok-2 and Grok-2 Mini are available to 𝕏 Premium and Premium+ users, featuring advanced text and vision understanding, real-time information integration, and a redesigned interface. Grok-2 offers enhanced capabilities for various tasks, while Grok-2 Mini balances speed and answer quality. A collaboration with Black Forest Labs aims to expand Grok's capabilities further.

Build with Grok Using the Enterprise API

Later this month, Grok-2 and Grok-2 Mini will be available through a new enterprise API platform, offering multi-region inference deployments, enhanced security features, rich traffic statistics, and advanced billing analytics. The management API will facilitate integration with existing in-house tools and services.

What is Next?

Grok-2 and Grok-2 Mini are being rolled out on 𝕏, with future applications including enhanced search capabilities, deeper insights on 𝕏 posts, and improved reply functions. A preview of multimodal understanding will also be released soon. xAI continues to advance AI development with a focus on core reasoning capabilities, driven by a small, highly talented team.
