Meta has introduced its largest open-source AI model to date, Llama 3.1 405B, which contains 405 billion parameters. It is not the largest open model ever released, but it is the biggest in recent years, and it is competitive with leading proprietary models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet. The model was trained on 16,000 Nvidia H100 GPUs with the help of advanced training techniques, and it can be downloaded or used on cloud platforms such as AWS, Azure, and Google Cloud. It also powers chatbots in WhatsApp and on Meta.ai for U.S.-based users.
Key Features and Capabilities
Llama 3.1 405B can perform tasks such as coding, answering math questions, and summarizing documents in eight languages. However, it is text-only and cannot handle image-based queries. Meta is also working on multimodal Llama models that can recognize images and videos and generate speech, but these are not yet publicly available.
The model was trained on a dataset of 15 trillion tokens. At a common rule of thumb of about 0.75 English words per token, that corresponds to roughly 11 trillion words (the 750 billion words sometimes quoted for this dataset does not match the token count). Meta refined its data curation and quality assurance processes for this model. Synthetic data, generated by other AI models, was also used to fine-tune Llama 3.1 405B. However, Meta has not disclosed the exact sources of its training data, citing competitive and legal reasons.
Context Window and Tools
Llama 3.1 405B has a context window of 128,000 tokens, larger than that of previous Llama models, which lets it summarize longer texts and keep track of context over longer conversations. Meta also released two smaller models, Llama 3.1 8B and Llama 3.1 70B, which share the same context window. All three models can use third-party tools and APIs for tasks such as answering questions about recent events, solving math problems, and validating code.
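To make the 128,000-token limit concrete, the following is a minimal sketch of a pre-flight check that estimates whether a prompt fits in the context window. The four-characters-per-token ratio and the helper names (`estimate_tokens`, `fits_in_context`) are illustrative assumptions, not part of any Meta API. An exact count would require the model's own tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # context window size stated in the article

# Assumption: about 4 characters per token, a rough rule of thumb for
# English text. The real count depends on the Llama tokenizer.
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count, rounded up."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def fits_in_context(prompt: str, reserved_for_output: int = 1_000) -> bool:
    """Return True if the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


short_doc = "word " * 1_000    # 5,000 characters, about 1,250 estimated tokens
long_doc = "word " * 200_000   # 1,000,000 characters, about 250,000 estimated tokens

print(fits_in_context(short_doc))  # True
print(fits_in_context(long_doc))   # False
```

A document that fails the check would have to be split or summarized in chunks before being sent to the model.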
Performance and Licensing
Llama 3.1 405B performs comparably to OpenAI’s GPT-4 and shows mixed results against GPT-4o and Claude 3.5 Sonnet. It excels in executing code and generating plots but is weaker in multilingual capabilities and general reasoning. Due to its size, it requires substantial hardware to run. Meta is promoting its smaller models for general-purpose applications and sees Llama 3.1 405B as suitable for model distillation and generating synthetic data.
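A rough calculation shows why the hardware requirement is substantial. The sketch below estimates the memory needed just to hold 405 billion weights at a few common precisions, and the minimum number of GPUs that implies. The 80 GB per-GPU figure and the bytes-per-parameter values are assumptions for illustration. Real deployments need additional memory for activations and the key-value cache, so these are lower bounds.

```python
import math

PARAMS = 405e9        # parameter count stated in the article
GPU_MEMORY_GB = 80    # assumption: an 80 GB accelerator such as an H100

# Assumed storage cost per parameter for common numeric precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, nbytes)
    print(f"{name}: {gb:.1f} GB of weights, at least {min_gpus(gb)} GPUs")
```

Under these assumptions, 16-bit weights alone come to 810 GB, more than ten 80 GB GPUs can hold, which is consistent with Meta steering general-purpose users toward the 8B and 70B models.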
Meta has updated Llama’s license to allow developers to use outputs from the Llama 3.1 model family to develop third-party AI models. However, developers with apps exceeding 700 million monthly users must request a special license from Meta.
Getting Started
The models are available to the community for download on llama.meta.com and Hugging Face, and for immediate development on Meta’s broad ecosystem of partner platforms.