Starting today, open source is leading the way. Meta is introducing Llama 3.1 405B, its most capable model yet.
Today Meta is releasing a collection of new Llama 3.1 models, including the long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages, among other improvements. Llama 3.1 405B rivals leading closed source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.
The models are available to download now directly from Meta or Hugging Face.
Model Training
Training a model as large and capable as Llama 3.1 405B was no simple task. The model was trained on over 15 trillion tokens over the course of several months, requiring over 16K NVIDIA H100 GPUs and making it the first Llama model ever trained at this scale. Meta also used the 405B parameter model to improve the post-training quality of its smaller models.
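To put those figures in perspective, here is a rough back-of-envelope sketch of the training compute they imply. It is not Meta's own accounting: it relies on the widely used rule of thumb that training a dense model costs about 6 floating-point operations per parameter per token, and the sustained per-GPU throughput (`ASSUMED_FLOPS_PER_GPU`) is a purely illustrative assumption, as are all the names in the snippet.

```python
# Back-of-envelope estimate only; none of this comes from Meta's own reporting.
PARAMS = 405e9    # 405B parameters (from the announcement)
TOKENS = 15e12    # "over 15 trillion" tokens, so this is a lower bound
NUM_GPUS = 16_000 # "over 16K" H100 GPUs, also a lower bound

# ASSUMPTION: sustained throughput per GPU, chosen for illustration.
# Real utilization depends on parallelism strategy, precision and failures.
ASSUMED_FLOPS_PER_GPU = 4e14  # 400 TFLOP/s


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs with the ~6 * N * D heuristic."""
    return 6.0 * params * tokens


def training_days(total_flops: float, num_gpus: int, flops_per_gpu: float) -> float:
    """Ideal wall-clock days if every GPU ran at the assumed rate nonstop."""
    seconds = total_flops / (num_gpus * flops_per_gpu)
    return seconds / 86_400


total = training_flops(PARAMS, TOKENS)
days = training_days(total, NUM_GPUS, ASSUMED_FLOPS_PER_GPU)
print(f"~{total:.2e} FLOPs, ~{days:.0f} days of ideal compute")
```

Under these assumptions the estimate comes out to a few times 10^25 FLOPs and roughly two months of uninterrupted compute. That is in the same range as the "several months" quoted above once restarts, evaluation and post-training are allowed for.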
With Llama 3.1, the development team evaluated performance on more than 150 benchmark datasets spanning a wide range of languages, in addition to extensive human evaluations in real-world scenarios. These results show that the 405B competes with leading closed source models like GPT-4, Claude 2 and Gemini Ultra across a range of tasks.
The upgraded Llama 3.1 8B & 70B models are also best-in-class, outperforming other models at their size while also delivering a better balance of helpfulness and safety than their predecessors. These smaller models support the same improved 128K token context window, multilinguality, improved reasoning and state-of-the-art tool use to enable more advanced use cases.
As Mark Zuckerberg shared in an open letter this morning:
We believe that open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small few, and that the technology can be deployed more evenly and safely across society.
For the first time, Meta also updated its license to allow developers to use the outputs from Llama models, including 405B, to improve other models.