What a massive week for Open Source AI. We finally managed to beat closed source fair and square!
- Meta Llama 3.1 405B, 70B & 8B – the latest in the Llama series, this release (base + instruct) comes with multilingual support (8 languages), a 128K context, and an even more commercially permissive license. The best part: 405B beats GPT-4o / GPT-4o mini fair and square! (A quick loading sketch follows this list.)
- Bonus: Meta posted a banger of a tech report with plenty of detail, including on upcoming (?) multimodal capabilities (image/audio/video).
- Mistral dropped Large 2 (123B) – dense, multilingual (12 languages), and 128K context. Comes as an instruct-only checkpoint, with performance below Llama 3.1 405B but above Llama 3.1 70B. Released under a non-commercial research license.
- Nvidia released Minitron distilled 4B & 8B – Apache 2.0 license, 256K vocabulary, with the distilled student gaining up to 16% on MMLU over training from scratch. Uses iterative pruning and distillation to achieve SoTA! (A generic distillation-loss sketch follows this list.) The real question: who is distilling 405B right now? 😉
- InternLM shared Step Prover 7B – SoTA on Lean theorem proving, trained on large-scale formal data mined from Lean repos on GitHub. Achieves 48.8 pass@1 and 54.5 pass@64. They release the dataset, a tech report, and the fine-tuned InternLM2 Math Plus checkpoint.
- CofeAI dropped the chonky Tele-FLM 1T – a one-trillion-parameter dense model trained on 2T tokens, bilingual (Chinese and English), Apache 2.0 licensed, with a tech report. They use a novel progressive upsampling approach.
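For anyone who wants to kick the tires on Llama 3.1 right away, here's a minimal loading sketch using Hugging Face transformers. The repo id, dtype, and generation settings are my own assumptions, not an official recipe; check the model card for the gated access flow and recommended settings.

```python
# Minimal sketch (not official): load and chat with Llama 3.1 8B Instruct via transformers.
# The repo id below is an assumption; the weights are gated, so accept the license on the Hub first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps memory manageable on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me one fun fact about llamas."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```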
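And since the Minitron bullet mentions pruning + distillation, here's a generic sketch of a logit-distillation loss, just to illustrate the idea of a student learning from a frozen teacher. This is not Nvidia's actual Minitron recipe; the function name and temperature are illustrative choices on my part.

```python
# Generic knowledge-distillation loss sketch (illustrative, NOT the Minitron recipe):
# the pruned student is trained to match the frozen teacher's softened token distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Standard KD scaling by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Usage sketch: teacher stays frozen (no grads), student is the pruned model being trained.
# loss = distillation_loss(student(input_ids).logits, teacher(input_ids).logits)
```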
Stability dropped SV4D, Nvidia released MambaVision, Sakana AI followed up with Evo (evolutionary model merging + Stable Diffusion), and more.
This was a landmark week, and I’m personally quite happy with the direction of open source AI/ML!
Author: Vaibhav (VB) Srivastav