Fast to launch & very fast output speed! Groq has launched their Gemma 2 9B offering and is serving it at ~600 output tokens/s.
Google's Gemma 2 9B is a worthy alternative to Llama 3 8B and other smaller models. It is particularly attractive for generalist and communication-focused use-cases, as shown by its Chatbot Arena (1185) & MMLU (71%) scores, which exceed those of Llama 3 8B (1153, 68%).
For more specific use-cases it is worth conducting narrower tests, e.g. for coding, Gemma 2 9B substantially underperforms Llama 3 8B (40% vs. 62% on HumanEval).
Groq is offering the model at $0.20 per 1M tokens for both input & output, in line with Fireworks.
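To put the quoted figures in context, here is a rough back-of-the-envelope sketch in Python. It assumes the flat $0.20 per 1M token price applies equally to input and output, and that output streams at a steady ~600 tokens/s; it ignores time to first token, queueing, and rate limits. The function names and the example token counts are hypothetical, chosen only for illustration.

```python
# Rough estimate only: the price and speed defaults are the figures quoted
# in this post; everything else (names, workload sizes) is an assumption.

def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      price_per_million_usd: float = 0.20) -> float:
    """Cost when input and output tokens are billed at the same rate."""
    return (input_tokens + output_tokens) / 1_000_000 * price_per_million_usd


def estimate_generation_seconds(output_tokens: int,
                                tokens_per_second: float = 600.0) -> float:
    """Time to stream the output, excluding time to first token."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each 1,500 input + 500 output tokens.
    requests = 10_000
    cost = estimate_cost_usd(1_500 * requests, 500 * requests)
    seconds_per_response = estimate_generation_seconds(500)
    print(f"Estimated cost: ${cost:.2f}")
    print(f"Estimated streaming time per response: {seconds_per_response:.2f}s")
```

Actual end-to-end latency will be higher than this estimate, since it only covers output streaming time.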
Congratulations GroqInc on the fast launch and impressive performance. We look forward to benchmarking other providers as they begin to host the Gemma 2 models, potentially including Google itself.