Google Gemma 2

Google Gemma 2 is now available to researchers and developers. This model offers top-tier performance, operates at remarkable speeds across various hardware, and integrates seamlessly with other AI tools.

AI has the potential to address some of humanity’s most pressing problems, but only if everyone has the tools to build with it. Earlier this year, we introduced Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology behind the Gemini models. Since then, we’ve expanded the Gemma family with CodeGemma, RecurrentGemma, and PaliGemma, each providing unique capabilities for different AI tasks and all easily accessible through integrations with partners like Hugging Face, NVIDIA, and Ollama.

Today, we are excited to officially release Gemma 2 to researchers and developers worldwide. Available in 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 surpasses its predecessor in performance and efficiency, with significant safety advancements. The 27B model competes with models more than twice its size, offering performance previously achievable only with proprietary models. This is now possible on a single NVIDIA H100 Tensor Core GPU or TPU host, drastically reducing deployment costs.

A New Standard for Efficiency and Performance

Gemma 2 is built on a redesigned architecture, optimized for exceptional performance and inference efficiency. Here’s what sets it apart:

  • Outsized Performance: The 27B model offers the best performance in its size class and competes with models more than twice its size. The 9B model also leads its category, outperforming Llama 3 8B and other models. Detailed performance breakdowns are available in the technical report.
  • Unmatched Efficiency and Cost Savings: The 27B model runs efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, reducing costs while maintaining high performance.
  • Blazing Fast Inference Across Hardware: Gemma 2 is optimized for incredible speed across a range of hardware, from gaming laptops and high-end desktops to cloud-based setups. Try Gemma 2 at full precision in Google AI Studio, run the quantized version with Gemma.cpp on your CPU, or use it on a home computer with an NVIDIA RTX or GeForce RTX GPU via Hugging Face Transformers.
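Whether you run Gemma 2 through Gemma.cpp or Hugging Face Transformers, the instruction-tuned models consume prompts in the Gemma family's plain-text turn format. As a minimal sketch (runtimes like Transformers normally apply this template for you via the tokenizer's `apply_chat_template`, so this helper is purely illustrative):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in the Gemma chat-turn markers.

    Gemma instruction-tuned models delimit conversation turns with
    <start_of_turn> and <end_of_turn> tokens; the trailing
    "<start_of_turn>model" cues the model to begin its reply.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = format_gemma_prompt("Why is the sky blue?")
```

In practice you would pass `prompt` to your chosen runtime; with Transformers, prefer the tokenizer's built-in chat template over hand-rolled formatting.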

Designed for Developers and Researchers

Gemma 2 is not only more powerful but also designed for easy integration into your workflows:

  • Open and Accessible: Like the original Gemma models, Gemma 2 is available under our commercially-friendly Gemma license, allowing developers and researchers to share and commercialize their innovations.
  • Broad Framework Compatibility: Gemma 2 works with major AI frameworks like Hugging Face Transformers, JAX, PyTorch, and TensorFlow via native Keras 3.0, vLLM, Gemma.cpp, Llama.cpp, and Ollama. It is optimized with NVIDIA TensorRT-LLM for NVIDIA-accelerated infrastructure and will soon support optimization for NVIDIA’s NeMo.
  • Effortless Deployment: Starting next month, Google Cloud customers can easily deploy and manage Gemma 2 on Vertex AI.

Explore the new Gemma Cookbook, a collection of practical examples and recipes to help you build applications and fine-tune Gemma 2 models for specific tasks. Learn how to use Gemma with your preferred tools, including for common tasks like retrieval-augmented generation.
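At its core, the retrieval-augmented generation pattern mentioned above means retrieving relevant passages and prepending them to the prompt as grounding context. A toy sketch of that flow (the keyword-overlap scorer and in-memory passage list are illustrative placeholders, not the Cookbook's actual recipes):

```python
def score(query: str, passage: str) -> int:
    # Naive relevance: count passage words that also appear in the query.
    q_words = set(query.lower().split())
    return sum(1 for w in passage.lower().split() if w in q_words)


def build_rag_prompt(query: str, passages: list[str], top_k: int = 2) -> str:
    # Keep the top_k highest-scoring passages as grounding context.
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Gemma 2 ships in 9B and 27B parameter sizes.",
    "The 27B model runs on a single NVIDIA H100 GPU.",
    "Bananas are rich in potassium.",
]
prompt = build_rag_prompt("What sizes does Gemma 2 come in?", docs)
```

A real pipeline would swap the keyword scorer for embedding-based retrieval over a vector store; the Cookbook walks through complete examples.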

Responsible AI Development

We’re committed to supporting developers and researchers in building and deploying AI responsibly. Our Responsible Generative AI Toolkit includes resources such as the open-sourced LLM Comparator, which helps evaluate language models. You can use the companion Python library for comparative evaluations and visualize the results. We are also working on open sourcing our text watermarking technology, SynthID, for Gemma models.
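The core idea behind a comparative evaluation is judging two models' responses side by side and aggregating the verdicts. As a rough illustration only (a hand-rolled tally, not the LLM Comparator library's actual API):

```python
from collections import Counter


def tally_pairwise(judgments: list[str]) -> dict[str, float]:
    """Summarize per-example judgments ('A', 'B', or 'tie') as rates."""
    counts = Counter(judgments)
    total = len(judgments)
    return {outcome: counts[outcome] / total for outcome in ("A", "B", "tie")}


# Hypothetical judgments over five prompts, e.g. from a human rater
# or an LLM-as-judge comparing model A vs. model B side by side.
results = tally_pairwise(["A", "A", "B", "tie", "A"])
```

The LLM Comparator additionally visualizes per-example rationales and slices, which a raw win-rate table cannot show.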

During the training of Gemma 2, we followed rigorous safety processes, filtering pre-training data and conducting thorough testing to mitigate potential biases and risks. We publish our results on public benchmarks related to safety and representational harms.

Unlock New Possibilities with Gemma 2

Gemma 2 empowers developers to launch more ambitious projects, unlocking new levels of performance and potential in AI creations. We continue to explore new architectures and develop specialized Gemma variants to tackle a broader range of AI tasks. This includes the upcoming 2.6B parameter Gemma 2 model, bridging the gap between lightweight accessibility and powerful performance.

Getting Started

Gemma 2 is now available in Google AI Studio, where you can test its full performance capabilities at 27B without hardware requirements. You can also download Gemma 2’s model weights from Kaggle and Hugging Face Models, with Vertex AI Model Garden coming soon.

To support research and development, Gemma 2 is available free of charge through Kaggle and the free tier of Colab notebooks. First-time Google Cloud customers may be eligible for $300 in credits. Academic researchers can apply to the Gemma 2 Academic Research Program to receive Google Cloud credits to accelerate their research with Gemma 2. Applications are open now through August 9.

By Google Dev Team.
