Download Gemma 4

Download official Gemma 4 weights - Apache 2.0 licensed

All four Gemma 4 models are available for download from Hugging Face, Kaggle, and Ollama. Instruction-tuned and base variants, multiple quantization levels, and full commercial freedom under Apache 2.0.

Available models

All Gemma 4 variants ready for download

Each model comes in instruction-tuned (IT) and base variants. Choose based on your use case: IT for chat and tasks, base for fine-tuning.

Download Options

Multiple formats and quantization levels

Models are available in BF16 (16-bit, unquantized), GGUF (for llama.cpp/Ollama), and ONNX (for edge deployment). Quantized versions at 4-bit and 8-bit reduce download size and memory requirements.

For most users, start with Ollama, which pulls a default quantization for each model tag, or with Hugging Face GGUF files for llama.cpp.

Edge - 2.3B effective

Gemma 4 E2B

Smallest model. About 3.2 GB in memory at 4-bit. Runs on phones, IoT devices, and budget hardware.

Includes audio encoder. Best for ultra-compact deployment where memory is the primary constraint.

~2-10 GB download

Edge - 4.5B effective

Gemma 4 E4B

Recommended edge model. About 5.5 GB in memory at 4-bit. Best quality for laptops and desktops.

Includes audio encoder. Strong reasoning and coding for on-device use.

~4-16 GB download

Server - MoE

Gemma 4 26B A4B

Efficient MoE model. About 16 GB in memory at 4-bit. Near-31B quality at roughly the inference cost of a 4B model.

128 experts, 8 active + 1 shared. Best for high-throughput production serving.

~10-48 GB download

Server - Flagship

Gemma 4 31B

Maximum quality. About 17 GB in memory at 4-bit. #3 on Arena AI leaderboard.

Dense architecture for maximum reliability. Best for quality-critical applications.

~12-58 GB download

Download sources

Official download platforms

Download from trusted, official sources. All models are verified and maintained by Google DeepMind.

Hugging Face

Full model repositories with all variants, quantizations, and documentation. The most comprehensive source for Gemma 4 weights.

Kaggle

Official Google model hosting. Download weights and access notebooks for experimentation and fine-tuning.

Ollama

One-command download and run. Each Ollama tag pulls a default quantization, so you do not have to choose one yourself.

Google AI Studio

No download needed. Use Gemma 4 through a hosted API for prototyping and development.

GGUF format

Optimized for llama.cpp and Ollama. Multiple quantization levels from Q4_K_M to Q8_0 for different memory budgets.

ONNX format

Cross-platform deployment for edge devices, mobile, and browser. Optimized for inference on diverse hardware.

Quick download

Fastest way to get started

Use Ollama for the fastest path from download to running. One command does everything.

Ollama commands

  • ollama pull gemma4:e2b - Edge ultra-compact
  • ollama pull gemma4:e4b - Edge recommended
  • ollama pull gemma4:26b - Server MoE
  • ollama pull gemma4:31b - Server flagship
  • ollama run gemma4:e4b - Download and start chatting
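The pull commands above can also be scripted. The following is a minimal Python sketch, assuming the `ollama` binary is installed and on your PATH and that the tags listed above are available in the Ollama library. The names `OLLAMA_TAGS`, `pull_command`, and `pull` are hypothetical helpers for illustration, not part of Ollama.

```python
import shutil
import subprocess

# Tags and labels copied from the list above.
OLLAMA_TAGS = {
    "gemma4:e2b": "Edge ultra-compact",
    "gemma4:e4b": "Edge recommended",
    "gemma4:26b": "Server MoE",
    "gemma4:31b": "Server flagship",
}


def pull_command(tag: str) -> list[str]:
    """Build the argv for `ollama pull <tag>`, rejecting tags not listed above."""
    if tag not in OLLAMA_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return ["ollama", "pull", tag]


def pull(tag: str) -> None:
    """Run the pull. Raises if ollama is missing or the command fails."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama not found on PATH")
    subprocess.run(pull_command(tag), check=True)
```

Calling `pull("gemma4:e4b")` would then download the recommended edge model. Restricting the wrapper to the four listed tags is a design choice that catches typos before any download starts.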

Hugging Face CLI

  • pip install huggingface_hub
  • huggingface-cli download google/gemma-4-e4b-it
  • huggingface-cli download google/gemma-4-26b-a4b-it
  • huggingface-cli download google/gemma-4-31b-it
  • Add --revision to pin a branch, tag, or commit; use --include to fetch only specific files, such as a single quantization
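The same download can be driven from Python through `huggingface_hub.snapshot_download`, which accepts `repo_id`, `revision`, and `allow_patterns`. The sketch below assumes the repository ids listed above exist and that you have access to them. `download_kwargs` and `download` are hypothetical helper names, and any file pattern you pass (for example `*.safetensors`) is an assumption about how the repository is laid out.

```python
from typing import Optional


def download_kwargs(repo_id: str,
                    revision: Optional[str] = None,
                    patterns: Optional[list[str]] = None) -> dict:
    """Collect keyword arguments for snapshot_download, omitting unset options."""
    kwargs: dict = {"repo_id": repo_id}
    if revision is not None:
        kwargs["revision"] = revision  # branch, tag, or commit hash
    if patterns:
        kwargs["allow_patterns"] = list(patterns)  # only fetch matching files
    return kwargs


def download(repo_id: str,
             revision: Optional[str] = None,
             patterns: Optional[list[str]] = None) -> str:
    """Download a repository snapshot and return its local directory."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(**download_kwargs(repo_id, revision, patterns))
```

For example, `download("google/gemma-4-e4b-it", patterns=["*.safetensors", "*.json"])` would skip any other files in the repository, if files with those extensions are present.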

Download sizes

File sizes by model and quantization

Choose the right quantization for your storage and memory constraints. Smaller quantizations trade some quality for significantly reduced file size.

Download sizes vary by quantization level. 4-bit quantization (Q4_K_M) offers the best balance of quality and size for most users.

Gemma 4 download size comparison across models and quantizations

E2B at 4-bit: ~2GB download, ~3.2GB in memory

E4B at 4-bit: ~4GB download, ~5.5GB in memory

26B at 4-bit: ~10GB download, ~16GB in memory

31B at 4-bit: ~12GB download, ~17GB in memory

Size comparison

Download and memory requirements

File sizes for different quantization levels across all Gemma 4 models.

Format                        E2B       E4B       26B MoE   31B Dense
4-bit GGUF (recommended)      ~2 GB     ~4 GB     ~10 GB    ~12 GB
8-bit GGUF (higher quality)   ~5 GB     ~8 GB     ~24 GB    ~29 GB
BF16 (full precision)         ~10 GB    ~16 GB    ~48 GB    ~58 GB
VRAM needed (at 4-bit)        ~3.2 GB   ~5.5 GB   ~16 GB    ~17 GB

Approximate sizes. Actual download may vary slightly by source and format.
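To turn these figures into a quick choice, the sketch below picks the largest model whose 4-bit memory footprint fits a given budget. The numbers are the approximate values stated on this page. The function name `largest_model_that_fits` and the `headroom_gb` parameter are hypothetical, and treating a larger footprint as the better choice is a simplifying assumption.

```python
from typing import Optional

# Approximate 4-bit memory requirements in GB, as stated on this page.
VRAM_4BIT_GB = {
    "E2B": 3.2,
    "E4B": 5.5,
    "26B MoE": 16.0,
    "31B Dense": 17.0,
}


def largest_model_that_fits(vram_gb: float,
                            headroom_gb: float = 0.0) -> Optional[str]:
    """Return the model with the largest 4-bit footprint that fits, or None.

    headroom_gb reserves memory for anything other than the weights
    (an assumption; this page does not say how much to reserve).
    """
    budget = vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_4BIT_GB.items()
               if need <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

With an 8 GB budget this returns `"E4B"`, and with 24 GB it returns `"31B Dense"`. Because the sizes are approximate, leave some headroom rather than choosing a model that fits exactly.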

Apache 2.0

Full commercial freedom with Apache 2.0 license

Every Gemma 4 model is released under the Apache 2.0 license. No MAU caps, no acceptable-use restrictions, no royalties. Use commercially, modify freely, and redistribute, as long as you keep the license text and attribution notices that Apache 2.0 requires.

  • Full commercial use permitted without restrictions
  • Modify and distribute freely
  • No usage caps or reporting requirements

Multiple Formats

GGUF, ONNX, SafeTensors, and more

Gemma 4 is available in multiple formats for different deployment targets. GGUF for llama.cpp/Ollama, ONNX for edge devices, SafeTensors for transformers, and more.

  • GGUF: llama.cpp, Ollama, LM Studio, GPT4All
  • ONNX: Edge devices, mobile, browser deployment
  • SafeTensors: Hugging Face transformers, vLLM, TGI
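After a download, the format of each weight file can usually be read from its extension. The sketch below groups filenames by the three formats listed above, assuming the conventional `.gguf`, `.onnx`, and `.safetensors` extensions. `weight_format` and `group_by_format` are hypothetical helper names, and the filenames in the usage note are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Conventional file extensions for the formats listed above.
FORMAT_BY_SUFFIX = {
    ".gguf": "GGUF",
    ".onnx": "ONNX",
    ".safetensors": "SafeTensors",
}


def weight_format(filename: str) -> Optional[str]:
    """Return the weight format implied by the file extension, or None."""
    return FORMAT_BY_SUFFIX.get(Path(filename).suffix.lower())


def group_by_format(filenames: list[str]) -> dict[str, list[str]]:
    """Group weight files by format, ignoring files of other types."""
    groups: dict[str, list[str]] = {}
    for name in filenames:
        fmt = weight_format(name)
        if fmt is not None:
            groups.setdefault(fmt, []).append(name)
    return groups
```

For instance, `group_by_format(["model-Q4_K_M.gguf", "README.md"])` keeps only the GGUF file, which is the one to hand to llama.cpp or Ollama.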

Verified Sources

Download from official, verified sources only

All Gemma 4 weights are published by Google DeepMind on official platforms. Always verify the publisher before downloading to ensure you get authentic, unmodified weights.

  • Hugging Face: google/ organization verified
  • Kaggle: google/ publisher verified
  • Ollama: Official library entry
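The publisher check can be partly automated. The sketch below tests that a repository id sits under the `google/` namespace named above and computes a SHA-256 digest of a downloaded file for comparison against a checksum you obtained from the official source. Comparing checksums is an extra step not described on this page, and `is_official_repo` and `sha256_of_file` are hypothetical helper names. A prefix check on the id alone does not replace confirming the verified badge on the platform itself.

```python
import hashlib

OFFICIAL_PREFIX = "google/"


def is_official_repo(repo_id: str) -> bool:
    """True if the id is in the google/ namespace and names a repository."""
    return (repo_id.startswith(OFFICIAL_PREFIX)
            and len(repo_id) > len(OFFICIAL_PREFIX))


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in 1 MiB chunks keeps memory use flat even for multi-gigabyte weight files.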

After download

What to do with your Gemma 4 weights

Downloaded the weights? Here's what you can do next.

  • Run Locally: Complete guide to local deployment
  • API Access: Use via hosted API instead
  • Fine-tuning: Customize for your specific tasks
  • All Models: Compare all Gemma 4 variants
  • Community: Join developers building with Gemma
  • Model Card: Technical specifications

Get started

Download Gemma 4 and start building

Try it online first, or download directly for private, local deployment. Apache 2.0 licensed for full commercial freedom.