Download Gemma 4

Download official Gemma 4 weights - Apache 2.0 licensed

All four Gemma 4 models are available for download from Hugging Face, Kaggle, and Ollama. Instruction-tuned and base variants, multiple quantization levels, and full commercial freedom under Apache 2.0.

Available models

All Gemma 4 variants ready for download

Each model comes in instruction-tuned (IT) and base variants. Choose based on your use case: IT for chat and tasks, base for fine-tuning.

Download Options

Multiple formats and quantization levels

Models are available as unquantized BF16 weights, GGUF files (for llama.cpp and Ollama), and ONNX exports (for edge deployment). Quantized versions from 4-bit to 8-bit reduce memory requirements at some cost in quality.

For most users, start with Ollama (each model tag pulls a ready-to-run quantized build) or with the Hugging Face GGUF files for llama.cpp.

Edge - 2.3B effective

Gemma 4 E2B

Smallest model. About 3.2 GB of memory at 4-bit. Runs on phones, IoT devices, and budget hardware.

Includes audio encoder. Best for ultra-compact deployment where memory is the primary constraint.

~2-10 GB download

Edge - 4.5B effective

Gemma 4 E4B

Recommended edge model. About 5.5 GB of memory at 4-bit. Best edge-model quality for laptops and desktops.

Includes audio encoder. Strong reasoning and coding for on-device use.

~4-16 GB download

Server - MoE

Gemma 4 26B A4B

Efficient MoE model. About 16 GB of memory at 4-bit. Near-31B quality at roughly the inference cost of a 4B model.

128 experts, of which 8 routed experts plus 1 shared expert are active. Best for high-throughput production serving.

~10-48 GB download

Server - Flagship

Gemma 4 31B

Maximum quality. About 17 GB of memory at 4-bit. Ranked #3 on the Arena AI leaderboard.

Dense architecture for maximum reliability. Best for quality-critical applications.

~12-58 GB download

Download sources

Official download platforms

Download from trusted, official sources. All models are verified and maintained by Google DeepMind.

Hugging Face

Full model repositories with all variants, quantizations, and documentation. The most comprehensive source for Gemma 4 weights.

Kaggle

Official Google model hosting. Download weights and access notebooks for experimentation and fine-tuning.

Ollama

One-command download and run. Each model tag pulls a default quantized build, so there is no file to pick by hand.

Google AI Studio

No download needed. Use Gemma 4 through a hosted API for prototyping and development.

GGUF format

Optimized for llama.cpp and Ollama. Multiple quantization levels from Q4_K_M to Q8_0 for different memory budgets.

ONNX format

Cross-platform deployment for edge devices, mobile, and browser. Optimized for inference on diverse hardware.

Quick download

Fastest way to get started

Use Ollama for the fastest path from download to running. One command does everything.

Ollama commands

ollama pull gemma4:e2b - Edge ultra-compact
ollama pull gemma4:e4b - Edge recommended
ollama pull gemma4:26b - Server MoE
ollama pull gemma4:31b - Server flagship
ollama run gemma4:e4b - Download and start chatting
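
Once a model has been pulled, Ollama also serves it over a local HTTP API. The sketch below is one way to call it from Python, assuming the Ollama server is running at its default local address (port 11434) and using its `/api/generate` endpoint. The helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage, after `ollama pull gemma4:e4b`:
#   print(generate("gemma4:e4b", "Say hello in one sentence."))
```

Setting `stream` to `False` returns one JSON object instead of a stream of chunks, which keeps the example short.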

Hugging Face CLI

pip install huggingface_hub
huggingface-cli download google/gemma-4-e4b-it
huggingface-cli download google/gemma-4-26b-a4b-it
huggingface-cli download google/gemma-4-31b-it
Add --revision to fetch a specific branch, tag, or commit, for example one that holds a particular quantization
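
The same download can be scripted from Python with the `huggingface_hub` package. This is a minimal sketch, assuming the repository IDs listed above exist as written. `build_download_kwargs` and `download` are hypothetical helper names, and the `*.safetensors` pattern in the usage comment only illustrates file filtering.

```python
from typing import Optional, Sequence


def build_download_kwargs(
    repo_id: str,
    revision: Optional[str] = None,
    allow_patterns: Optional[Sequence[str]] = None,
) -> dict:
    """Collect keyword arguments for huggingface_hub.snapshot_download."""
    kwargs: dict = {"repo_id": repo_id}
    if revision:
        kwargs["revision"] = revision  # branch, tag, or commit hash
    if allow_patterns:
        kwargs["allow_patterns"] = list(allow_patterns)  # only matching files
    return kwargs


def download(
    repo_id: str,
    revision: Optional[str] = None,
    allow_patterns: Optional[Sequence[str]] = None,
) -> str:
    """Download a repository snapshot and return its local path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        **build_download_kwargs(repo_id, revision, allow_patterns)
    )


# Usage (downloads several GB):
#   path = download("google/gemma-4-e4b-it", allow_patterns=["*.safetensors", "*.json"])
```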

Download sizes

File sizes by model and quantization

Choose the right quantization for your storage and memory constraints. Smaller quantizations trade some quality for significantly reduced file size.

Download sizes vary by quantization level. 4-bit quantization (Q4_K_M) offers the best balance of quality and size for most users.
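
As a rough sanity check on these sizes, weight storage scales with parameter count times bits per weight. The sketch below assumes the 31B model has exactly 31 billion parameters and that sizes are measured in binary gigabytes (GiB). Both are assumptions, and real files also carry metadata and mixed-precision tensors, so published sizes can differ from this estimate.

```python
def estimate_weight_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB: params * bits / 8 bytes."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: "31B" means 31e9 parameters.
bf16 = estimate_weight_size_gib(31e9, 16)  # about 57.7
int8 = estimate_weight_size_gib(31e9, 8)   # about 28.9
print(f"BF16: {bf16:.1f} GiB, 8-bit: {int8:.1f} GiB")
```

These estimates land close to the BF16 and 8-bit figures quoted on this page for the 31B model. Treat the published numbers as authoritative where they disagree with the formula.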

Gemma 4 download size comparison across models and quantizations

E2B at 4-bit: ~2GB download, ~3.2GB in memory

E4B at 4-bit: ~4GB download, ~5.5GB in memory

26B at 4-bit: ~10GB download, ~16GB in memory

31B at 4-bit: ~12GB download, ~17GB in memory

Size comparison

Download and memory requirements

File sizes for different quantization levels across all Gemma 4 models.

Format / quantization	E2B	E4B	26B A4B (MoE)	31B (Dense)
4-bit GGUF (recommended)	~2 GB	~4 GB	~10 GB	~12 GB
8-bit GGUF (higher quality)	~5 GB	~8 GB	~24 GB	~29 GB
BF16 (unquantized)	~10 GB	~16 GB	~48 GB	~58 GB
VRAM needed (at 4-bit)	~3.2 GB	~5.5 GB	~16 GB	~17 GB

Approximate sizes. Actual download may vary slightly by source and format.
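
The 4-bit memory figures above can be turned into a small selection helper. This is an illustrative sketch only: the function name `largest_model_that_fits` and the 1 GB headroom for context and runtime overhead are assumptions, not recommendations from the Gemma documentation.

```python
from typing import Optional

# Approximate memory needed at 4-bit, in GB, keyed by Ollama tag (from this page).
VRAM_AT_4BIT_GB = {
    "gemma4:e2b": 3.2,
    "gemma4:e4b": 5.5,
    "gemma4:26b": 16.0,
    "gemma4:31b": 17.0,
}


def largest_model_that_fits(available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the tag with the largest 4-bit footprint that fits, or None."""
    budget = available_gb - headroom_gb  # hypothetical safety margin
    fitting = [(need, tag) for tag, need in VRAM_AT_4BIT_GB.items() if need <= budget]
    return max(fitting)[1] if fitting else None


print(largest_model_that_fits(8))   # gemma4:e4b
print(largest_model_that_fits(24))  # gemma4:31b
```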

Apache 2.0

Full commercial freedom with Apache 2.0 license

Every Gemma 4 model is released under the Apache 2.0 license. No monthly-active-user (MAU) caps, no acceptable-use restrictions, no royalties. Use commercially, modify freely, and redistribute, provided you keep the license text and attribution notices that Apache 2.0 requires.

Commercial use permitted
Modify and redistribute, keeping the license and attribution notices
No usage caps or reporting requirements

Multiple Formats

GGUF, ONNX, SafeTensors, and more

Gemma 4 is available in multiple formats for different deployment targets. GGUF for llama.cpp/Ollama, ONNX for edge devices, SafeTensors for transformers, and more.

GGUF: llama.cpp, Ollama, LM Studio, GPT4All
ONNX: Edge devices, mobile, browser deployment
SafeTensors: Hugging Face transformers, vLLM, TGI

Verified Sources

Download from official, verified sources only

All Gemma 4 weights are published by Google DeepMind on official platforms. Always verify the publisher before downloading to ensure you get authentic, unmodified weights.

Hugging Face: google/ organization verified
Kaggle: google/ publisher verified
Ollama: Official library entry
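
In scripts, the publisher check can be made a guard before any download starts. The sketch below only inspects the repository ID string, assuming the official namespace is `google` as listed above. `is_official_repo` is a hypothetical helper, and it does not replace checking the verified badge on the platform itself.

```python
def is_official_repo(repo_id: str, publisher: str = "google") -> bool:
    """True if repo_id has the form '<publisher>/<name>' with the expected publisher."""
    namespace, sep, name = repo_id.partition("/")
    return bool(sep) and namespace == publisher and bool(name)


print(is_official_repo("google/gemma-4-e4b-it"))  # True
print(is_official_repo("g00gle/gemma-4-e4b-it"))  # False
```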
