Download Gemma 4
Download official Gemma 4 weights - Apache 2.0 licensed
All four Gemma 4 models are available for download from Hugging Face, Kaggle, and Ollama. Each comes in instruction-tuned and base variants at multiple quantization levels, with full commercial freedom under Apache 2.0.
Available models
All Gemma 4 variants ready for download
Each model comes in instruction-tuned (IT) and base variants. Choose based on your use case: IT for chat and tasks, base for fine-tuning.
Download Options
Multiple formats and quantization levels
Models are available as BF16 weights (full precision, unquantized), GGUF (for llama.cpp/Ollama), and ONNX (for edge deployment). Quantized versions from 4-bit to 8-bit reduce memory requirements.
For most users, start with Ollama (each tag pulls a ready-quantized build, so there is no file to pick) or Hugging Face GGUF files for llama.cpp.
Edge - 2.3B effective
Gemma 4 E2B
Smallest model. About 3.2 GB of memory at 4-bit. Runs on phones, IoT, and budget hardware.
Includes audio encoder. Best for ultra-compact deployment where memory is the primary constraint.
Edge - 4.5B effective
Gemma 4 E4B
Recommended edge model. About 5.5 GB of memory at 4-bit. Best quality for laptops and desktops.
Includes audio encoder. Strong reasoning and coding for on-device use.
Server - MoE
Gemma 4 26B A4B
Efficient MoE model. About 16 GB of memory at 4-bit. Near-31B quality at 4B inference cost.
128 experts, 8 active + 1 shared. Best for high-throughput production serving.
Server - Flagship
Gemma 4 31B
Maximum quality. About 17 GB of memory at 4-bit. #3 on Arena AI leaderboard.
Dense architecture for maximum reliability. Best for quality-critical applications.
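As a rough way to turn the model cards above into a choice, the sketch below picks the largest model whose 4-bit memory figure fits a given budget. The footprints are the numbers quoted above. The `headroom_gb` parameter is an assumption added here to leave room for context and the rest of the system, not an official recommendation.

```python
from typing import Optional

# 4-bit memory footprints in GB, copied from the model cards above.
MEMORY_GB_4BIT = {
    "Gemma 4 E2B": 3.2,
    "Gemma 4 E4B": 5.5,
    "Gemma 4 26B A4B": 16.0,
    "Gemma 4 31B": 17.0,
}


def largest_model_that_fits(available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed model that fits in memory, or None.

    headroom_gb is a hypothetical safety margin (assumption), subtracted
    from the available memory before comparing.
    """
    budget = available_gb - headroom_gb
    candidates = [(size, name) for name, size in MEMORY_GB_4BIT.items() if size <= budget]
    # Tuples sort by size first, so max() yields the biggest model that fits.
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    for gb in (4, 8, 24):
        print(gb, "GB ->", largest_model_that_fits(gb))
```

Treat the result as a starting point: real memory use also depends on context length and the runtime you choose.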
Download sources
Official download platforms
Download from trusted, official sources. All models are verified and maintained by Google DeepMind.
Hugging Face
Full model repositories with all variants, quantizations, and documentation. The most comprehensive source for Gemma 4 weights.
Kaggle
Official Google model hosting. Download weights and access notebooks for experimentation and fine-tuning.
Ollama
One-command download and run. Each Ollama tag maps to a ready-quantized build, so you do not have to choose a file yourself.
Google AI Studio
No download needed. Use Gemma 4 through a hosted API for prototyping and development.
GGUF format
Optimized for llama.cpp and Ollama. Multiple quantization levels from Q4_K_M to Q8_0 for different memory budgets.
ONNX format
Cross-platform deployment for edge devices, mobile, and browser. Optimized for inference on diverse hardware.
Quick download
Fastest way to get started
Use Ollama for the fastest path from download to running. One command does everything.
Ollama commands
- ollama pull gemma4:e2b - Edge ultra-compact
- ollama pull gemma4:e4b - Edge recommended
- ollama pull gemma4:26b - Server MoE
- ollama pull gemma4:31b - Server flagship
- ollama run gemma4:e4b - Download and start chatting
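If you want to script the pull step, a minimal sketch along these lines wraps the commands above. It assumes the `ollama` binary is installed and on your PATH and uses only the tags listed above. The helper names `pull_command` and `pull` are made up for this example.

```python
import shutil
import subprocess
from typing import List

# Tags exactly as listed in the Ollama commands above.
OLLAMA_TAGS = {
    "e2b": "gemma4:e2b",
    "e4b": "gemma4:e4b",
    "26b": "gemma4:26b",
    "31b": "gemma4:31b",
}


def pull_command(size: str) -> List[str]:
    """Build the argument list for `ollama pull` for one of the listed sizes."""
    key = size.lower()
    if key not in OLLAMA_TAGS:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(OLLAMA_TAGS)}")
    return ["ollama", "pull", OLLAMA_TAGS[key]]


def pull(size: str) -> None:
    """Run the pull, failing early if the ollama binary cannot be found."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama is not installed or not on PATH")
    # check=True raises CalledProcessError if the pull exits with an error.
    subprocess.run(pull_command(size), check=True)


if __name__ == "__main__":
    print(" ".join(pull_command("e4b")))
```

Calling `pull("e4b")` downloads the recommended edge model; `pull_command` alone is handy for logging or dry runs.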
Hugging Face CLI
- pip install huggingface_hub
- huggingface-cli download google/gemma-4-e4b-it
- huggingface-cli download google/gemma-4-26b-a4b-it
- huggingface-cli download google/gemma-4-31b-it
- Add --include to fetch specific files (such as a single quantization), or --revision to pin a branch, tag, or commit
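The same download can be done from Python with `snapshot_download` from the `huggingface_hub` package installed above. The sketch below is illustrative only: the repository ID comes from the list above, while the file patterns are assumptions about what a repository contains, so check the actual file listing before relying on them.

```python
from typing import Any, Dict, Optional, Sequence


def download_kwargs(
    repo_id: str,
    patterns: Optional[Sequence[str]] = None,
    revision: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for huggingface_hub.snapshot_download."""
    kwargs: Dict[str, Any] = {"repo_id": repo_id}
    if patterns:
        # Only files matching these glob patterns are downloaded.
        kwargs["allow_patterns"] = list(patterns)
    if revision:
        # A branch name, tag, or commit hash.
        kwargs["revision"] = revision
    return kwargs


def download(
    repo_id: str,
    patterns: Optional[Sequence[str]] = None,
    revision: Optional[str] = None,
) -> str:
    """Download the repository snapshot and return its local folder path."""
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, patterns, revision))


if __name__ == "__main__":
    # The patterns are hypothetical; adjust them to the files the repo really has.
    print(download_kwargs("google/gemma-4-e4b-it", ["*.safetensors", "*.json"]))
```

Restricting the patterns keeps you from pulling every format in a repository when you only need one.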
Download sizes
File sizes by model and quantization
Choose a quantization that fits your storage and memory constraints. Smaller quantizations trade some quality for a significantly smaller file.
4-bit quantization (Q4_K_M) offers the best balance of quality and size for most users.


- E2B at 4-bit: ~2 GB download, ~3.2 GB in memory
- E4B at 4-bit: ~4 GB download, ~5.5 GB in memory
- 26B at 4-bit: ~10 GB download, ~16 GB in memory
- 31B at 4-bit: ~12 GB download, ~17 GB in memory
Size comparison
Download and memory requirements
File sizes for different quantization levels across all Gemma 4 models.
| Format | E2B | E4B | 26B MoE | 31B Dense |
|---|---|---|---|---|
| 4-bit GGUF (recommended) | ~2 GB | ~4 GB | ~10 GB | ~12 GB |
| 8-bit GGUF (higher quality) | ~5 GB | ~8 GB | ~24 GB | ~29 GB |
| BF16 (full precision) | ~10 GB | ~16 GB | ~48 GB | ~58 GB |
| VRAM needed at 4-bit | ~3.2 GB | ~5.5 GB | ~16 GB | ~17 GB |
Approximate sizes. Actual download may vary slightly by source and format.
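Before starting a large download, it can help to check free disk space against these figures. The sketch below hard-codes the approximate sizes from the table. It treats 1 GB as 10^9 bytes and adds a 10% margin, both of which are assumptions made for this example.

```python
import shutil

# Approximate download sizes in GB, copied from the table above.
DOWNLOAD_GB = {
    "4-bit GGUF": {"E2B": 2, "E4B": 4, "26B": 10, "31B": 12},
    "8-bit GGUF": {"E2B": 5, "E4B": 8, "26B": 24, "31B": 29},
    "BF16": {"E2B": 10, "E4B": 16, "26B": 48, "31B": 58},
}


def has_room(model: str, fmt: str, free_bytes: int, margin: float = 1.1) -> bool:
    """Return True if free_bytes covers the listed size plus a safety margin.

    Assumes 1 GB = 1e9 bytes; margin=1.1 is an arbitrary 10% cushion.
    """
    needed = DOWNLOAD_GB[fmt][model] * 1e9 * margin
    return free_bytes >= needed


def free_bytes_at(path: str = ".") -> int:
    """Free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    free = free_bytes_at(".")
    for model in ("E2B", "E4B", "26B", "31B"):
        print(model, "4-bit GGUF fits:", has_room(model, "4-bit GGUF", free))
```

Because the table values are approximate, a generous margin is safer than an exact comparison.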
Apache 2.0
Full commercial freedom with Apache 2.0 license
Every Gemma 4 model is released under the Apache 2.0 license. No MAU caps, no acceptable-use restrictions, no royalties. Use commercially, modify freely, and redistribute, as long as you keep the license text and attribution notices that Apache 2.0 requires.
- Commercial use permitted
- Modify and distribute freely, keeping the license and notices
- No usage caps or reporting requirements
Multiple Formats
GGUF, ONNX, SafeTensors, and more
Gemma 4 is available in multiple formats for different deployment targets. GGUF for llama.cpp/Ollama, ONNX for edge devices, SafeTensors for transformers, and more.
- GGUF: llama.cpp, Ollama, LM Studio, GPT4All
- ONNX: Edge devices, mobile, browser deployment
- SafeTensors: Hugging Face transformers, vLLM, TGI
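To make the format-to-tool mapping above concrete, here is a small sketch that guesses a file's format from its extension and returns the tools listed for it. The extensions are the conventional ones for each format, and the file names in the usage example are made up.

```python
from pathlib import Path
from typing import List

# Tools per format, as listed above.
FORMAT_TOOLS = {
    "GGUF": ["llama.cpp", "Ollama", "LM Studio", "GPT4All"],
    "ONNX": ["Edge devices", "mobile", "browser deployment"],
    "SafeTensors": ["Hugging Face transformers", "vLLM", "TGI"],
}

# Conventional file extension for each format.
EXTENSION_TO_FORMAT = {
    ".gguf": "GGUF",
    ".onnx": "ONNX",
    ".safetensors": "SafeTensors",
}


def tools_for(filename: str) -> List[str]:
    """Return the tools listed for the file's format, or [] if unrecognized."""
    fmt = EXTENSION_TO_FORMAT.get(Path(filename).suffix.lower())
    return FORMAT_TOOLS.get(fmt, [])


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("model-Q4_K_M.gguf", "model.onnx", "README.md"):
        print(name, "->", tools_for(name))
```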
Verified Sources
Download from official, verified sources only
All Gemma 4 weights are published by Google DeepMind on official platforms. Always verify the publisher before downloading to ensure you get authentic, unmodified weights.
- Hugging Face: google/ organization verified
- Kaggle: google/ publisher verified
- Ollama: Official library entry
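Two simple checks can back up the advice above in a script: confirm that a repository ID sits under the `google/` namespace, and compare a downloaded file's SHA-256 digest with a checksum published by the source. This is a sketch under stated assumptions. A name check does not prove that an organization is verified, and the expected digest must come from the platform itself.

```python
import hashlib


def is_google_repo(repo_id: str) -> bool:
    """True if repo_id has the form 'google/<name>'.

    This is only a string check; it does not confirm the verified badge.
    """
    owner, sep, name = repo_id.partition("/")
    return sep == "/" and owner == "google" and bool(name)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected value, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    print(is_google_repo("google/gemma-4-e4b-it"))
```

Reading in chunks keeps memory use flat even for multi-gigabyte weight files.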
Get started
Download Gemma 4 and start building
Try it online first, or download directly for private, local deployment. Apache 2.0 licensed for full commercial freedom.