Gemma 4 Models

Four models, one family - from edge to frontier

The Gemma 4 family spans four models: the ultra-compact E2B and E4B for edge devices, the 26B MoE for efficient server deployment, and the 31B Dense flagship. All share native multimodal support, configurable thinking, and Apache 2.0 licensing.

All models

Choose the right Gemma 4 for your use case

Each model in the family is optimized for different deployment scenarios. Edge models include audio support, while server models offer 256K context and frontier-class reasoning.

Edge Models

E2B & E4B: On-device intelligence with audio

Ultra-compact models with 2.3B and 4.5B effective parameters. Both include native audio encoders, 128K context, and run on phones, browsers, and IoT devices.

Choose E2B for the smallest footprint (3.2GB at 4-bit). Choose E4B for better quality (5.5GB at 4-bit). Both support text, image, video, and audio input.

Server Models

26B MoE & 31B Dense: Frontier performance

The 26B MoE activates only about 3.8B of its 25.2B parameters per token for efficient serving. The 31B Dense is the flagship and ranks #3 on Arena AI. Both feature 256K context and native function calling.

Choose 26B for high-throughput production (16GB at 4-bit). Choose 31B for maximum quality (17GB at 4-bit). Both excel at reasoning, coding, and multimodal tasks.

Edge - Ultra-compact

Gemma 4 E2B

2.3B effective parameters. The smallest Gemma 4 with full multimodal + audio support.

35 layers, PLE architecture, ~150M vision + ~300M audio encoder. 3.2GB VRAM at 4-bit.

Available now

Edge - Recommended

Gemma 4 E4B

4.5B effective parameters. Best edge model with strong reasoning and audio support.

42 layers, PLE architecture, ~150M vision + ~300M audio encoder. 5.5GB VRAM at 4-bit.

Available now

Server - Efficient

Gemma 4 26B A4B

25.2B total, 3.8B active per token. Near-31B quality at a fraction of the compute.

MoE with 128 experts (8 active + 1 shared). 256K context. 16GB VRAM at 4-bit.

Available now

Server - Flagship

Gemma 4 31B

30.7B dense parameters. #3 on Arena AI. Maximum intelligence and reliability.

Dense architecture, 256K context, 140+ languages. 17GB VRAM at 4-bit.

Available now

Shared capabilities

What every Gemma 4 model can do

All four models share a common set of capabilities that make the Gemma 4 family uniquely versatile.

Native multimodal

All models process text and images natively. Edge models add audio and video support. The encoders are built in, so no external models or pipelines are needed.

Configurable thinking

All models support thinking modes for step-by-step reasoning. Control the depth of reasoning based on task complexity.

Function calling

Built-in function calling across the family enables agentic workflows. No fine-tuning required for tool use.
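On the application side, function calling usually comes down to parsing the call the model emits and running the matching tool. A minimal sketch of that dispatch step follows. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch` helper are all hypothetical stand-ins; this page does not describe the exact format Gemma 4 emits.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching tool.

    Assumes the model emitted {"name": ..., "arguments": {...}}; this shape
    is a stand-in for illustration, not Gemma 4's documented format.
    """
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return tool(**call["arguments"])


result = dispatch('{"name": "get_weather", "arguments": {"city": "Zurich"}}')
print(result)  # prints "Sunny in Zurich"
```

In an agentic loop, the tool result would be appended to the conversation and sent back to the model for the next turn.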

Extended context

128K tokens for edge models, 256K for server models. Hybrid attention keeps memory usage practical.

140+ languages

Multilingual support with cultural context understanding across all model sizes.

Apache 2.0 license

Full commercial freedom. No MAU caps, no acceptable-use restrictions. Deploy anywhere, modify freely.

Quick selection guide

Which model should you choose?

Match your deployment constraints and quality requirements to the right Gemma 4 variant.

By hardware

Phone / IoT / 4GB RAM: Gemma 4 E2B
Laptop / 8-16GB RAM: Gemma 4 E4B
Single GPU / 16-24GB VRAM: Gemma 4 26B A4B
Multi-GPU / 24GB+ VRAM: Gemma 4 31B

By use case

Voice assistant / audio: E2B or E4B (audio support)
Browser-based AI: E2B or E4B (WebGPU)
High-throughput API: 26B A4B (MoE efficiency)
Maximum quality: 31B Dense (frontier performance)
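The hardware and use-case rules above can be sketched as a small lookup. This is an illustrative helper only: the `Variant` class, the `pick_variant` function, and the `headroom` factor of 1.2 are assumptions made for the example, while the footprint, audio, and context figures are the ones quoted on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    vram_4bit_gb: float  # approximate 4-bit footprint quoted on this page
    audio: bool          # native audio input
    context_k: int       # context window, in thousands of tokens


# Ordered from largest to smallest so the first fit is the highest-quality one.
VARIANTS = [
    Variant("Gemma 4 31B", 17.0, False, 256),
    Variant("Gemma 4 26B A4B", 16.0, False, 256),
    Variant("Gemma 4 E4B", 5.5, True, 128),
    Variant("Gemma 4 E2B", 3.2, True, 128),
]


def pick_variant(available_gb, needs_audio=False, headroom=1.2):
    """Return the largest variant that fits, or None if nothing fits.

    headroom is a hypothetical safety factor for runtime overhead beyond
    the quoted footprint; tune it for your own stack.
    """
    for v in VARIANTS:
        if needs_audio and not v.audio:
            continue
        if v.vram_4bit_gb * headroom <= available_gb:
            return v
    return None


print(pick_variant(24).name)                    # Gemma 4 31B
print(pick_variant(24, needs_audio=True).name)  # Gemma 4 E4B
print(pick_variant(4).name)                     # Gemma 4 E2B
```

The helper always prefers quality when memory allows. A throughput-bound service might still choose the 26B A4B over the 31B, which this sketch does not model.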

Performance

Complete benchmark comparison across all four models

Every Gemma 4 model forms part of a Pareto frontier - each size delivers exceptional performance relative to its parameter count.

From the ultra-compact E2B to the flagship 31B, each model is optimized for its deployment tier while sharing the same architectural innovations.
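To make the Pareto claim concrete, the sketch below checks it within the family on two axes only: 4-bit VRAM and MMLU Pro, using the figures reported on this page. The `dominates` and `pareto_frontier` helpers are illustrative names, and the check says nothing about models outside the family or about other benchmarks.

```python
# (name, 4-bit VRAM in GB, MMLU Pro %) as reported on this page
MODELS = [
    ("E2B", 3.2, 60.0),
    ("E4B", 5.5, 69.4),
    ("26B A4B", 16.0, 82.6),
    ("31B", 17.0, 85.2),
]


def dominates(a, b):
    """True if a needs no more memory and scores no lower than b,
    and is strictly better on at least one of the two axes."""
    _, a_mem, a_score = a
    _, b_mem, b_score = b
    no_worse = a_mem <= b_mem and a_score >= b_score
    strictly_better = a_mem < b_mem or a_score > b_score
    return no_worse and strictly_better


def pareto_frontier(models):
    """Names of the models that no other model in the list dominates."""
    return [m[0] for m in models if not any(dominates(o, m) for o in models)]


print(pareto_frontier(MODELS))  # ['E2B', 'E4B', '26B A4B', '31B']
```

All four models survive because each step up in memory also buys a higher score, so no variant is strictly outclassed by a smaller one.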

Figure: Gemma 4 family performance comparison across all model sizes

31B Dense: #3 on Arena AI (ELO 1452), 89.2% AIME 2026, 80% LiveCodeBench v6

26B MoE: Near-31B quality (ELO 1441) with only 3.8B active parameters per token

E4B: 69.4% MMLU Pro, 52% LiveCodeBench v6 - strong edge performance with audio

E2B: 60% MMLU Pro, 44% LiveCodeBench v6 - meaningful AI at 3.2GB VRAM

Full family comparison

All Gemma 4 models side by side

Complete benchmark results across reasoning, coding, multimodal, and deployment metrics.

Benchmark	31B Dense (Flagship)	26B A4B (MoE)	E4B (Edge)	E2B (Compact)
Arena AI ELO (overall ranking)	1452	1441	-	-
MMLU Pro (knowledge & reasoning)	85.2%	82.6%	69.4%	60.0%
AIME 2026 (mathematics)	89.2%	88.3%	42.5%	37.5%
LiveCodeBench v6 (coding)	80.0%	77.1%	52.0%	44.0%
GPQA Diamond (science)	84.3%	82.3%	58.6%	43.4%
MMMU Pro (multimodal)	76.9%	73.8%	52.6%	44.2%
Context window (max tokens)	256K	256K	128K	128K
Audio support (native audio)	No	No	Yes	Yes
VRAM at 4-bit (minimum memory)	~17 GB	~16 GB	~5.5 GB	~3.2 GB

All figures are from the official Gemma 4 model card. Arena AI scores as of April 2, 2026.
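As a sanity check on the VRAM row, raw 4-bit weight storage is simply parameters x 4 bits / 8. The sketch below compares that estimate with the quoted footprints for the two server models. Only these two are included because the page gives total parameter counts for them, while the edge models are described by effective parameters. The `weight_gb` helper is an illustrative name, and attributing the gap to runtime overhead, higher-precision layers, or cache is an assumption, not something the page states.

```python
def weight_gb(params_billion, bits_per_weight=4):
    """Raw weight storage in GB (10^9 bytes): parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


# (name, total parameters in billions, quoted 4-bit footprint in GB)
ROWS = [
    ("31B Dense", 30.7, 17.0),
    ("26B A4B", 25.2, 16.0),
]

for name, params_b, quoted_gb in ROWS:
    raw = weight_gb(params_b)
    gap = quoted_gb - raw
    print(f"{name}: raw weights {raw:.2f} GB, quoted ~{quoted_gb:.0f} GB, gap {gap:.2f} GB")
```

Note that the MoE estimate uses all 25.2B parameters, not the 3.8B active ones: every expert has to be resident in memory even though only a few run per token, which is why the 26B A4B footprint sits close to the 31B's.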

Edge Tier

E2B & E4B: AI that runs on your device

The edge models bring full multimodal AI to phones, browsers, and IoT devices. Both include native audio encoders - a capability the larger models don't have. Choose E2B for the smallest footprint, E4B for better quality.

E2B: 2.3B effective, 3.2GB at 4-bit, 95 tok/s on consumer hardware
E4B: 4.5B effective, 5.5GB at 4-bit, strong reasoning and coding
Both: native audio, 128K context, WebGPU browser support

Server Tier

26B MoE & 31B Dense: Frontier performance

The server models deliver frontier-class reasoning, coding, and multimodal understanding. The 26B MoE offers near-31B quality at a fraction of the compute. The 31B Dense is the flagship for maximum performance.

26B MoE: 3.8B active per token, ELO 1441, 88.3% AIME 2026
31B Dense: Full 30.7B active, ELO 1452, 89.2% AIME 2026
Both: 256K context, native function calling, 140+ languages
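The "3.8B active per token" figure for the 26B MoE comes from sparse routing: per this page, the model has 128 experts, of which 8 routed experts plus 1 shared expert run for each token. The toy sketch below shows the general top-k routing idea in plain Python. The softmax-over-top-k gating, the way the shared expert is added, and the scalar "experts" are assumptions chosen for illustration; the page does not describe Gemma 4's actual router.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts, per this page
TOP_K = 8          # routed experts active per token, per this page


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, router_logits, experts, shared_expert):
    """Shared expert output plus the weighted outputs of the top-k experts."""
    out = shared_expert(x)
    for idx, w in route(router_logits):
        out += w * experts[idx](x)
    return out


# Toy usage: a scalar "token" and experts that just scale their input.
rng = random.Random(0)
experts = [lambda x, s=i: x * (s + 1) / NUM_EXPERTS for i in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
y = moe_layer(1.0, logits, experts, shared_expert=lambda x: x)
print(len(route(logits)), "of", NUM_EXPERTS, "routed experts ran")
# prints "8 of 128 routed experts ran"
```

Only the selected experts execute, so compute per token scales with the active parameters rather than the total, while memory still has to hold every expert.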

Architecture

Shared innovations across the family

All Gemma 4 models share key architectural innovations from Google DeepMind's research. Per-Layer Embeddings, shared KV cache, and hybrid attention patterns maximize efficiency at every scale.