Gemma 4 26B A4B

26 billion parameters, 4 billion active - frontier intelligence at inference speed

Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model that activates only about 4B parameters (3.8B) per token while delivering quality close to the Gemma 4 31B dense model. With a 256K context window, support for 140+ languages, and 88.3% on AIME 2026, it's the most efficient path to frontier-class reasoning in the Gemma 4 family.

Model variants

Instruction-tuned and base models

Choose between the instruction-tuned variant optimized for chat and task completion, or the base model for fine-tuning and specialized applications.

Mixture-of-Experts Architecture

25.2B total parameters, 3.8B active per token

Gemma 4 26B A4B uses a sparse MoE design that activates 8 of 128 routed experts per token, plus 1 shared expert. All 25.2B parameters must be resident in memory, because any expert can be selected for any token, but per-token compute stays close to that of a 4B dense model.

Ideal for high-throughput production deployments where you need near-31B quality at a fraction of the compute cost.
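The memory-versus-compute trade-off can be put into rough numbers. The sketch below uses only the parameter counts quoted on this page; the storage formats and their byte sizes are assumptions, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Back-of-the-envelope sizing. Parameter counts come from this page;
# the storage formats below are assumptions for illustration.
TOTAL_PARAMS = 25.2e9    # all experts must be resident in memory
ACTIVE_PARAMS = 3.8e9    # parameters actually used per token

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}  # assumed formats


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight-only footprint in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of the parameter pool touched for each token."""
    return active / total


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, size):.1f} GB of weights")
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active share per token: {share:.1%}")
```

Under these assumptions the weights need roughly 50 GB in bf16, while only about 15% of the parameters take part in any single token's forward pass.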

Instruction-tuned

26B Instruct

Optimized for conversational AI and complex task completion

Fine-tuned with RLHF for following instructions and multi-turn dialogue

Available now

Pre-trained

26B Base

Foundation MoE model for fine-tuning and specialized applications

Pre-trained on diverse multimodal data with sparse expert routing

Available now

Capabilities

Frontier-level performance at 4B inference cost

Gemma 4 26B A4B combines MoE efficiency with advanced reasoning, exceptional coding, and multimodal understanding - delivering near-31B quality at a fraction of the compute.

MoE efficiency

Activates only 3.8B parameters per token from a 25.2B pool. Near-31B quality at ~4B inference cost - the best efficiency ratio in the Gemma 4 family.

Advanced reasoning

Configurable thinking mode enables step-by-step reasoning. Achieves 88.3% on AIME 2026 mathematics - just 0.9 points behind the 31B dense model.

Exceptional coding

77.1% on LiveCodeBench v6 and 1718 Codeforces ELO. Native function calling for agentic workflows and autonomous code execution.

256K context window

Extended context for entire codebases, long documents, and multi-turn conversations. Hybrid local/global attention for memory efficiency.
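To illustrate why mixing local and global attention layers saves memory, here is a minimal sketch that counts cached key/value positions per sequence. The layer layout (5 local layers per global layer) and the 1,024-token window are hypothetical values chosen for illustration; this page does not state the actual Gemma 4 configuration.

```python
from typing import List, Optional


def kv_cache_entries(seq_len: int, layer_windows: List[Optional[int]]) -> int:
    """Count cached key/value positions across layers for one sequence.

    A window of None marks a global layer that keeps every position;
    an integer marks a local layer that keeps only the most recent
    `window` positions.
    """
    return sum(seq_len if w is None else min(seq_len, w) for w in layer_windows)


if __name__ == "__main__":
    SEQ_LEN = 256 * 1024             # "256K" read as 262,144 tokens (assumption)
    HYBRID = [1024] * 5 + [None]     # hypothetical layout: 5 local + 1 global
    ALL_GLOBAL = [None] * len(HYBRID)

    hybrid = kv_cache_entries(SEQ_LEN, HYBRID)
    dense = kv_cache_entries(SEQ_LEN, ALL_GLOBAL)
    print(f"hybrid keeps {hybrid / dense:.1%} of the all-global cache positions")
```

With these assumed values the hybrid layout keeps about 17% of the positions an all-global stack would cache at full context length.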

Multimodal understanding

Processes text and images with variable aspect ratios. 73.8% on MMMU Pro and 82.4% on MATH-Vision for visual reasoning.

140+ languages

Multilingual support with cultural context understanding. 82.6% on MMLU Pro across diverse knowledge domains.

Key highlights

Exceptional performance metrics

Gemma 4 26B A4B achieves near-31B results across diverse benchmarks while activating only 3.8B parameters per token.

Top achievements

Arena AI ELO 1441 - 11 points behind the 31B dense model (1452)
88.3% on AIME 2026 mathematics (no tools)
77.1% on LiveCodeBench v6 coding
82.3% on GPQA Diamond scientific knowledge
85.5% on t2-bench agentic tool use

Technical specs

25.2B total parameters, 3.8B active per token
8 active + 1 shared expert out of 128 total
256K token context window
Support for 140+ languages
Hybrid local/global attention mechanism

Performance

Near-31B quality at 4B inference cost

Gemma 4 26B A4B achieves 88.3% on AIME 2026 and 82.6% on MMLU Pro, which is 0.9 and 2.6 percentage points behind the 31B dense model respectively, while activating only 3.8B parameters per token.

Results are consistent across reasoning, coding, multimodal, and agentic benchmarks: on every percentage-scored benchmark in the comparison below, the gap to the 31B dense model is between 0.9 and 3.1 percentage points.

Figure: Gemma 4 26B A4B performance comparison chart (Arena AI, AIME 2026, LiveCodeBench v6, GPQA Diamond, t2-bench)

Benchmark comparison

26B MoE vs 31B Dense and the Gemma 4 family

Gemma 4 26B A4B delivers near-31B performance across reasoning, coding, multimodal, and agentic tasks at a fraction of the inference cost.

Benchmark	Gemma 4 26B A4B IT Thinking	Gemma 4 31B IT Thinking	Gemma 4 E4B IT Thinking	Gemma 3 27B IT
Arena AI (text), as of April 2, 2026	1441	1452	-	1365
MMLU Pro (knowledge & reasoning, no tools)	82.6%	85.2%	69.4%	67.6%
MMMU Pro (multimodal reasoning)	73.8%	76.9%	52.6%	49.7%
AIME 2026 (mathematics, no tools)	88.3%	89.2%	42.5%	20.8%
LiveCodeBench v6 (competitive coding)	77.1%	80.0%	52.0%	29.1%
GPQA Diamond (scientific knowledge, no tools)	82.3%	84.3%	58.6%	42.4%
t2-bench (agentic tool use, Retail)	85.5%	86.4%	57.5%	6.6%

Benchmark results are from the official Gemma 4 model card. Arena AI scores as of April 2, 2026.
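The gap between the 26B MoE and the 31B dense model can be checked directly from the comparison figures. The short sketch below copies the percentage scores for those two models and computes the difference in percentage points; the function and variable names are illustrative only.

```python
# (26B A4B score, 31B dense score) in percent, copied from the comparison above.
SCORES = {
    "MMLU Pro": (82.6, 85.2),
    "MMMU Pro": (73.8, 76.9),
    "AIME 2026": (88.3, 89.2),
    "LiveCodeBench v6": (77.1, 80.0),
    "GPQA Diamond": (82.3, 84.3),
    "t2-bench (Retail)": (85.5, 86.4),
}


def gaps(scores: dict) -> dict:
    """Percentage-point gap (31B dense minus 26B MoE), rounded to one decimal."""
    return {name: round(dense - moe, 1) for name, (moe, dense) in scores.items()}


if __name__ == "__main__":
    for name, gap in sorted(gaps(SCORES).items(), key=lambda item: item[1]):
        print(f"{name}: {gap} points behind the 31B dense model")
```

The smallest gap is 0.9 points (AIME 2026 and t2-bench) and the largest is 3.1 points (MMMU Pro).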

MoE Architecture

26B capacity, 4B inference cost

The Mixture-of-Experts design routes each token through 8 of 128 experts plus 1 shared expert. All 25.2B parameters stay in memory, because any expert may be selected for any token, but only 3.8B activate per forward pass - delivering quality close to the 31B dense model at a fraction of the compute.

3.8B active parameters per token from 25.2B total capacity
8 active + 1 shared expert out of 128 total experts
Proportional RoPE (p-RoPE) for efficient 256K context handling
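The routing pattern described above (8 of 128 experts plus 1 shared expert) can be sketched in a few lines. This is a toy illustration, not the Gemma 4 implementation: the experts are scalar functions instead of feed-forward blocks, and normalizing the weights with a softmax over the selected logits is an assumed convention that this page does not confirm.

```python
import math
import random

NUM_EXPERTS = 128   # routed experts (from this page)
TOP_K = 8           # routed experts activated per token (from this page)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only (assumed convention).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:top_k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, router_logits, experts, shared_expert):
    """Shared expert output plus the weighted outputs of the routed experts."""
    out = shared_expert(x)
    for index, weight in route(router_logits):
        out += weight * experts[index](x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy scalar experts: each one just scales its input.
    experts = [lambda x, s=i / NUM_EXPERTS: s * x for i in range(NUM_EXPERTS)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(logits)
    print(f"{len(chosen)} of {NUM_EXPERTS} experts selected")
    print("layer output:", moe_layer(1.0, logits, experts, lambda x: x))
```

Only the selected experts are evaluated for a given token, which is why compute tracks the active parameter count while memory tracks the total.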

Advanced Reasoning

88.3% on AIME 2026 - within 1 percentage point of the 31B model

Configurable thinking mode enables transparent step-by-step reasoning for mathematics, logic, and multi-step problem solving. On AIME 2026, the 26B MoE trails the 31B dense model by only 0.9 percentage points (88.3% vs 89.2%).

88.3% on AIME 2026 mathematics (no tools)
82.3% on GPQA Diamond graduate-level science
Built-in reasoning mode with step-by-step explanations

Coding Excellence

77.1% LiveCodeBench v6 with native function calling

With 77.1% on LiveCodeBench v6 and 1718 Codeforces ELO, Gemma 4 26B A4B excels at code generation, debugging, and agentic workflows. Native function calling enables autonomous agents without fine-tuning.

77.1% on LiveCodeBench v6 competitive coding problems
1718 Codeforces ELO rating
Native function calling for autonomous agents
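As a rough illustration of how an application might handle function calling, the sketch below parses a tool call and runs the matching Python function. Everything here is hypothetical: the `get_weather` tool, the registry, and the JSON shape `{"name": ..., "arguments": {...}}` are assumptions for illustration, and the actual format Gemma 4 emits is defined by its chat template and model card.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical stub tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse one tool call and return the tool result as a JSON string.

    Assumes the model's call has already been extracted as a JSON object
    of the form {"name": "<tool>", "arguments": {...}}.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps(result)


if __name__ == "__main__":
    emitted = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    print(dispatch(emitted))
```

In an agent loop, the returned JSON string would be appended to the conversation as the tool result before asking the model for its next step.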

Multimodal Understanding

Text and image processing with variable resolution

Process text and images together with support for variable aspect ratios and resolutions. 73.8% on MMMU Pro and 82.4% on MATH-Vision demonstrate strong visual reasoning and document understanding.