Gemma 4 26B A4B

26 billion parameters, 4 billion active - frontier intelligence at inference speed

Gemma 4 26B A4B is a Mixture-of-Experts model that activates only 4B parameters per token while delivering near-31B quality. With 256K context, 140+ languages, and 88.3% on AIME 2026, it's the most efficient path to frontier-class reasoning.

Model variants

Instruction-tuned and base models

Choose between the instruction-tuned variant optimized for chat and task completion, or the base model for fine-tuning and specialized applications.

Mixture-of-Experts Architecture

25.2B total parameters, 3.8B active per token

Gemma 4 26B A4B uses a sparse MoE design with 8 active experts out of 128 total, plus 1 shared expert. All 26B parameters load into memory for fast routing, but inference cost stays near a 4B dense model.

Ideal for high-throughput production deployments where you need near-31B quality at a fraction of the compute cost.
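
The memory-versus-compute trade-off described above can be made concrete with back-of-the-envelope arithmetic. The sketch below uses the parameter counts quoted on this page. The bytes-per-parameter values are common precision assumptions rather than published deployment figures, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Rough weight-memory and active-compute arithmetic for a sparse MoE model.
# Parameter counts come from this page; precisions are illustrative assumptions.

TOTAL_PARAMS = 25.2e9   # all experts must be resident in memory
ACTIVE_PARAMS = 3.8e9   # parameters actually used per token

# Assumed storage formats (bytes per parameter); not official figures.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that participate in one forward pass."""
    return active / total


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, size):.1f} GB of weights")
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Under these assumptions the weights alone need roughly 50 GB in bf16, while each token touches only about 15% of them.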

Instruction-tuned

26B Instruct

Optimized for conversational AI and complex task completion

Fine-tuned with RLHF for following instructions and multi-turn dialogue

Available now

Pre-trained

26B Base

Foundation MoE model for fine-tuning and specialized applications

Pre-trained on diverse multimodal data with sparse expert routing

Available now

Capabilities

Frontier-level performance at 4B inference cost

Gemma 4 26B A4B combines MoE efficiency with advanced reasoning, exceptional coding, and multimodal understanding - delivering near-31B quality at a fraction of the compute.

MoE efficiency

Activates only 3.8B parameters per token from a 25.2B pool. Near-31B quality at ~4B inference cost - the best efficiency ratio in the Gemma 4 family.

Advanced reasoning

Configurable thinking mode enables step-by-step reasoning. Achieves 88.3% on AIME 2026 mathematics - just 0.9 points behind the 31B dense model.

Exceptional coding

77.1% on LiveCodeBench v6 and a Codeforces Elo rating of 1718. Native function calling for agentic workflows and autonomous code execution.

256K context window

Extended context for entire codebases, long documents, and multi-turn conversations. Hybrid local/global attention for memory efficiency.
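
To illustrate why mixing local and global attention saves memory, here is a minimal sketch that builds both mask types and counts how many key/value positions each layer must keep. The window size, the number of layers, and the 5:1 local-to-global pattern are hypothetical values chosen for the example, not published Gemma 4 settings.

```python
# Illustrative comparison of local (sliding-window) and global causal attention.
# The window size, layer count, and local:global ratio are hypothetical values.

def causal_mask(seq_len: int) -> list[list[bool]]:
    """Global attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Local attention: token i attends only to the last `window` tokens."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def cached_positions(context_len: int, layers: list[str], window: int) -> int:
    """Total key/value positions kept across layers for one sequence."""
    total = 0
    for kind in layers:
        # A local layer never needs more than `window` past positions.
        total += min(context_len, window) if kind == "local" else context_len
    return total


if __name__ == "__main__":
    for row in sliding_window_mask(6, window=3):
        print("".join("x" if allowed else "." for allowed in row))

    context = 256 * 1024                  # assuming "256K" means 262,144 tokens
    hybrid = ["local"] * 5 + ["global"]   # hypothetical 5:1 layer pattern
    all_global = ["global"] * 6
    ratio = (cached_positions(context, hybrid, 1024)
             / cached_positions(context, all_global, 1024))
    print(f"hybrid cache is {ratio:.1%} of an all-global cache")
```

With these made-up settings the hybrid stack keeps roughly 17% of the cache positions an all-global stack would need at full context; the real saving depends on the actual layer layout.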

Multimodal understanding

Processes text and images with variable aspect ratios. 73.8% on MMMU Pro and 82.4% on MATH-Vision for visual reasoning.

140+ languages

Multilingual support with cultural context understanding. 82.6% on MMLU Pro across diverse knowledge domains.

Key highlights

Exceptional performance metrics

Gemma 4 26B A4B achieves near-31B results across diverse benchmarks while activating only 3.8B parameters per token.

Top achievements

  • Arena AI Elo of 1441 - competitive with the 31B dense model
  • 88.3% on AIME 2026 mathematics (no tools)
  • 77.1% on LiveCodeBench v6 coding
  • 82.3% on GPQA Diamond scientific knowledge
  • 85.5% on t2-bench agentic tool use
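
One way to read "competitive with the 31B dense model" is through the classic Elo expected-score formula. The sketch below applies it to the Arena AI ratings reported on this page (1441 for the 26B A4B, 1452 for the 31B, 1365 for Gemma 3 27B). It assumes the ratings behave like standard Elo ratings on a 400-point scale; the leaderboard's exact methodology may differ.

```python
# Classic Elo expected score, applied to the Arena AI ratings quoted on this page.
# Assumption: ratings follow the standard 400-point logistic Elo scale.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    moe_26b, dense_31b, gemma3_27b = 1441, 1452, 1365
    print(f"26B A4B vs 31B dense:   {expected_score(moe_26b, dense_31b):.3f}")
    print(f"26B A4B vs Gemma 3 27B: {expected_score(moe_26b, gemma3_27b):.3f}")
```

An 11-point gap corresponds to an expected preference rate just under 50%, which is why the two models can reasonably be called competitive.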

Technical specs

  • 25.2B total parameters, 3.8B active per token
  • 8 of 128 routed experts active per token, plus 1 shared expert
  • 256K token context window
  • Support for 140+ languages
  • Hybrid local/global attention mechanism

Performance

Near-31B quality at 4B inference cost

Gemma 4 26B A4B achieves 88.3% on AIME 2026 and 82.6% on MMLU Pro - 0.9 and 2.6 percentage points behind the 31B dense model, respectively - while activating only 3.8B parameters per token.

Across the reasoning, coding, multimodal, and agentic benchmarks reported here, Gemma 4 26B A4B stays within roughly 1 to 3 percentage points of the 31B dense model.

Benchmark comparison

26B MoE vs 31B Dense and the Gemma 4 family

Gemma 4 26B A4B delivers near-31B performance across reasoning, coding, multimodal, and agentic tasks at a fraction of the inference cost.

Benchmark                                      Gemma 4 26B A4B IT   Gemma 4 31B IT   Gemma 4 E4B IT   Gemma 3 27B IT
                                               (Thinking)           (Thinking)       (Thinking)
Arena AI (text), as of April 2, 2026           1441                 1452             -                1365
MMLU Pro (knowledge & reasoning, no tools)     82.6%                85.2%            69.4%            67.6%
MMMU Pro (multimodal reasoning)                73.8%                76.9%            52.6%            49.7%
AIME 2026 (mathematics, no tools)              88.3%                89.2%            42.5%            20.8%
LiveCodeBench v6 (competitive coding)          77.1%                80.0%            52.0%            29.1%
GPQA Diamond (scientific knowledge, no tools)  82.3%                84.3%            58.6%            42.4%
t2-bench (agentic tool use, Retail)            85.5%                86.4%            57.5%            6.6%

Benchmark results are from the official Gemma 4 model card. Arena AI scores are as of April 2, 2026.
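
For readers who want to check the "near-31B" claim themselves, the short sketch below computes the gap in percentage points between the 26B A4B and the 31B dense model for each percentage-scored benchmark in the table. The scores are copied from this page; the helper names are ours.

```python
# Gap (in percentage points) between the 26B A4B and the 31B dense model.
# Scores are copied from the comparison table: (26B A4B, 31B dense).

SCORES = {
    "MMLU Pro": (82.6, 85.2),
    "MMMU Pro": (73.8, 76.9),
    "AIME 2026": (88.3, 89.2),
    "LiveCodeBench v6": (77.1, 80.0),
    "GPQA Diamond": (82.3, 84.3),
    "t2-bench (Retail)": (85.5, 86.4),
}


def gaps(scores: dict) -> dict:
    """Return dense-minus-MoE differences, rounded to one decimal place."""
    return {name: round(dense - moe, 1) for name, (moe, dense) in scores.items()}


if __name__ == "__main__":
    result = gaps(SCORES)
    for name, gap in sorted(result.items(), key=lambda item: item[1]):
        print(f"{name:<20} {gap:.1f} points behind")
    print("largest gap:", max(result, key=result.get))
```

The smallest gaps are on AIME 2026 and t2-bench (0.9 points each) and the largest is on MMMU Pro (3.1 points).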

MoE Architecture

26B capacity, 4B inference cost

The Mixture-of-Experts design routes each token through 8 of 128 experts plus 1 shared expert. All 26B parameters stay in memory for instant routing, but only 3.8B activate per forward pass - delivering near-31B quality at a fraction of the compute.

  • 3.8B active parameters per token from 25.2B total capacity
  • 8 of 128 routed experts active per token, plus 1 shared expert
  • Proportional RoPE (p-RoPE) for efficient 256K context handling
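
The routing pattern described above (8 of 128 experts plus 1 shared expert) can be sketched in a few lines. This is a generic top-k Mixture-of-Experts illustration with toy scalar "experts", not Gemma 4's actual implementation. In particular, renormalising the router weights over the selected experts is one common choice that we assume here.

```python
import math
import random

NUM_EXPERTS = 128   # routed experts (from this page)
TOP_K = 8           # routed experts active per token (from this page)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, router_logits, experts, shared_expert):
    """Add the shared expert's output to the weighted routed-expert outputs."""
    out = shared_expert(x)                    # the shared expert always runs
    for idx, weight in route(router_logits):
        out += weight * experts[idx](x)       # only the top_k experts run
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy scalar "experts": each one just scales its input by a fixed factor.
    experts = [lambda x, s=rng.uniform(-1, 1): s * x for _ in range(NUM_EXPERTS)]
    shared = lambda x: 0.5 * x
    logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]
    print("experts used:", sorted(i for i, _ in route(logits)))
    print("layer output:", moe_layer(2.0, logits, experts, shared))
```

The other 120 experts are never evaluated for this token, which is where the compute saving comes from, even though all of them must stay loaded.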

Advanced Reasoning

88.3% on AIME 2026 - within 1 percentage point of the 31B model

Configurable thinking mode enables transparent step-by-step reasoning for mathematics, logic, and multi-step problem solving. On AIME 2026, the 26B MoE trails the 31B dense model by only 0.9 percentage points (88.3% vs 89.2%).

  • 88.3% on AIME 2026 mathematics (no tools)
  • 82.3% on GPQA Diamond graduate-level science
  • Built-in reasoning mode with step-by-step explanations

Coding Excellence

77.1% LiveCodeBench v6 with native function calling

With 77.1% on LiveCodeBench v6 and a Codeforces Elo rating of 1718, Gemma 4 26B A4B excels at code generation, debugging, and agentic workflows. Native function calling enables autonomous agents without fine-tuning.

  • 77.1% on LiveCodeBench v6 competitive coding problems
  • Codeforces Elo rating of 1718
  • Native function calling for autonomous agents
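
On the application side, function calling usually comes down to parsing the call the model emits and dispatching it to real code. The sketch below shows that loop in its simplest form. The JSON shape (`name` plus `arguments`) and the two toy tools are hypothetical; the actual call format is defined by the model's chat template and is not shown on this page.

```python
import json

# Hypothetical tool registry; the names and the call format are illustrative only.

def word_count(text: str) -> int:
    """Toy tool: count whitespace-separated words."""
    return len(text.split())


def add(a: float, b: float) -> float:
    """Toy tool: add two numbers."""
    return a + b


TOOLS = {"word_count": word_count, "add": add}


def dispatch(model_output: str) -> str:
    """Parse one JSON tool call and return a JSON result for the next turn."""
    call = json.loads(model_output)
    name = call.get("name")
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"name": name, "result": result})


if __name__ == "__main__":
    # Pretend the model emitted this string as its tool call.
    emitted = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
    print(dispatch(emitted))
```

Returning an error object for unknown tools, rather than raising, lets the agent loop feed the failure back to the model so it can retry.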

Multimodal Understanding

Text and image processing with variable resolution

Process text and images together with support for variable aspect ratios and resolutions. 73.8% on MMMU Pro and 82.4% on MATH-Vision demonstrate strong visual reasoning and document understanding.

  • 73.8% on MMMU Pro multimodal reasoning
  • 82.4% on MATH-Vision visual math problems
  • Variable image resolution support (70-1120 tokens)
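
The token range above translates directly into a context budget. As a rough sketch, the helper below estimates how many images fit in the context window at a given resolution setting. It assumes the 70-1120 range is a per-image token cost and that "256K" means 262,144 tokens; both are our reading of this page rather than confirmed specifications.

```python
# Rough image budgeting for a long-context multimodal prompt.
# Assumptions: 70-1120 is a per-image token cost; "256K" = 256 * 1024 tokens.

CONTEXT_TOKENS = 256 * 1024
MIN_IMAGE_TOKENS = 70
MAX_IMAGE_TOKENS = 1120


def max_images(tokens_per_image: int, reserved_text_tokens: int = 0,
               context: int = CONTEXT_TOKENS) -> int:
    """How many images fit after reserving room for text tokens."""
    if not MIN_IMAGE_TOKENS <= tokens_per_image <= MAX_IMAGE_TOKENS:
        raise ValueError("tokens_per_image must be between 70 and 1120")
    available = max(0, context - reserved_text_tokens)
    return available // tokens_per_image


if __name__ == "__main__":
    print("lowest resolution: ", max_images(MIN_IMAGE_TOKENS))
    print("highest resolution:", max_images(MAX_IMAGE_TOKENS))
    print("highest, 8K text:  ", max_images(MAX_IMAGE_TOKENS, reserved_text_tokens=8192))
```

Dropping from the highest to the lowest setting buys a 16x increase in image count, at the cost of visual detail per image.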

Download weights

Self-hosted deployment

Download official model weights for deployment on your infrastructure.

Deploy and scale

Production deployment options

Enterprise-ready deployment on Google Cloud, Kubernetes, or your own infrastructure.

Join the Gemmaverse

Part of the broader Gemma ecosystem

Gemma 4 26B A4B is part of Google's open model family, with extensive community support, integrations, and resources.

Documentation

Complete guides for integration and deployment

Read docs

Safety & Responsibility

Ethical AI development and safety guidelines

Learn more

Model Card

Technical specifications and evaluation results

View details

GitHub Repository

Source code, examples, and community contributions

View code

HuggingFace

Download weights and explore the model hub

Download

Cloud Deployment

Enterprise deployment on Google Cloud

Deploy

Get started

Ready to build with Gemma 4 26B A4B?

Start chatting instantly for free, or download the model for self-hosted deployment on your infrastructure.