Gemma 4 E2B

The smallest Gemma 4 - full multimodal intelligence in 2.3 billion effective parameters

Gemma 4 E2B packs text, image, and audio understanding into just 2.3B effective parameters. With 128K context and as little as 4GB RAM, it brings real AI capabilities to phones, IoT devices, and the tightest hardware budgets.

Model variants

Ultra-compact instruction-tuned and pre-trained models

Gemma 4 E2B uses Per-Layer Embeddings (PLE) to squeeze maximum capability from minimal parameters.

Per-Layer Embeddings Architecture

2.3B effective parameters, 5.1B total with embeddings

Gemma 4 E2B uses PLE to give each of its 35 decoder layers its own conditioning pathway. With a ~150M vision encoder and ~300M audio encoder, it handles text, images, and audio natively at minimal compute cost.
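The gap between the two parameter counts above can be worked out directly. The following is a minimal sketch using only the figures quoted on this page; the function name is illustrative, and how the embedding parameters are split across layers or encoders is not stated here, so the sketch does not attempt it.

```python
# Figures quoted on this page, in billions of parameters.
EFFECTIVE_PARAMS_B = 2.3  # "effective" parameters
TOTAL_PARAMS_B = 5.1      # total, including embeddings


def embedding_params_b(total_b: float, effective_b: float) -> float:
    """Parameters attributed to embeddings: total minus effective (in billions)."""
    if effective_b > total_b:
        raise ValueError("effective parameters cannot exceed total parameters")
    return round(total_b - effective_b, 2)


embed_b = embedding_params_b(TOTAL_PARAMS_B, EFFECTIVE_PARAMS_B)
share = embed_b / TOTAL_PARAMS_B
print(f"Embedding parameters: ~{embed_b}B ({share:.0%} of the total)")
```

On these numbers, roughly 2.8B of the 5.1B total sits in embeddings rather than in the 2.3B effective parameters.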

The lowest-friction entry point to Gemma 4. Ideal for phones, IoT, Raspberry Pi, and any deployment where memory is the primary constraint.

Instruction-tuned

E2B Instruct

Optimized for on-device conversational AI with audio understanding

Fine-tuned for following instructions with native multimodal support

Available now

Pre-trained

E2B Base

Foundation model for fine-tuning ultra-compact edge applications

Pre-trained on diverse multimodal data for maximum flexibility at minimal size

Available now

Capabilities

Real AI capabilities at the smallest scale

Gemma 4 E2B proves that useful AI doesn't require massive hardware. Audio, vision, reasoning, and coding in a model that fits on a phone.

Native audio input

USM-style conformer audio encoder processes speech and audio clips up to 30 seconds. Voice assistants and audio analysis on the smallest devices.
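Recordings longer than the 30-second clip length need to be split before they are passed to the model. The sketch below shows one simple way to do that on raw samples. It assumes fixed-boundary splitting is acceptable (it can cut a word in half) and uses a hypothetical 16 kHz sample rate in the usage lines; neither choice comes from this page.

```python
from typing import List, Sequence

MAX_CLIP_SECONDS = 30  # clip limit stated on this page


def split_into_clips(
    samples: Sequence[float],
    sample_rate: int,
    max_seconds: int = MAX_CLIP_SECONDS,
) -> List[Sequence[float]]:
    """Split a mono sample sequence into consecutive clips of at most max_seconds."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    clip_len = sample_rate * max_seconds  # samples per full clip
    return [samples[i:i + clip_len] for i in range(0, len(samples), clip_len)]


# Usage: 75 seconds of silence at an assumed 16 kHz sample rate.
SAMPLE_RATE = 16_000  # hypothetical; use your recording's actual rate
recording = [0.0] * (SAMPLE_RATE * 75)
clips = split_into_clips(recording, SAMPLE_RATE)
print([len(c) / SAMPLE_RATE for c in clips])  # clip durations in seconds
```

A production pipeline would more likely split on silence or overlap adjacent clips so that speech at a boundary is not lost.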

Practical reasoning

60% on MMLU Pro and 37.5% on AIME 2026 math. Configurable thinking mode for step-by-step problem solving on-device.

Coding assistance

44% on LiveCodeBench v6 and a Codeforces Elo rating of 633. Useful code generation and debugging even on constrained hardware.

128K context window

Long document processing and extended conversations on-device. Hybrid attention keeps memory usage practical.

Vision understanding

44.2% on MMMU Pro. Variable aspect ratio support for document parsing, OCR, and image analysis on-device.

Minimal footprint

As little as 3.2GB VRAM at 4-bit quantization. Runs on phones, Raspberry Pi, and budget laptops.
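As a rough sanity check on the 4-bit figure, weight storage can be estimated from the parameter count and the bits per weight. This is a back-of-envelope sketch, not the method behind the quoted number: it assumes all 5.1B parameters are held at 4 bits, and it treats whatever remains of the quoted 3.2GB as headroom for runtime state such as the KV cache and activations, which this page does not break down.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if params_billions < 0 or bits_per_weight <= 0:
        raise ValueError("params must be non-negative and bits must be positive")
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


TOTAL_PARAMS_B = 5.1  # total parameters quoted on this page
QUOTED_VRAM_GB = 3.2  # 4-bit VRAM figure quoted on this page

weights_gb = weight_memory_gb(TOTAL_PARAMS_B, bits_per_weight=4)
headroom_gb = QUOTED_VRAM_GB - weights_gb
print(f"4-bit weights: ~{weights_gb:.2f} GB, remaining headroom: ~{headroom_gb:.2f} GB")
```

Under these assumptions the weights alone come to about 2.55 GB, leaving roughly 0.65 GB of the quoted figure for everything else.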

Key highlights

Ultra-compact performance metrics

Gemma 4 E2B delivers meaningful results across diverse tasks while fitting on the most constrained hardware.

Top achievements

  • 60% on MMLU Pro knowledge and reasoning
  • 44% on LiveCodeBench v6 coding
  • 43.4% on GPQA Diamond scientific knowledge
  • 44.2% on MMMU Pro multimodal reasoning
  • 128K token context window

Technical specs

  • 2.3B effective parameters (5.1B with embeddings)
  • 35 decoder layers with Per-Layer Embeddings
  • ~150M vision encoder + ~300M audio encoder
  • Native text, image, video, and audio input
  • 3.2-4GB VRAM at 4-bit quantization

Performance

Meaningful AI at the smallest scale

Gemma 4 E2B achieves 60% on MMLU Pro and 44% on LiveCodeBench v6 with just 2.3B effective parameters - proving that useful AI fits in your pocket.

Gemma 4 E2B demonstrates that even the smallest models in the family deliver practical value across reasoning, coding, and multimodal tasks.

[Figure: Gemma 4 E2B performance comparison chart]

  • 60% on MMLU Pro - solid knowledge and reasoning for an ultra-compact model
  • 44% on LiveCodeBench v6 - practical coding help on minimal hardware
  • 43.4% on GPQA Diamond - science understanding in 2.3B effective parameters
  • 44.2% on MMMU Pro - multimodal reasoning on-device
  • 95 tokens/second on consumer hardware - blazing fast inference

Benchmark comparison

E2B vs E4B and the Gemma 4 family

Gemma 4 E2B is the smallest model in the family. Step up to E4B for better quality, or to 26B/31B for frontier performance.

All four models are evaluated in thinking mode.

Benchmark        | What it measures        | Gemma 4 E2B IT | Gemma 4 E4B IT | Gemma 4 26B A4B IT | Gemma 4 31B IT
-----------------+-------------------------+----------------+----------------+--------------------+---------------
MMLU Pro         | Knowledge & reasoning   | 60.0%          | 69.4%          | 82.6%              | 85.2%
AIME 2026        | Mathematics (no tools)  | 37.5%          | 42.5%          | 88.3%              | 89.2%
GPQA Diamond     | Scientific knowledge    | 43.4%          | 58.6%          | 82.3%              | 84.3%
LiveCodeBench v6 | Competitive coding      | 44.0%          | 52.0%          | 77.1%              | 80.0%
Codeforces Elo   | Competitive programming | 633            | 940            | 1718               | 2150
MMMU Pro         | Multimodal reasoning    | 44.2%          | 52.6%          | 73.8%              | 76.9%
VRAM (4-bit)     | Minimum memory          | ~3.2 GB        | ~5.5 GB        | ~16 GB             | ~17 GB
Audio support    | Native audio input      | Yes            | Yes            | No                 | No

Benchmark results are from the official Gemma 4 model card. The E2B results demonstrate practical capability at a minimal parameter count.
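The comparison above can also be read as a selection rule: pick the strongest variant that fits the memory budget and supports the inputs you need. The sketch below encodes only the VRAM, MMLU Pro, and audio figures from the comparison. The `Variant` type and `pick_variant` function are illustrative names, and ranking by MMLU Pro alone is a simplifying assumption.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    vram_gb: float   # approximate minimum VRAM at 4-bit, from the comparison
    mmlu_pro: float  # MMLU Pro score in percent, from the comparison
    audio: bool      # native audio input


VARIANTS = [
    Variant("Gemma 4 E2B IT", 3.2, 60.0, True),
    Variant("Gemma 4 E4B IT", 5.5, 69.4, True),
    Variant("Gemma 4 26B A4B IT", 16.0, 82.6, False),
    Variant("Gemma 4 31B IT", 17.0, 85.2, False),
]


def pick_variant(vram_budget_gb: float, need_audio: bool = False) -> Optional[Variant]:
    """Return the highest-scoring variant that fits the budget, or None if none fits."""
    candidates = [
        v for v in VARIANTS
        if v.vram_gb <= vram_budget_gb and (v.audio or not need_audio)
    ]
    return max(candidates, key=lambda v: v.mmlu_pro, default=None)


for budget, audio in [(4, True), (8, False), (24, True), (24, False)]:
    choice = pick_variant(budget, need_audio=audio)
    print(budget, audio, choice.name if choice else "no variant fits")
```

Note that requiring audio input caps the choice at E4B regardless of how much memory is available, since the two larger models are listed without native audio support.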

Ultra-Compact

Full multimodal AI in the smallest Gemma 4 package

Gemma 4 E2B is not a stripped-down model. It accepts the same inputs as the larger E4B edge model - text, image, video, and audio - just in a 2.3B effective parameter package.

  • Same modalities as E4B: text, image, video, and audio input
  • Same 128K context window as the larger edge model
  • 3.2GB VRAM at 4-bit - fits on phones and budget hardware

Blazing Fast

95 tokens per second on consumer hardware

The smallest model in the family is also the fastest. Gemma 4 E2B delivers near-instant responses on consumer hardware, making it ideal for real-time applications and interactive experiences.

  • ~95 tokens/second on consumer GPUs
  • Sub-second first-token latency on most hardware
  • Ideal for real-time chat, voice assistants, and interactive tools
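To turn the throughput figure into a latency budget, divide the expected reply length by the decode rate and add the time to first token. This is a minimal sketch using the ~95 tokens/second quoted above; the function name and the example reply lengths are illustrative, and real throughput will vary with hardware, quantization, and prompt length.

```python
def response_seconds(
    num_tokens: int,
    tokens_per_second: float = 95.0,   # throughput quoted on this page
    first_token_latency: float = 0.0,  # seconds before the first token appears
) -> float:
    """Estimate wall-clock time to generate num_tokens at a steady decode rate."""
    if num_tokens < 0 or tokens_per_second <= 0 or first_token_latency < 0:
        raise ValueError("inputs must be non-negative, with a positive decode rate")
    return first_token_latency + num_tokens / tokens_per_second


# Short chat reply vs. longer answer, with a 1-second first-token
# latency used as a conservative upper bound on "sub-second".
for tokens in (50, 190, 500):
    print(tokens, f"{response_seconds(tokens, first_token_latency=1.0):.2f} s")
```

At this rate a 190-token reply needs about 2 seconds of decode time, which is why first-token latency matters most for short voice-assistant turns.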

IoT & Edge

AI for devices that fit in your hand

Gemma 4 E2B is designed for the edge. Run it on Pixel phones, Raspberry Pi, Chrome browsers, and any device where privacy and latency matter more than peak benchmark scores.

  • ONNX checkpoints for cross-platform edge deployment
  • WebGPU support for in-browser inference
  • Designed for Pixel, Chrome, and IoT environments

Part of Gemma 4

The smallest model in a frontier family

Gemma 4 E2B is the entry point to the Gemma 4 family. Step up to E4B for better quality, or to 26B/31B for frontier performance.

Gemma 4 E4B

Stronger edge model with 4.5B effective parameters

Gemma 4 26B

MoE model with near-31B quality at 4B inference cost

Gemma 4 31B

Flagship dense model for maximum performance

Documentation

Complete guides for integration and deployment

Community

Join developers building with Gemma

Model Card

Technical specifications and evaluation results

Get started

Ready to run AI on the smallest devices?

Start chatting for free, or download Gemma 4 E2B for ultra-compact, private, on-device deployment.