Gemma 4 E2B

The smallest Gemma 4 model: full multimodal intelligence in 2.3 billion effective parameters

Gemma 4 E2B packs text, image, video, and audio understanding into just 2.3B effective parameters. With a 128K-token context window and as little as 4GB of RAM, it brings multimodal AI to phones, IoT devices, and the tightest hardware budgets.


Model variants

Ultra-compact instruction-tuned model

Gemma 4 E2B uses Per-Layer Embeddings (PLE) to squeeze maximum capability from minimal parameters.

Per-Layer Embeddings Architecture

2.3B effective parameters, 5.1B total with embeddings

Gemma 4 E2B uses PLE to give each of its 35 decoder layers its own conditioning pathway. With a ~150M vision encoder and ~300M audio encoder, it handles text, images, and audio natively at minimal compute cost.

The lowest-friction entry point to Gemma 4. Ideal for phones, IoT, Raspberry Pi, and any deployment where memory is the primary constraint.


Instruction-tuned

E2B Instruct

Optimized for on-device conversational AI with audio understanding

Fine-tuned for following instructions with native multimodal support

Available now


Pre-trained

E2B Base

Foundation model for fine-tuning ultra-compact edge applications

Pre-trained on diverse multimodal data for maximum flexibility at minimal size

Available now


Capabilities

Real AI capabilities at the smallest scale

Gemma 4 E2B proves that useful AI doesn't require massive hardware. Audio, vision, reasoning, and coding in a model that fits on a phone.

Native audio input

A USM-style Conformer audio encoder processes speech and audio clips of up to 30 seconds, enabling voice assistants and audio analysis on the smallest devices.
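The 30-second clip limit means longer recordings have to be divided before they reach the model. The page does not say how to do this, so the following is only a minimal sketch of one simple approach: non-overlapping windows of at most 30 seconds. The helper name `split_into_windows` is hypothetical and not part of any Gemma API.

```python
def split_into_windows(total_seconds: float, max_window: float = 30.0) -> list[tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows, in seconds.

    Assumption: non-overlapping windows are acceptable. Real pipelines may
    prefer overlapping windows or splitting on silence instead.
    """
    if total_seconds < 0 or max_window <= 0:
        raise ValueError("total_seconds must be >= 0 and max_window must be > 0")

    windows = []
    start = 0.0
    while start < total_seconds:
        # The last window is shorter when the duration is not a multiple of max_window.
        end = min(start + max_window, total_seconds)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 75-second recording becomes two full windows and one 15-second remainder.
    for start, end in split_into_windows(75.0):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Each window can then be passed to the model as a separate audio input, with the transcripts or answers combined afterwards.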

Practical reasoning

60% on MMLU Pro and 37.5% on AIME 2026 math. Configurable thinking mode for step-by-step problem solving on-device.

Coding assistance

44% on LiveCodeBench v6 and a Codeforces Elo rating of 633. Useful code generation and debugging even on constrained hardware.

128K context window

Long document processing and extended conversations on-device. Hybrid attention keeps memory usage practical.

Vision understanding

44.2% on MMMU Pro. Variable aspect ratio support for document parsing, OCR, and image analysis on-device.

Minimal footprint

As little as 3.2GB VRAM at 4-bit quantization. Runs on phones, Raspberry Pi, and budget laptops.
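As a rough sanity check on the footprint figure, weight memory can be estimated from parameter count and bit width. The sketch below assumes all 5.1B total parameters are stored at the same precision and held in memory, which may not match how a given runtime handles the embeddings. It covers weights only, not the KV cache, activations, or runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Total parameter count including embeddings, as quoted on this page.
TOTAL_PARAMS = 5.1e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{bits}-bit weights: ~{gb:.2f} GB")
```

Under these assumptions the 4-bit weights come to about 2.55 GB, which leaves some headroom below the quoted 3.2GB for everything that is not weights.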

Key highlights

Ultra-compact performance metrics

Gemma 4 E2B delivers meaningful results across diverse tasks while fitting on the most constrained hardware.

Top achievements

60% on MMLU Pro knowledge and reasoning
44% on LiveCodeBench v6 coding
43.4% on GPQA Diamond scientific knowledge
44.2% on MMMU Pro multimodal reasoning
128K token context window

Technical specs

2.3B effective parameters (5.1B with embeddings)
35 decoder layers with Per-Layer Embeddings
~150M vision encoder + ~300M audio encoder
Native text, image, video, and audio input
3.2-4GB VRAM at 4-bit quantization


Performance

Meaningful AI at the smallest scale

Gemma 4 E2B achieves 60% on MMLU Pro and 44% on LiveCodeBench v6 with just 2.3B effective parameters, showing that even the smallest model in the family delivers practical value across reasoning, coding, and multimodal tasks.


Gemma 4 E2B performance comparison chart

60% on MMLU Pro - solid knowledge and reasoning for an ultra-compact model

44% on LiveCodeBench v6 - practical coding help on minimal hardware

43.4% on GPQA Diamond - science understanding in 2.3B parameters

44.2% on MMMU Pro - multimodal reasoning on-device

~95 tokens/second on consumer GPUs - fast enough for real-time, interactive use

Benchmark comparison

E2B vs E4B and the Gemma 4 family

Gemma 4 E2B is the smallest model in the family. Step up to E4B for better quality, or to 26B/31B for frontier performance.

Benchmark	Gemma 4 E2B IT Thinking	Gemma 4 E4B IT Thinking	Gemma 4 26B A4B IT Thinking	Gemma 4 31B IT Thinking
MMLU Pro (knowledge & reasoning)	60.0%	69.4%	82.6%	85.2%
AIME 2026 (mathematics, no tools)	37.5%	42.5%	88.3%	89.2%
GPQA Diamond (scientific knowledge)	43.4%	58.6%	82.3%	84.3%
LiveCodeBench v6 (competitive coding)	44.0%	52.0%	77.1%	80.0%
Codeforces Elo (competitive programming)	633	940	1718	2150
MMMU Pro (multimodal reasoning)	44.2%	52.6%	73.8%	76.9%
VRAM at 4-bit (minimum memory)	~3.2 GB	~5.5 GB	~16 GB	~17 GB
Native audio input	Yes	Yes	No	No

Benchmark results are from the official Gemma 4 model card. The E2B results demonstrate practical capability at a minimal parameter count.
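To see where stepping up to E4B buys the most, the percentage-point gaps can be computed directly from the table. This is only an illustrative sketch: the scores are copied from the table above, and the names `SCORES` and `gaps` are made up for this example.

```python
# (E2B score, E4B score) in percent, copied from the comparison table.
SCORES = {
    "MMLU Pro": (60.0, 69.4),
    "AIME 2026": (37.5, 42.5),
    "GPQA Diamond": (43.4, 58.6),
    "LiveCodeBench v6": (44.0, 52.0),
    "MMMU Pro": (44.2, 52.6),
}


def gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the E4B-minus-E2B gap in percentage points for each benchmark."""
    return {name: round(e4b - e2b, 1) for name, (e2b, e4b) in scores.items()}


if __name__ == "__main__":
    # Print benchmarks from the largest gap to the smallest.
    for name, gap in sorted(gaps(SCORES).items(), key=lambda item: -item[1]):
        print(f"{name:18s} +{gap:.1f} points with E4B")
```

On these numbers the largest gap is on GPQA Diamond and the smallest on AIME 2026, so the case for E4B over E2B depends on which kind of task matters most.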

Ultra-Compact

Full multimodal AI in the smallest Gemma 4 package

Gemma 4 E2B is not a stripped-down model. It accepts the same input modalities as E4B (text, image, video, and audio) in a 2.3B effective parameter package.

Same modalities as E4B: text, image, video, and audio input
Same 128K context window as the larger edge model
3.2GB VRAM at 4-bit - fits on phones and budget hardware



Blazing Fast

95 tokens per second on consumer hardware

The smallest model in the family is also the fastest. Gemma 4 E2B delivers near-instant responses on consumer hardware, making it ideal for real-time applications and interactive experiences.

~95 tokens/second on consumer GPUs
Sub-second first-token latency on most hardware
Ideal for real-time chat, voice assistants, and interactive tools
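The throughput figure translates into response time with simple arithmetic. The sketch below assumes a constant decode rate and treats first-token latency as a fixed additive delay. Both are simplifications, and the function name `generation_time_s` is hypothetical.

```python
def generation_time_s(
    num_tokens: int,
    tokens_per_second: float,
    first_token_latency_s: float = 0.0,
) -> float:
    """Estimate wall-clock seconds to produce num_tokens output tokens.

    Assumption: decode speed is constant. In practice it varies with
    prompt length, hardware, and quantization.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_s + num_tokens / tokens_per_second


if __name__ == "__main__":
    # 95 tokens/second is the figure quoted on this page. The 0.5 s
    # first-token latency is an illustrative value, not a measurement.
    for n in (50, 190, 500):
        t = generation_time_s(n, 95.0, first_token_latency_s=0.5)
        print(f"{n:4d} tokens: ~{t:.1f} s")
```

At 95 tokens per second, a 190-token reply needs about 2 seconds of decode time, which is why this rate is workable for chat and voice assistants.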



IoT & Edge

AI for devices that fit in your hand

Gemma 4 E2B is designed for the edge. Run it on Pixel phones, Raspberry Pi, Chrome browsers, and any device where privacy and latency matter more than peak benchmark scores.