Gemma 4 E2B
The smallest Gemma 4 - full multimodal intelligence in 2.3 billion parameters
Gemma 4 E2B packs text, image, and audio understanding into just 2.3B effective parameters. With 128K context and as little as 4GB RAM, it brings real AI capabilities to phones, IoT devices, and the tightest hardware budgets.
Model variants
Ultra-compact instruction-tuned model
Gemma 4 E2B uses Per-Layer Embeddings (PLE) to squeeze maximum capability from minimal parameters.
Per-Layer Embeddings Architecture
2.3B effective parameters, 5.1B total with embeddings
Gemma 4 E2B uses PLE to give each of its 35 decoder layers its own conditioning pathway. With a ~150M vision encoder and ~300M audio encoder, it handles text, images, and audio natively at minimal compute cost.
The lowest-friction entry point to Gemma 4. Ideal for phones, IoT, Raspberry Pi, and any deployment where memory is the primary constraint.
Instruction-tuned
E2B Instruct
Optimized for on-device conversational AI with audio understanding
Fine-tuned for following instructions with native multimodal support
Pre-trained
E2B Base
Foundation model for fine-tuning ultra-compact edge applications
Pre-trained on diverse multimodal data for maximum flexibility at minimal size
Capabilities
Real AI capabilities at the smallest scale
Gemma 4 E2B proves that useful AI doesn't require massive hardware. Audio, vision, reasoning, and coding in a model that fits on a phone.
Native audio input
USM-style conformer audio encoder processes speech and audio clips up to 30 seconds. Voice assistants and audio analysis on the smallest devices.
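Longer recordings need to be split before they reach the model. A minimal sketch of that pre-processing step follows, assuming the audio is already decoded into a flat sequence of samples. The 16 kHz sample rate in the usage note and the helper name `split_audio` are illustrative assumptions, not part of any Gemma 4 API.

```python
from typing import List, Sequence

MAX_CLIP_SECONDS = 30  # clip limit quoted in the feature description


def split_audio(
    samples: Sequence[float],
    sample_rate: int,
    max_seconds: int = MAX_CLIP_SECONDS,
) -> List[Sequence[float]]:
    """Split decoded audio into consecutive clips of at most max_seconds each.

    Hypothetical helper: it only slices the sample buffer and does not
    resample, pad, or overlap the clips.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk_len = sample_rate * max_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
```

For example, a 70-second recording at an assumed 16 kHz yields two full 30-second clips and one 10-second remainder. Splitting on fixed boundaries can cut words in half, so a real pipeline would likely split on silence instead.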
Practical reasoning
60% on MMLU Pro and 37.5% on AIME 2026 math. Configurable thinking mode for step-by-step problem solving on-device.
Coding assistance
44% on LiveCodeBench v6 and a Codeforces Elo rating of 633. Useful code generation and debugging even on constrained hardware.
128K context window
Long document processing and extended conversations on-device. Hybrid attention keeps memory usage practical.
Vision understanding
44.2% on MMMU Pro. Variable aspect ratio support for document parsing, OCR, and image analysis on-device.
Minimal footprint
As little as 3.2GB VRAM at 4-bit quantization. Runs on phones, Raspberry Pi, and budget laptops.
Key highlights
Ultra-compact performance metrics
Gemma 4 E2B delivers meaningful results across diverse tasks while fitting on the most constrained hardware.
Top achievements
- 60% on MMLU Pro knowledge and reasoning
- 44% on LiveCodeBench v6 coding
- 43.4% on GPQA Diamond scientific knowledge
- 44.2% on MMMU Pro multimodal reasoning
- 128K token context window
Technical specs
- 2.3B effective parameters (5.1B with embeddings)
- 35 decoder layers with Per-Layer Embeddings
- ~150M vision encoder + ~300M audio encoder
- Native text, image, video, and audio input
- 3.2-4GB VRAM at 4-bit quantization
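The 4-bit footprint can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes every one of the 5.1B total parameters is stored at the stated bit width. That is a simplification: real quantization formats add scale metadata and often keep some tensors at higher precision, and the result excludes the KV cache and runtime overhead. The function name is illustrative.

```python
TOTAL_PARAMS = 5.1e9      # total parameters with embeddings, from the spec list
EFFECTIVE_PARAMS = 2.3e9  # effective parameters, from the spec list


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    return num_params * bits_per_param / 8 / 1e9


# Weights-only estimates under the uniform-bit-width assumption.
weights_4bit_gb = weight_memory_gb(TOTAL_PARAMS, 4)    # about 2.55 GB
weights_16bit_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # about 10.2 GB
```

Under these assumptions the weights take about 2.55 GB at 4-bit, which leaves room within the quoted 3.2-4GB range for the cache and runtime overhead.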
Performance
Meaningful AI at the smallest scale
Gemma 4 E2B achieves 60% on MMLU Pro and 44% on LiveCodeBench v6 with just 2.3B effective parameters - proving that useful AI fits in your pocket.
Gemma 4 E2B demonstrates that even the smallest models in the family deliver practical value across reasoning, coding, and multimodal tasks.


- 60% on MMLU Pro - solid knowledge and reasoning for an ultra-compact model
- 44% on LiveCodeBench v6 - practical coding help on minimal hardware
- 43.4% on GPQA Diamond - science understanding in 2.3B parameters
- 44.2% on MMMU Pro - multimodal reasoning on-device
- 95 tokens/second on consumer hardware - blazing fast inference
Benchmark comparison
E2B vs E4B and the Gemma 4 family
Gemma 4 E2B is the smallest model in the family. Step up to E4B for better quality, or to 26B/31B for frontier performance.
| Benchmark | Gemma 4 E2B IT Thinking | Gemma 4 E4B IT Thinking | Gemma 4 26B A4B IT Thinking | Gemma 4 31B IT Thinking |
|---|---|---|---|---|
| MMLU Pro (knowledge & reasoning) | 60.0% | 69.4% | 82.6% | 85.2% |
| AIME 2026 (mathematics, no tools) | 37.5% | 42.5% | 88.3% | 89.2% |
| GPQA Diamond (scientific knowledge) | 43.4% | 58.6% | 82.3% | 84.3% |
| LiveCodeBench v6 (competitive coding) | 44.0% | 52.0% | 77.1% | 80.0% |
| Codeforces Elo (competitive programming) | 633 | 940 | 1718 | 2150 |
| MMMU Pro (multimodal reasoning) | 44.2% | 52.6% | 73.8% | 76.9% |
| VRAM at 4-bit (minimum memory) | ~3.2 GB | ~5.5 GB | ~16 GB | ~17 GB |
| Audio support (native audio input) | Yes | Yes | No | No |
Benchmark results are from the official Gemma 4 model card. The E2B benchmarks demonstrate practical capability at a minimal parameter count.
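The VRAM row of the comparison can be turned into a simple selection rule: pick the largest model whose approximate 4-bit minimum fits the memory you have. The sketch below hard-codes the approximate figures from the table. The function name and the optional `headroom_gb` parameter (memory reserved for other processes) are illustrative assumptions.

```python
from typing import Optional

# (model, approximate minimum VRAM in GB at 4-bit), ordered smallest to largest.
# Values are the approximate figures from the comparison table.
MODELS = [
    ("Gemma 4 E2B", 3.2),
    ("Gemma 4 E4B", 5.5),
    ("Gemma 4 26B A4B", 16.0),
    ("Gemma 4 31B", 17.0),
]


def largest_model_that_fits(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest listed model whose minimum VRAM fits, or None if none do."""
    usable = vram_gb - headroom_gb
    best = None
    for name, min_vram in MODELS:
        if min_vram <= usable:
            best = name  # list is ordered, so later matches are larger models
    return best
```

With 8 GB available this picks E4B, and with less than about 3.2 GB it returns `None`. The table values are minimums, so long contexts or large batches may need noticeably more memory than this rule suggests.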
Ultra-Compact
Full multimodal AI in the smallest Gemma 4 package
Gemma 4 E2B is not a stripped-down model. It has the same multimodal architecture as its larger siblings - text, image, video, and audio input - just in a 2.3B effective parameter package.
- Same modalities as E4B: text, image, video, and audio input
- Same 128K context window as the larger edge model
- 3.2GB VRAM at 4-bit - fits on phones and budget hardware
Blazing Fast
95 tokens per second on consumer hardware
The smallest model in the family is also the fastest. Gemma 4 E2B delivers near-instant responses on consumer hardware, making it ideal for real-time applications and interactive experiences.
- ~95 tokens/second on consumer GPUs
- Sub-second first-token latency on most hardware
- Ideal for real-time chat, voice assistants, and interactive tools
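To see what the throughput figure means for a real-time application, response time can be approximated as first-token latency plus output length divided by decode speed. This is a rough sketch: the 95 tokens/second default is the approximate figure quoted above and will vary with hardware, quantization, and context length, and the function name and `first_token_latency` parameter are illustrative assumptions.

```python
def estimated_response_seconds(
    output_tokens: int,
    tokens_per_second: float = 95.0,
    first_token_latency: float = 0.0,
) -> float:
    """Rough time to stream a full response, assuming a constant decode rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency + output_tokens / tokens_per_second
```

At the quoted rate, a 190-token reply takes about 2 seconds of decoding, plus whatever first-token latency the device adds.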
IoT & Edge
AI for devices that fit in your hand
Gemma 4 E2B is designed for the edge. Run it on Pixel phones, Raspberry Pi, Chrome browsers, and any device where privacy and latency matter more than peak benchmark scores.
- ONNX checkpoints for cross-platform edge deployment
- WebGPU support for in-browser inference
- Designed for Pixel, Chrome, and IoT environments
Get started
Try Gemma 4 E2B now
Start chatting instantly or download for ultra-compact deployment.
Download weights
Ultra-compact deployment
Download official model weights for the smallest possible deployment.
Edge platforms
Phone, browser, and IoT deployment
Deploy on the smallest devices with optimized runtimes.
Part of Gemma 4
The smallest model in a frontier family
Gemma 4 E2B is the entry point to the Gemma 4 family. Step up to E4B for better quality, or to 26B/31B for frontier performance.
Get started
Ready to run AI on the smallest devices?
Start chatting for free, or download Gemma 4 E2B for ultra-compact, private, on-device deployment.