Gemma 4 E4B
4.5 billion effective parameters of on-device intelligence with native audio
Gemma 4 E4B packs 4.5B effective parameters into a model that runs on laptops, phones, and browsers. With native audio, image, and text understanding plus a 128K context window, it brings frontier-class multimodal AI to the edge.
Model variants
Instruction-tuned for edge deployment
Gemma 4 E4B uses Per-Layer Embeddings (PLE) to maximize parameter efficiency, delivering strong performance from a compact architecture.
Per-Layer Embeddings Architecture
4.5B effective parameters, 8B total with embeddings
Gemma 4 E4B uses PLE to give each decoder layer its own conditioning pathway. With 42 layers and a ~150M vision encoder plus ~300M audio encoder, it processes text, images, and audio natively.
Ideal for on-device deployment, browser-based AI, and privacy-focused applications where data never leaves the user's device.
Instruction-tuned
E4B Instruct
Optimized for conversational AI, audio understanding, and on-device task completion
Fine-tuned for following instructions with native multimodal support including audio input
Pre-trained
E4B Base
Foundation model for fine-tuning edge and mobile applications
Pre-trained on diverse multimodal data including audio for maximum flexibility
Capabilities
Desktop-class intelligence on edge hardware
Gemma 4 E4B brings multimodal understanding, coding assistance, and reasoning to devices that fit in your hand.
Native audio input
USM-style conformer audio encoder processes speech and audio clips up to 30 seconds directly, no transcription pipeline needed.
Strong reasoning
Configurable thinking mode with 42.5% on AIME 2026 math and 58.6% on GPQA Diamond graduate-level science.
Capable coding
52% on LiveCodeBench v6 and 940 Codeforces ELO. Native function calling enables on-device agentic workflows.
128K context window
Process long documents, entire codebases, and extended conversations on-device with hybrid local/global attention.
Vision understanding
52.6% on MMMU Pro and 59.5% on MATH-Vision. Variable aspect ratio support with configurable image token budgets.
Run anywhere
Runs in browsers via WebGPU, on phones via ONNX, and on laptops via Ollama. As little as 5.5GB VRAM at 4-bit quantization.
Key highlights
Edge performance metrics
Gemma 4 E4B delivers strong results across diverse benchmarks while fitting on consumer hardware.
Top achievements
- 69.4% on MMLU Pro knowledge and reasoning
- 52% on LiveCodeBench v6 coding
- 58.6% on GPQA Diamond scientific knowledge
- 52.6% on MMMU Pro multimodal reasoning
- 128K token context window
Technical specs
- 4.5B effective parameters (8B with embeddings)
- 42 decoder layers with Per-Layer Embeddings
- ~150M vision encoder + ~300M audio encoder
- Native text, image, video, and audio input
- 5.5-6GB VRAM at 4-bit quantization
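To see where the 4-bit figure comes from, here is a back-of-the-envelope sketch of raw weight storage based on the parameter counts above. It only counts weights; the gap between the result and the quoted 5.5-6GB is presumably runtime overhead such as the KV cache and activations, which is an assumption and not something this page breaks down. The helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Raw weight storage in GB (10^9 bytes) for a given parameter count."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


TOTAL_PARAMS_B = 8.0       # from the specs: 8B with embeddings
EFFECTIVE_PARAMS_B = 4.5   # from the specs: 4.5B effective

# Parameters attributed to embeddings, by simple subtraction.
embedding_params_b = TOTAL_PARAMS_B - EFFECTIVE_PARAMS_B

total_4bit_gb = weight_memory_gb(TOTAL_PARAMS_B, 4)
effective_4bit_gb = weight_memory_gb(EFFECTIVE_PARAMS_B, 4)

print(f"Embedding parameters: {embedding_params_b}B")
print(f"All weights at 4-bit: {total_4bit_gb} GB")
print(f"Effective weights at 4-bit: {effective_4bit_gb} GB")
```

Actual memory use depends on the runtime, the quantization format, and the context length in use, so treat these numbers as a lower bound on weights only.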
Performance
Punches far above its weight class
Gemma 4 E4B achieves 69.4% on MMLU Pro and 52% on LiveCodeBench v6 with only 4.5B effective parameters, ahead of the much larger Gemma 3 27B IT (67.6% and 29.1% on the same benchmarks).
Gemma 4 E4B demonstrates that edge models can deliver meaningful performance across reasoning, coding, and multimodal tasks.


- 69.4% on MMLU Pro: strong knowledge and reasoning for an edge model
- 52% on LiveCodeBench v6: practical coding assistance on-device
- 58.6% on GPQA Diamond: graduate-level science understanding
- 52.6% on MMMU Pro: multimodal reasoning with images
- 940 Codeforces ELO: competitive programming capability
Benchmark comparison
E4B vs the Gemma 4 family and Gemma 3
Gemma 4 E4B delivers strong edge performance while the larger models handle heavier workloads.
| Benchmark | Gemma 4 E4B IT Thinking | Gemma 4 E2B IT Thinking | Gemma 4 31B IT Thinking | Gemma 3 27B IT |
|---|---|---|---|---|
| MMLU Pro (knowledge & reasoning) | 69.4% | 60.0% | 85.2% | 67.6% |
| AIME 2026 (mathematics, no tools) | 42.5% | 37.5% | 89.2% | 20.8% |
| GPQA Diamond (scientific knowledge) | 58.6% | 43.4% | 84.3% | 42.4% |
| LiveCodeBench v6 (competitive coding) | 52.0% | 44.0% | 80.0% | 29.1% |
| Codeforces ELO (competitive programming) | 940 | 633 | 2150 | - |
| MMMU Pro (multimodal reasoning) | 52.6% | 44.2% | 76.9% | 49.7% |
| MATH-Vision (visual math reasoning) | 59.5% | 52.4% | 85.6% | - |
| Audio support (native audio input) | Yes | Yes | No | No |
| Context window (maximum tokens) | 128K | 128K | 256K | 128K |
Benchmark results are taken from the official Gemma 4 model card. A "-" means no score is listed for that model.
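For a quick read of the comparison, the sketch below computes the percentage-point gap between Gemma 4 E4B and Gemma 3 27B IT on the benchmarks where both have a score. The numbers are copied from the table; the dictionary layout and the `point_gaps` helper are illustrative only.

```python
# Scores in percent, copied from the comparison table.
scores = {
    "MMLU Pro": {"e4b": 69.4, "gemma3_27b": 67.6},
    "AIME 2026": {"e4b": 42.5, "gemma3_27b": 20.8},
    "GPQA Diamond": {"e4b": 58.6, "gemma3_27b": 42.4},
    "LiveCodeBench v6": {"e4b": 52.0, "gemma3_27b": 29.1},
    "MMMU Pro": {"e4b": 52.6, "gemma3_27b": 49.7},
}


def point_gaps(table: dict) -> dict:
    """Percentage-point difference (E4B minus Gemma 3 27B IT) per benchmark."""
    return {
        name: round(row["e4b"] - row["gemma3_27b"], 1)
        for name, row in table.items()
    }


gaps = point_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points")
```

The gaps are smallest on MMLU Pro and MMMU Pro and largest on the math and coding benchmarks. Note that the E4B column reports thinking-mode results, so this is not a like-for-like comparison of inference settings.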
Native Audio
Speech understanding without a transcription pipeline
Gemma 4 E4B includes a USM-style conformer audio encoder that processes speech and audio directly. No separate ASR model needed - just feed audio in and get intelligent responses.
- ~300M parameter conformer audio encoder built into the model
- Process audio clips up to 30 seconds directly
- Ideal for voice assistants, audio analysis, and accessibility tools
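Because clips are limited to 30 seconds, longer recordings need to be split before they are passed to the model. The sketch below computes chunk boundaries in seconds; it assumes the caller does the splitting and that hard cuts at fixed offsets are acceptable. The function name `chunk_bounds` is hypothetical, and a real pipeline might prefer to cut at silences or overlap chunks so that words are not split.

```python
def chunk_bounds(total_seconds: float, max_chunk_seconds: float = 30.0):
    """Return (start, end) offsets in seconds covering a clip in chunks
    no longer than max_chunk_seconds."""
    if total_seconds < 0 or max_chunk_seconds <= 0:
        raise ValueError("durations must be non-negative and chunk size positive")

    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it ends exactly at the clip end.
        end = min(start + max_chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# A 75-second recording becomes two full chunks and one 15-second remainder.
print(chunk_bounds(75.0))
```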
Edge Deployment
From browser to phone to Raspberry Pi
Gemma 4 E4B is designed for deployment anywhere. Run it in Chrome with WebGPU via transformers.js, on phones with ONNX, or on laptops with Ollama. As little as 5.5GB VRAM at 4-bit quantization.
- Browser: transformers.js with WebGPU acceleration in Chrome
- Mobile: ONNX checkpoints for iOS and Android deployment
- Local: Ollama, llama.cpp, MLX for private on-device inference
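As a minimal sketch of the local route, the snippet below builds a request for Ollama's `/api/generate` HTTP endpoint on its default port. The model tag `gemma4:e4b` is an assumption made for illustration; check `ollama list` or the Ollama library for the tag actually published. Nothing is sent until `generate` is called, which requires a running Ollama server with the model pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:e4b"  # ASSUMPTION: hypothetical tag, verify before use


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("Summarize this note in one sentence: ...")` returns the model's reply as a string. The request goes to localhost, so the prompt stays on the machine.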
Vision & Documents
Image understanding and document parsing on-device
The ~150M vision encoder processes images with variable aspect ratios and configurable token budgets. Strong OCR and document understanding make it practical for on-device document analysis.
- 52.6% on MMMU Pro multimodal reasoning
- Variable image resolution: 70 to 1120 tokens per image
- Document parsing, OCR, chart comprehension on-device
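The image token budget trades detail for capacity: the more tokens each image uses, the fewer images fit in the context window. The sketch below estimates how many images fit at a given budget. It assumes image tokens count against the same 128K window as text, that "128K" means 128 × 1024 tokens, and that any budget between 70 and 1120 is allowed; none of these is confirmed here, and the helper `max_images` is illustrative.

```python
MIN_IMAGE_TOKENS = 70      # from the list above
MAX_IMAGE_TOKENS = 1120    # from the list above
CONTEXT_TOKENS = 128 * 1024  # ASSUMPTION: "128K" read as 131,072 tokens


def max_images(tokens_per_image: int, reserved_text_tokens: int = 0,
               context_tokens: int = CONTEXT_TOKENS) -> int:
    """Number of images that fit after reserving room for text tokens."""
    if not MIN_IMAGE_TOKENS <= tokens_per_image <= MAX_IMAGE_TOKENS:
        raise ValueError("tokens_per_image must be between 70 and 1120")
    available = max(context_tokens - reserved_text_tokens, 0)
    return available // tokens_per_image


# Reserve 8,192 tokens for the prompt and the answer.
print(max_images(1120, reserved_text_tokens=8192))  # highest detail
print(max_images(70, reserved_text_tokens=8192))    # lowest detail
```

A low budget suits many thumbnails or simple photos, while dense documents and charts are likely to need the higher end of the range.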
Get started
Try Gemma 4 E4B now
Start chatting instantly or download for on-device deployment.
Download weights
On-device deployment
Download official model weights for edge and local deployment.
Edge platforms
Browser and mobile deployment
Deploy on edge devices, browsers, and mobile platforms.
Part of Gemma 4
The edge model in a frontier family
Gemma 4 E4B is the recommended edge model in the Gemma 4 family. Step up to 26B MoE or 31B Dense when you need more power, or down to E2B for the smallest footprint.
Get started
Ready to run AI on-device with Gemma 4 E4B?
Start chatting for free, or download the model for private, on-device deployment. No data leaves your device.