Gemma 4 E4B

4.5 billion parameters of on-device intelligence with native audio

Gemma 4 E4B packs 4.5B effective parameters into a model that runs on laptops, phones, and browsers. With native audio, image, and text understanding plus a 128K context window, it brings frontier-class multimodal AI to the edge.

Model variants

Instruction-tuned for edge deployment

Gemma 4 E4B uses Per-Layer Embeddings (PLE) to maximize parameter efficiency, delivering strong performance from a compact architecture.

Per-Layer Embeddings Architecture

4.5B effective parameters, 8B total with embeddings

Gemma 4 E4B uses PLE to give each decoder layer its own conditioning pathway. With 42 decoder layers, a ~150M-parameter vision encoder, and a ~300M-parameter audio encoder, it processes text, images, and audio natively.

Ideal for on-device deployment, browser-based AI, and privacy-focused applications where data never leaves the user's device.

Instruction-tuned

E4B Instruct

Optimized for conversational AI, audio understanding, and on-device task completion

Fine-tuned for following instructions with native multimodal support including audio input

Available now

Pre-trained

E4B Base

Foundation model for fine-tuning edge and mobile applications

Pre-trained on diverse multimodal data including audio for maximum flexibility

Available now

Capabilities

Desktop-class intelligence on edge hardware

Gemma 4 E4B brings multimodal understanding, coding assistance, and reasoning to devices that fit in your hand.

Native audio input

A USM-style conformer audio encoder processes speech and audio clips of up to 30 seconds directly, with no separate transcription pipeline.

Strong reasoning

Configurable thinking mode, scoring 42.5% on AIME 2026 (math) and 58.6% on GPQA Diamond (graduate-level science).

Capable coding

52% on LiveCodeBench v6 and a Codeforces Elo of 940. Native function calling enables on-device agentic workflows.

128K context window

Process long documents, entire codebases, and extended conversations on-device with hybrid local/global attention.
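To make "hybrid local/global attention" concrete, here is a minimal sketch of the two mask types such a scheme combines: a causal sliding-window (local) mask and a full causal (global) mask. The function name `causal_mask`, the sequence length, and the window size are illustrative assumptions; this page does not state the model's actual window size or how local and global layers are interleaved.

```python
def causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when token i may attend to token j.
    window=None -> global causal mask (all earlier tokens are visible).
    window=k    -> local causal mask (only the k most recent tokens,
                   counting the current one).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes, chosen only for illustration.
local_mask = causal_mask(6, window=3)
global_mask = causal_mask(6)

# Positions the last token can see under each mask.
local_visible = [j for j, ok in enumerate(local_mask[5]) if ok]
global_visible = [j for j, ok in enumerate(global_mask[5]) if ok]
print("local: ", local_visible)
print("global:", global_visible)
```

In a local layer each token attends to a bounded number of positions regardless of sequence length, which is the usual reason long-context models mix local layers with a smaller number of global ones.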

Vision understanding

52.6% on MMMU Pro and 59.5% on MATH-Vision. Variable aspect ratio support with configurable image token budgets.

Run anywhere

Runs in browsers via WebGPU, on phones via ONNX, and on laptops via Ollama. It needs as little as 5.5 GB of VRAM at 4-bit quantization.

Key highlights

Edge performance metrics

Gemma 4 E4B delivers strong results across diverse benchmarks while fitting on consumer hardware.

Top achievements

  • 69.4% on MMLU Pro knowledge and reasoning
  • 52% on LiveCodeBench v6 coding
  • 58.6% on GPQA Diamond scientific knowledge
  • 52.6% on MMMU Pro multimodal reasoning
  • 128K token context window

Technical specs

  • 4.5B effective parameters (8B with embeddings)
  • 42 decoder layers with Per-Layer Embeddings
  • ~150M vision encoder + ~300M audio encoder
  • Native text, image, video, and audio input
  • 5.5-6GB VRAM at 4-bit quantization
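As a rough sanity check on the memory figure, the sketch below estimates how much space the weights alone would take at a given precision. It is back-of-the-envelope arithmetic, not a measurement: the helper name `weight_memory_gb` is hypothetical, gigabytes are decimal (1e9 bytes), and the estimate ignores quantization scales, the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 8e9        # "8B with embeddings" from the specs above
EFFECTIVE_PARAMS = 4.5e9  # "4.5B effective parameters"

all_weights_4bit = weight_memory_gb(TOTAL_PARAMS, 4)
effective_4bit = weight_memory_gb(EFFECTIVE_PARAMS, 4)
all_weights_16bit = weight_memory_gb(TOTAL_PARAMS, 16)

print(f"all weights, 4-bit:       {all_weights_4bit:.2f} GB")
print(f"effective weights, 4-bit: {effective_4bit:.2f} GB")
print(f"all weights, 16-bit:      {all_weights_16bit:.2f} GB")
```

Under these assumptions the weights account for only part of the quoted 5.5-6GB; the remainder presumably goes to the items the estimate leaves out.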

Performance

Punches far above its weight class

Gemma 4 E4B achieves 69.4% on MMLU Pro and 52% on LiveCodeBench v6 with only 4.5B effective parameters, ahead of the much larger Gemma 3 27B IT on both (67.6% and 29.1%).

Gemma 4 E4B demonstrates that edge models can deliver meaningful performance across reasoning, coding, and multimodal tasks.

[Figure: Gemma 4 E4B performance comparison chart]

69.4% on MMLU Pro - strong knowledge and reasoning for an edge model

52% on LiveCodeBench v6 - practical coding assistance on-device

58.6% on GPQA Diamond - graduate-level science understanding

52.6% on MMMU Pro - multimodal reasoning with images

940 Codeforces Elo - competitive programming capability

Benchmark comparison

E4B vs the Gemma 4 family and Gemma 3

Gemma 4 E4B delivers strong edge performance while the larger models handle heavier workloads.

| Benchmark | Gemma 4 E4B IT (Thinking) | Gemma 4 E2B IT (Thinking) | Gemma 4 31B IT (Thinking) | Gemma 3 27B IT |
|---|---|---|---|---|
| MMLU Pro (knowledge & reasoning) | 69.4% | 60.0% | 85.2% | 67.6% |
| AIME 2026 (mathematics, no tools) | 42.5% | 37.5% | 89.2% | 20.8% |
| GPQA Diamond (scientific knowledge) | 58.6% | 43.4% | 84.3% | 42.4% |
| LiveCodeBench v6 (competitive coding) | 52.0% | 44.0% | 80.0% | 29.1% |
| Codeforces Elo (competitive programming) | 940 | 633 | 2150 | - |
| MMMU Pro (multimodal reasoning) | 52.6% | 44.2% | 76.9% | 49.7% |
| MATH-Vision (visual math reasoning) | 59.5% | 52.4% | 85.6% | - |
| Audio support (native audio input) | Yes | Yes | No | No |
| Context window (maximum tokens) | 128K | 128K | 256K | 128K |

Benchmark results are from the official Gemma 4 model card. E4B's scores show strong efficiency for its parameter count.
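To read the comparison at a glance, the following sketch computes the percentage-point gap between Gemma 4 E4B IT and Gemma 3 27B IT on the benchmarks both models report. The scores are copied from the comparison above; the helper name `point_gaps` is only illustrative.

```python
# Scores in percent, copied from the comparison above.
E4B = {
    "MMLU Pro": 69.4,
    "AIME 2026": 42.5,
    "GPQA Diamond": 58.6,
    "LiveCodeBench v6": 52.0,
    "MMMU Pro": 52.6,
}
GEMMA3_27B = {
    "MMLU Pro": 67.6,
    "AIME 2026": 20.8,
    "GPQA Diamond": 42.4,
    "LiveCodeBench v6": 29.1,
    "MMMU Pro": 49.7,
}


def point_gaps(a, b):
    """Percentage-point difference a - b for every benchmark both report."""
    return {name: round(a[name] - b[name], 1) for name in a if name in b}


gaps = point_gaps(E4B, GEMMA3_27B)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points")
```

The gap is smallest on MMLU Pro and MMMU Pro and largest on the math and coding benchmarks.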

Native Audio

Speech understanding without a transcription pipeline

Gemma 4 E4B includes a USM-style conformer audio encoder that processes speech and audio directly. No separate ASR model is needed: feed audio in and get responses back.

  • ~300M parameter conformer audio encoder built into the model
  • Process audio clips up to 30 seconds directly
  • Ideal for voice assistants, audio analysis, and accessibility tools
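Because clips are limited to 30 seconds, longer recordings need to be split before they are passed to the model. The sketch below shows one simple way to do that on raw samples. The helper name `split_into_clips` and the 16 kHz sample rate are assumptions for illustration; this page does not state the expected sample rate or how any particular runtime handles longer audio.

```python
def split_into_clips(samples, sample_rate, max_seconds=30.0):
    """Split audio samples into consecutive clips of at most max_seconds."""
    max_len = int(sample_rate * max_seconds)
    if max_len <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]


# 70 seconds of silence at a hypothetical 16 kHz sample rate.
sample_rate = 16_000
audio = [0.0] * (70 * sample_rate)

clips = split_into_clips(audio, sample_rate)
clip_seconds = [len(clip) / sample_rate for clip in clips]
print(clip_seconds)
```

Fixed-length cuts can land in the middle of a word, so a real pipeline might prefer to cut at pauses or to overlap neighbouring clips slightly.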

Edge Deployment

From browser to phone to Raspberry Pi

Gemma 4 E4B is designed for deployment anywhere. Run it in Chrome with WebGPU via transformers.js, on phones with ONNX, or on laptops with Ollama. It needs as little as 5.5 GB of VRAM at 4-bit quantization.

  • Browser: transformers.js with WebGPU acceleration in Chrome
  • Mobile: ONNX checkpoints for iOS and Android deployment
  • Local: Ollama, llama.cpp, MLX for private on-device inference

Vision & Documents

Image understanding and document parsing on-device

The ~150M vision encoder processes images with variable aspect ratios and configurable token budgets. Strong OCR and document understanding make it practical for on-device document analysis.

  • 52.6% on MMMU Pro multimodal reasoning
  • Variable image resolution: 70 to 1120 tokens per image
  • Document parsing, OCR, chart comprehension on-device
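The per-image token budget trades image detail against how many images fit in the context window. The sketch below works through that arithmetic. It assumes "128K" means 131,072 tokens and that every image uses the same budget; the helper name `max_images` and the reserved-text figure are hypothetical.

```python
CONTEXT_TOKENS = 128 * 1024  # assumption: "128K" = 131,072 tokens
MIN_IMAGE_TOKENS = 70        # smallest per-image budget listed above
MAX_IMAGE_TOKENS = 1120      # largest per-image budget listed above


def max_images(tokens_per_image, reserved_text_tokens=0,
               context_tokens=CONTEXT_TOKENS):
    """Number of images that fit after reserving room for text tokens."""
    if tokens_per_image <= 0:
        raise ValueError("tokens_per_image must be positive")
    available = context_tokens - reserved_text_tokens
    if available < 0:
        raise ValueError("reserved_text_tokens exceeds the context window")
    return available // tokens_per_image


at_max_detail = max_images(MAX_IMAGE_TOKENS)
at_min_detail = max_images(MIN_IMAGE_TOKENS)
with_prompt = max_images(MAX_IMAGE_TOKENS, reserved_text_tokens=4096)

print(at_max_detail, at_min_detail, with_prompt)
```

A lower budget suits many small thumbnails or video-like sequences, while a higher budget is the natural choice for dense documents where OCR quality matters.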

Part of Gemma 4

The edge model in a frontier family

Gemma 4 E4B is the recommended edge model in the Gemma 4 family. Step up to 26B MoE or 31B Dense when you need more power, or down to E2B for the smallest footprint.

Gemma 4 E2B

Ultra-compact 2.3B model for the tightest hardware constraints

Gemma 4 26B

MoE model with near-31B quality at 4B inference cost

Gemma 4 31B

Flagship dense model for maximum performance

Documentation

Complete guides for integration and deployment

Community

Join developers building with Gemma

Model Card

Technical specifications and evaluation results

Get started

Ready to run AI on-device with Gemma 4 E4B?

Start chatting for free, or download the model for private, on-device deployment. No data leaves your device.