Gemma 4 E4B

4.5 billion parameters of on-device intelligence with native audio

Gemma 4 E4B packs 4.5B effective parameters into a model that runs on laptops, phones, and browsers. With native audio, image, and text understanding plus a 128K context window, it brings frontier-class multimodal AI to the edge.

Model variants

Instruction-tuned for edge deployment

Gemma 4 E4B uses Per-Layer Embeddings (PLE) to maximize parameter efficiency, delivering strong performance from a compact architecture.

Per-Layer Embeddings Architecture

4.5B effective parameters, 8B total with embeddings

Gemma 4 E4B uses PLE to give each decoder layer its own conditioning pathway. With 42 layers and a ~150M vision encoder plus ~300M audio encoder, it processes text, images, and audio natively.

Ideal for on-device deployment, browser-based AI, and privacy-focused applications where data never leaves the user's device.
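The Per-Layer Embeddings idea can be illustrated with a toy, pure-Python sketch. It assumes the simplest possible form: each decoder layer owns a small embedding table indexed by token id, and the vector looked up there is projected to the hidden size and added to that layer's residual stream. The sizes, random weights, additive combination, and the `decoder_block` stand-in are all hypothetical. Gemma 4's real PLE layers are defined in its model code, not here.

```python
import random

VOCAB, HIDDEN, PLE_DIM, LAYERS = 16, 8, 4, 3  # toy sizes, not Gemma's real ones
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    # m is (rows x cols), v has length cols -> result has length rows
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Shared input embedding, as in any decoder-only transformer.
token_embedding = rand_matrix(VOCAB, HIDDEN)
# PLE idea: one small embedding table per decoder layer.
per_layer_tables = [rand_matrix(VOCAB, PLE_DIM) for _ in range(LAYERS)]
# Per-layer projections from PLE_DIM up to HIDDEN (HIDDEN rows x PLE_DIM cols).
per_layer_proj = [rand_matrix(HIDDEN, PLE_DIM) for _ in range(LAYERS)]


def decoder_block(h):
    # Hypothetical stand-in for attention + MLP; identity keeps the sketch short.
    return h


def forward(token_id):
    h = list(token_embedding[token_id])
    for layer in range(LAYERS):
        h = decoder_block(h)
        # A table lookup is just an index: no matmul over the whole table.
        ple = per_layer_tables[layer][token_id]
        h = [x + y for x, y in zip(h, matvec(per_layer_proj[layer], ple))]
    return h


def count_params():
    # Lookup-table parameters vs projection weights in this toy model only;
    # real decoder blocks would add far more compute parameters.
    lookup = VOCAB * HIDDEN + LAYERS * VOCAB * PLE_DIM
    compute = LAYERS * HIDDEN * PLE_DIM
    return lookup, compute
```

Calling `forward(3)` returns a hidden vector of length `HIDDEN`. Each per-layer table is only indexed per token, never multiplied in full, so its parameters cost memory but little compute. One plausible reading of the "4.5B effective, 8B total" split is that most of the roughly 3.5B difference sits in lookup tables of this kind, though the exact breakdown is an assumption here.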

Instruction-tuned

E4B Instruct

Optimized for conversational AI, audio understanding, and on-device task completion

Fine-tuned for following instructions with native multimodal support including audio input

Available now

Pre-trained

E4B Base

Foundation model for fine-tuning edge and mobile applications

Pre-trained on diverse multimodal data including audio for maximum flexibility

Available now

Capabilities

Desktop-class intelligence on edge hardware

Gemma 4 E4B brings multimodal understanding, coding assistance, and reasoning to devices that fit in your hand.

Native audio input

A USM-style conformer audio encoder processes speech and audio clips of up to 30 seconds directly, with no transcription pipeline needed.

Strong reasoning

Configurable thinking mode, scoring 42.5% on AIME 2026 math and 58.6% on GPQA Diamond graduate-level science.

Capable coding

52% on LiveCodeBench v6 and a 940 Codeforces Elo rating. Native function calling enables on-device agentic workflows.
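Native function calling means the model emits a structured request to call a tool, and the host application executes it. The sketch below assumes, purely for illustration, that the model emits a JSON object of the form `{"name": ..., "arguments": {...}}`; Gemma 4's actual tool-call format is defined by its model card and chat template. `get_battery_level`, `set_timer`, and `dispatch` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical on-device tools the model is allowed to call.
def get_battery_level() -> int:
    return 87  # stub value for the sketch


def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} min"


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_battery_level": get_battery_level,
    "set_timer": set_timer,
}


def dispatch(model_output: str) -> Any:
    """Parse an assumed JSON tool call and run the matching local function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything outside the allow-list instead of guessing.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

For example, `dispatch('{"name": "set_timer", "arguments": {"minutes": 5}}')` returns `"Timer set for 5 min"`. In an agentic loop, that result would be passed back to the model as the tool's response, and everything stays on the device.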

128K context window

Process long documents, entire codebases, and extended conversations on-device with hybrid local/global attention.
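Hybrid local/global attention typically interleaves sliding-window layers, where each token attends only to a fixed number of recent tokens, with full causal layers that see the whole context. Most layers' attention cost and KV cache then scale with the window rather than the full 128K. A minimal sketch of the two mask types follows; the window size and the pattern of one global layer every 6 layers are hypothetical values, not Gemma 4's published configuration.

```python
from typing import List


def causal_mask(seq_len: int) -> List[List[bool]]:
    # Global layer: query i may attend to every key j <= i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    # Local layer: query i attends only to the `window` most recent keys, itself included.
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_kinds(num_layers: int, global_every: int) -> List[str]:
    # Hypothetical interleaving: every `global_every`-th layer is global.
    return ["global" if (k + 1) % global_every == 0 else "local" for k in range(num_layers)]


def pairs_attended(mask: List[List[bool]]) -> int:
    # Number of (query, key) pairs computed: a rough proxy for attention cost.
    return sum(sum(row) for row in mask)
```

For 8 tokens, the causal mask covers 36 query/key pairs while a window of 4 covers 26; the gap widens as the sequence grows, because the local cost grows linearly while the global cost grows quadratically. With the assumed pattern, `layer_kinds(42, 6)` yields 7 global and 35 local layers.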

Vision understanding

52.6% on MMMU Pro and 59.5% on MATH-Vision. Variable aspect ratio support with configurable image token budgets.

Run anywhere

Runs in browsers via WebGPU, on phones via ONNX, and on laptops via Ollama, needing as little as 5.5GB of VRAM at 4-bit quantization.

Key highlights

Edge performance metrics

Gemma 4 E4B delivers strong results across diverse benchmarks while fitting on consumer hardware.

Top achievements

69.4% on MMLU Pro knowledge and reasoning
52% on LiveCodeBench v6 coding
58.6% on GPQA Diamond scientific knowledge
52.6% on MMMU Pro multimodal reasoning
128K token context window

Technical specs

4.5B effective parameters (8B with embeddings)
42 decoder layers with Per-Layer Embeddings
~150M vision encoder + ~300M audio encoder
Native text, image, video, and audio input
5.5-6GB VRAM at 4-bit quantization
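The VRAM figure can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly parameter count × bits per weight ÷ 8 bytes. The sketch assumes every parameter is stored at exactly 4 bits and ignores quantization scales, which real 4-bit formats add. Whether the ~3.5B embedding parameters (8B total minus 4.5B effective) must sit in VRAM depends on the runtime, so both cases are shown.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 8.0e9      # "8B with embeddings"
EFFECTIVE_PARAMS = 4.5e9  # "4.5B effective parameters"

all_weights_4bit = weight_gb(TOTAL_PARAMS, 4)         # every weight on the accelerator
effective_only_4bit = weight_gb(EFFECTIVE_PARAMS, 4)  # embeddings kept elsewhere (assumption)
all_weights_bf16 = weight_gb(TOTAL_PARAMS, 16)        # unquantized 16-bit reference

print(f"4-bit, all weights:    {all_weights_4bit:.2f} GB")
print(f"4-bit, effective only: {effective_only_4bit:.2f} GB")
print(f"bf16, all weights:     {all_weights_bf16:.2f} GB")
```

This prints 4.00 GB, 2.25 GB, and 16.00 GB. Under these assumptions, the quoted 5.5-6GB leaves roughly 1.5-2 GB beyond the raw weights for the KV cache, activations, the vision and audio encoders' working memory, and runtime buffers.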

Performance

Punches far above its weight class

Gemma 4 E4B achieves 69.4% on MMLU Pro and 52% on LiveCodeBench v6 with only 4.5B effective parameters, ahead of the much larger Gemma 3 27B IT on both (67.6% and 29.1%).

Gemma 4 E4B demonstrates that edge models can deliver meaningful performance across reasoning, coding, and multimodal tasks.

Gemma 4 E4B performance comparison chart

69.4% on MMLU Pro - strong knowledge and reasoning for an edge model

52% on LiveCodeBench v6 - practical coding assistance on-device

58.6% on GPQA Diamond - graduate-level science understanding

52.6% on MMMU Pro - multimodal reasoning with images

940 Codeforces Elo - competitive programming capability

Benchmark comparison

E4B vs the Gemma 4 family and Gemma 3

Gemma 4 E4B delivers strong edge performance while the larger models handle heavier workloads.

Benchmark	Gemma 4 E4B IT (Thinking)	Gemma 4 E2B IT (Thinking)	Gemma 4 31B IT (Thinking)	Gemma 3 27B IT
MMLU Pro (knowledge & reasoning)	69.4%	60.0%	85.2%	67.6%
AIME 2026 (mathematics, no tools)	42.5%	37.5%	89.2%	20.8%
GPQA Diamond (scientific knowledge)	58.6%	43.4%	84.3%	42.4%
LiveCodeBench v6 (competitive coding)	52.0%	44.0%	80.0%	29.1%
Codeforces Elo (competitive programming)	940	633	2150	-
MMMU Pro (multimodal reasoning)	52.6%	44.2%	76.9%	49.7%
MATH-Vision (visual math reasoning)	59.5%	52.4%	85.6%	-
Native audio input	Yes	Yes	No	No
Context window (max tokens)	128K	128K	256K	128K

Benchmark results are from the official Gemma 4 model card; a dash (-) marks a result not listed. E4B's scores show strong efficiency for its parameter count.

Native Audio

Speech understanding without a transcription pipeline

Gemma 4 E4B includes a USM-style conformer audio encoder that processes speech and audio directly. No separate automatic speech recognition (ASR) model is needed: the model takes audio as input and responds to it.

~300M parameter conformer audio encoder built into the model
Process audio clips up to 30 seconds directly
Ideal for voice assistants, audio analysis, and accessibility tools
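Because a single clip is limited to 30 seconds, longer recordings have to be split before they are passed to the model. A minimal sketch of computing chunk boundaries follows, assuming 16 kHz mono audio (a common rate for speech encoders, not a documented Gemma 4 requirement) and a hypothetical 1-second overlap so words at a boundary are not cut in half. `chunk_bounds` is an illustrative helper; how each chunk is then sent to the model depends on your runtime.

```python
from typing import List, Tuple


def chunk_bounds(num_samples: int, sample_rate: int = 16_000,
                 max_seconds: float = 30.0,
                 overlap_seconds: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices covering the clip in <= max_seconds pieces."""
    max_len = int(max_seconds * sample_rate)
    step = max_len - int(overlap_seconds * sample_rate)  # advance less than a full chunk
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    bounds = []
    start = 0
    while True:
        end = min(start + max_len, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # last chunk reaches the end of the clip
            break
        start += step
    return bounds
```

A 70-second recording at 16 kHz (1,120,000 samples) becomes three chunks: `(0, 480000)`, `(464000, 944000)`, and `(928000, 1120000)`, each at most 30 seconds long.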

Edge Deployment

From browser to phone to Raspberry Pi

Gemma 4 E4B is designed for deployment anywhere. Run it in Chrome with WebGPU via transformers.js, on phones with ONNX, or on laptops with Ollama, using as little as 5.5GB of VRAM at 4-bit quantization.

Browser: transformers.js with WebGPU acceleration in Chrome
Mobile: ONNX checkpoints for iOS and Android deployment
Local: Ollama, llama.cpp, MLX for private on-device inference
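For the local route, a running Ollama server exposes an HTTP API on `localhost:11434`. The sketch below calls its `/api/generate` endpoint using only the Python standard library. The model tag `gemma4:e4b` is an assumption; check the Ollama model library for the actual tag before pulling the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma4:e4b"  # assumed tag; verify with `ollama list` after pulling


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    # stream=False asks Ollama for one JSON object instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    # Requires a local Ollama server; the prompt never leaves the machine.
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, `generate("Summarize the benefits of on-device inference in one sentence.")` returns the model's reply as a string.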

Vision & Documents

Image understanding and document parsing on-device

The ~150M vision encoder processes images with variable aspect ratios and configurable token budgets. Strong OCR and document understanding make it practical for on-device document analysis.

52.6% on MMMU Pro multimodal reasoning
Variable image resolution: 70 to 1120 tokens per image
Document parsing, OCR, chart comprehension on-device
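A configurable per-image token budget lets an application trade visual detail against context usage. A minimal sketch of picking the largest budget that fits when several images share a context follows. The ladder 70, 140, 280, 560, 1120 is an assumption built from the stated 70 to 1120 range, and `pick_image_budget` is a hypothetical helper, not part of any Gemma API.

```python
from typing import Optional

BUDGETS = (70, 140, 280, 560, 1120)  # assumed ladder spanning the documented 70-1120 range


def pick_image_budget(num_images: int, tokens_available: int) -> Optional[int]:
    """Largest per-image budget such that all images fit, or None if even 70 is too many."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    for budget in sorted(BUDGETS, reverse=True):
        if budget * num_images <= tokens_available:
            return budget
    return None
```

With 3,000 tokens free, four images get 560 tokens each (4 × 1120 would need 4,480), while a single image in a mostly empty 128K context gets the full 1120. Higher budgets are likely the better fit for the OCR and document-parsing uses listed above, where fine detail matters.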

Get started

Try Gemma 4 E4B now

Start chatting instantly or download for on-device deployment.

Start Chatting

Chat with Gemma 4 E4B instantly, no setup required

Watch overview

Official Gemma 4 introduction video

Model card

Complete technical specifications and benchmarks

Documentation

Integration guides and best practices

Download weights

On-device deployment

Download official model weights for edge and local deployment.

Hugging Face

Official Gemma 4 E4B model repository

Kaggle

Download from Kaggle Models

Ollama

Run locally with Ollama

Edge platforms

Browser and mobile deployment

Deploy on edge devices, browsers, and mobile platforms.

transformers.js

Run in browsers with WebGPU acceleration

ONNX Runtime

Cross-platform edge deployment

MLX

Optimized for Apple Silicon

llama.cpp

Efficient CPU and GPU inference

Part of Gemma 4

The edge model in a frontier family

Gemma 4 E4B is the recommended edge model in the Gemma 4 family. Step up to 26B MoE or 31B Dense when you need more power, or down to E2B for the smallest footprint.

Gemma 4 E2B

Ultra-compact 2.3B model for the tightest hardware constraints

Gemma 4 26B

MoE model with near-31B quality at 4B inference cost

Gemma 4 31B

Flagship dense model for maximum performance

Documentation

Complete guides for integration and deployment

Community

Join developers building with Gemma

Model Card

Technical specifications and evaluation results

Get started

Ready to run AI on-device with Gemma 4 E4B?

Start chatting for free, or download the model for private, on-device deployment, where no data leaves your device.
