Llama 4 Maverick is Meta's flagship open-weight model, released April 5, 2025 as part of the Llama 4 family. It uses a Mixture-of-Experts (MoE) architecture with 17 billion active parameters per token across 128 experts (400 billion total parameters), a 1 million token context window, and native multimodality (text and image). At launch, benchmarks showed it competitive with GPT-4o and Gemini 2.0 Flash, a first for an open-weight model going head-to-head with closed frontier models, and the strongest argument to date that open-source AI can compete with them.
What makes Maverick significant is not just its benchmark performance but its accessibility: as an open-weight model under Meta's Llama license, developers can run it on their own infrastructure, fine-tune it on their own data, and deploy it without per-token API costs (subject to the license terms). This makes it particularly attractive for enterprises with privacy requirements, high-volume workloads, or a need for domain-specific customization.
Model ID: meta-llama/Llama-4-Maverick (varies by platform)

Benchmark Performance: At launch, Maverick beat GPT-4o and Gemini 2.0 Flash across a broad set of benchmarks, a first for an open-weight model competing head-to-head with closed frontier models.
1M Token Context: Enables full-codebase analysis, long document reasoning, and extended autonomous task sessions at no API cost for self-hosted deployments.
Native Multimodality: Trained from scratch on text, image, and video data — not a vision adapter added after the fact. Capable of image understanding, visual Q&A, and chart analysis.
MoE Efficiency: Despite 400B total parameters, only 17B are active per token, meaning its inference cost and latency resemble those of a 17B model, not a 400B one.
Open-Weight Advantage: Can be fine-tuned on proprietary data, deployed on-premises, and customized without vendor dependency or per-token costs.
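The routing mechanism behind the MoE efficiency point above can be sketched in a few lines. The expert count (128) matches Maverick's published figure, but the gating shown here is a generic top-1 softmax router for illustration, not Meta's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # matches Maverick's published expert count
TOP_K = 1           # illustrative top-1 routing; actual details are Meta's

def route(token_hidden, gate_weights):
    """Pick which expert(s) process this token via a softmax gate."""
    logits = token_hidden @ gate_weights          # shape: (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    top = np.argsort(probs)[-TOP_K:]              # chosen expert indices
    return top, probs[top] / probs[top].sum()     # renormalized weights

hidden_dim = 64  # toy dimension for the sketch
gate = rng.standard_normal((hidden_dim, NUM_EXPERTS))
token = rng.standard_normal(hidden_dim)

experts, weights = route(token, gate)
# Only TOP_K of NUM_EXPERTS expert FFNs execute for this token,
# so per-token compute scales with active parameters, not total.
print(f"routed to expert(s) {experts}; "
      f"fraction of experts active: {TOP_K / NUM_EXPERTS:.3%}")
```

This is why a 400B-total model can serve tokens at roughly the cost of a 17B dense model: the gate activates a small, fixed fraction of the expert parameters per token.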
Serving the full 1M token context requires significant memory, most of it for the KV cache. For truly on-device deployment on a single GPU, Llama 4 Scout (109B total parameters) is more practical. And benchmark comparisons against the Gemini 3 and Claude 4-series models released later in 2025 are less favorable than the April 2025 launch comparisons against older models.
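To see why long contexts are memory-hungry, a back-of-the-envelope KV-cache estimate helps. The dimensions below (layers, KV heads, head size) are illustrative placeholders, not Maverick's published configuration:

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    """Estimate KV cache size: one K and one V tensor per layer, per token.

    dtype_bytes=2 assumes fp16/bf16 cache entries.
    """
    return 2 * tokens * layers * kv_heads * head_dim * dtype_bytes

# Illustrative placeholder dimensions, NOT Maverick's actual config.
est = kv_cache_bytes(tokens=1_000_000, layers=48, kv_heads=8, head_dim=128)
print(f"~{est / 2**30:.0f} GiB of KV cache at 1M tokens")
```

Even with grouped-query attention keeping the KV head count small, a million-token cache for a single sequence lands in the hundreds of gigabytes at these assumed dimensions, which is why full-length serving needs multi-GPU memory or cache-compression techniques.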
February 26, 2026