Llama 4 Maverick is Meta's flagship open-weight model, released April 5, 2025 as part of the Llama 4 family. It uses a Mixture-of-Experts (MoE) architecture with 17 billion active parameters per token across 128 experts (400 billion total parameters), a 1 million token context window, and native multimodality (text and image). At launch, benchmarks showed it competitive with GPT-4o and Gemini 2.0 Flash, a first for an open-weight model going head-to-head with closed frontier models, and the strongest argument to date that open-source AI can compete with them.
What makes Maverick significant is not just its benchmark performance but its accessibility: as an open-weight model under Meta's Llama license, developers can run it on their own infrastructure, fine-tune it on their own data, and deploy it without per-token API costs (subject to the license terms). This makes it particularly attractive for enterprises with privacy requirements, high-volume workloads, or a need for domain-specific customization.
Model ID: meta-llama/Llama-4-Maverick (varies by platform)

Benchmark Performance: At launch, Maverick beat GPT-4o and Gemini 2.0 Flash across a broad set of benchmarks, a first for an open-weight model competing head-to-head with closed frontier models.
1M Token Context: Enables full-codebase analysis, long document reasoning, and extended autonomous task sessions at no API cost for self-hosted deployments.
Native Multimodality: Trained from scratch on text, image, and video data — not a vision adapter added after the fact. Capable of image understanding, visual Q&A, and chart analysis.
MoE Efficiency: Despite 400B total parameters, only 17B are active per token, meaning its inference cost and latency resemble those of a 17B model, not a 400B one.
Open-Weight Advantage: Can be fine-tuned on proprietary data, deployed on-premises, and customized without vendor dependency or per-token costs.
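The routing mechanism behind the MoE efficiency point above can be sketched in a few lines. The expert count (128) matches Maverick's published figure, but the gating shown here is a generic top-1 softmax router for illustration, not Meta's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # matches Maverick's published expert count
TOP_K = 1           # illustrative top-1 routing; actual details are Meta's

def route(token_hidden, gate_weights):
    """Pick which expert(s) process this token via a softmax gate."""
    logits = token_hidden @ gate_weights          # shape: (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    top = np.argsort(probs)[-TOP_K:]              # chosen expert indices
    return top, probs[top] / probs[top].sum()     # renormalized weights

hidden_dim = 64  # toy dimension for the sketch
gate = rng.standard_normal((hidden_dim, NUM_EXPERTS))
token = rng.standard_normal(hidden_dim)

experts, weights = route(token, gate)
# Only TOP_K of NUM_EXPERTS expert FFNs execute for this token,
# so per-token compute scales with active parameters, not total.
print(f"routed to expert(s) {experts}; "
      f"fraction of experts active: {TOP_K / NUM_EXPERTS:.3%}")
```

This is why a 400B-total model can serve tokens at roughly the cost of a 17B dense model: the gate activates a small, fixed fraction of the expert parameters per token.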
Serving the full 1M token context requires significant memory, most of it for the KV cache. For truly on-device deployment on a single GPU, Llama 4 Scout (109B total parameters) is more practical. And benchmark comparisons against the Gemini 3 and Claude 4-series models released later in 2025 are less favorable than the April 2025 launch comparisons against older models.
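To see why long contexts are memory-hungry, a back-of-the-envelope KV-cache estimate helps. The dimensions below (layers, KV heads, head size) are illustrative placeholders, not Maverick's published configuration:

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    """Estimate KV cache size: one K and one V tensor per layer, per token.

    dtype_bytes=2 assumes fp16/bf16 cache entries.
    """
    return 2 * tokens * layers * kv_heads * head_dim * dtype_bytes

# Illustrative placeholder dimensions, NOT Maverick's actual config.
est = kv_cache_bytes(tokens=1_000_000, layers=48, kv_heads=8, head_dim=128)
print(f"~{est / 2**30:.0f} GiB of KV cache at 1M tokens")
```

Even with grouped-query attention keeping the KV head count small, a million-token cache for a single sequence lands in the hundreds of gigabytes at these assumed dimensions, which is why full-length serving needs multi-GPU memory or cache-compression techniques.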
February 26, 2026