Llama 4 Scout is Meta's most context-efficient open-weight model, released April 5, 2025 alongside Llama 4 Maverick. Its defining feature is a 10 million token context window, the largest of any publicly available model at launch and nearly 10x the 1M context offered by competing models. At the same time, with only 17 billion active parameters across 16 experts (109 billion total), it fits on a single NVIDIA H100 GPU with Int4 weight quantization, making it practical for teams that need extreme context length without a multi-GPU serving setup.
Scout is purpose-built for tasks that demand massive context: analyzing entire codebases, processing large document collections, long-running research sessions, or any scenario where chunking and retrieval would introduce errors. The combination of 10M context and single-GPU efficiency gives it a unique position in the current model landscape.
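As a concrete illustration of that workflow, the sketch below concatenates a repository into a single prompt and sends it to Scout through an OpenAI-compatible chat endpoint. The base URL, API key, and exact model identifier are assumptions here (they differ across hosting platforms and local servers), and the 4-characters-per-token figure is only a rough sanity check, not a real tokenizer count.

```python
# Sketch: analyzing an entire repository in a single Scout prompt.
# The base URL, API key, and model ID below are assumptions; substitute the
# values for your hosting platform or local server.
from pathlib import Path

from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"   # e.g. a local inference server (assumption)
MODEL_ID = "meta-llama/Llama-4-Scout"   # exact identifier varies by platform

def collect_codebase(root: str, suffixes=(".py", ".md", ".toml")) -> str:
    """Concatenate source files, tagging each with its path."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"### FILE: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

codebase = collect_codebase("./my-project")
print(f"~{len(codebase) // 4:,} tokens (rough 4-chars-per-token estimate)")

client = OpenAI(base_url=BASE_URL, api_key="unused-for-local-serving")
response = client.chat.completions.create(
    model=MODEL_ID,
    messages=[
        {"role": "system",
         "content": "You are a code reviewer with the full repository in context."},
        {"role": "user",
         "content": codebase + "\n\nSummarize the architecture and flag dead code."},
    ],
)
print(response.choices[0].message.content)
```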
meta-llama/Llama-4-Scout (varies by platform)

10M Token Context: The headline capability. 10 million tokens is enough to hold a very large codebase, a full year of documents, or an extraordinarily long research session entirely in context, eliminating retrieval-augmented generation (RAG) for many use cases.
Single-GPU Efficiency: Fits on one NVIDIA H100 despite 109B total parameters when the weights are quantized to Int4, while the MoE architecture activates only 17B parameters per token to keep per-token compute low. This dramatically reduces the infrastructure cost of serving a model at this scale; a rough memory sketch after this list walks through the arithmetic.
More Capable Than Previous Llama Generations: Despite its compact active parameter count, Scout outperforms all prior Llama models on standard benchmarks.
Native Multimodality: Trained on text, image, and video data from the ground up.
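To make the single-GPU efficiency claim above concrete, here is a back-of-envelope weight-memory sketch. It assumes Int4 (4-bit) weight quantization, the configuration Meta's launch materials cite for the single-H100 fit, and an 80 GB H100; it ignores the KV cache, activations, and serving overhead, so read it as a rough bound rather than a deployment guide.

```python
# Back-of-envelope check of the single-H100 claim (illustrative only).
total_params = 109e9          # 109B total parameters across all 16 experts
bytes_per_param_int4 = 0.5    # 4-bit weights: half a byte per parameter
h100_memory_gb = 80           # H100 has 80 GB of HBM

weight_gb = total_params * bytes_per_param_int4 / 1e9
print(f"Int4 weights: ~{weight_gb:.0f} GB of {h100_memory_gb} GB")  # well under 80 GB
print(f"Headroom for KV cache and activations: ~{h100_memory_gb - weight_gb:.0f} GB")

# At bf16 (2 bytes per parameter) the same weights would need ~218 GB,
# i.e. several GPUs; the quantized path is what makes single-GPU serving work.
```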
Scout's benchmark performance on pure reasoning and coding tasks falls below Llama 4 Maverick, which has 128 experts versus Scout's 16. The 10M context window is supported in principle, but using it in practice requires careful memory management: the KV cache for a full 10M-token context far exceeds the memory of a single H100. For maximum capability, Maverick is the stronger choice; Scout is the choice when extreme context length is the priority.
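The memory-management caveat can be made concrete with a rough KV-cache estimate. The layer count, KV-head count, and head dimension below are illustrative assumptions, not figures from the model card; plug in the real config values for an exact number, but the conclusion that a full 10M-token cache dwarfs a single H100's 80 GB holds under any plausible configuration.

```python
# Rough KV-cache sizing for long contexts (all architecture numbers are assumptions).
layers = 48          # assumed transformer layer count
kv_heads = 8         # assumed grouped-query KV heads
head_dim = 128       # assumed per-head dimension
bytes_per_value = 2  # fp16/bf16 cache entries

def kv_cache_gb(context_tokens: int) -> float:
    # K and V tensors, per layer, per token
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 1e9

for tokens in (128_000, 1_000_000, 10_000_000):
    print(f"{tokens:>12,} tokens -> ~{kv_cache_gb(tokens):,.0f} GB of KV cache")

# Under these assumptions the full 10M-token cache runs to roughly 2 TB, which is
# why extreme contexts call for cache quantization, offloading, or chunked processing.
```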