DeepSeek V4-Pro is the flagship variant of DeepSeek's V4 generation, released April 24, 2026, alongside the smaller V4-Flash variant. Both models are available via the DeepSeek API and as open weights under the MIT license — continuing DeepSeek's pattern of releasing frontier-grade Chinese models under permissive open licensing. V4-Pro pushes DeepSeek's mixture-of-experts architecture to 1.6 trillion total parameters with 49 billion activated per token, while introducing a new hybrid attention mechanism that dramatically reduces inference cost.
The V4 release is a major efficiency story: at 1M-token context, V4-Pro requires only ~27% of V3.2's single-token inference FLOPs and ~10% of V3.2's KV cache footprint. Combined with the MIT license and DeepSeek's reputation for low API pricing, V4-Pro is the strongest argument yet that frontier-class capability is becoming commoditized through open-weight releases. Reporting from independent reviewers (including Simon Willison) describes V4 as "almost on the frontier, a fraction of the price."
1M Token Context with Efficient Attention: At 1M-token context, V4-Pro requires only ~27% of V3.2's single-token inference FLOPs and ~10% of its KV cache footprint — making 1M context economically practical at a price point closer to short-context models.
Math, STEM, and Coding: V4-Pro reportedly beats all current open models on math, STEM, and coding evaluations, and rivals top closed-source models in these categories.
Mixture-of-Experts at Frontier Scale: 1.6T total parameters with 49B activated per token — placing V4-Pro in the same parameter-count class as the largest publicly described frontier models (alongside xAI's announced 6T-parameter Grok 5).
Open-Weight Distribution: MIT license with no acceptable-use clauses or commercial restrictions — the most permissive license among major frontier-class models.
384K Token Output: One of the largest single-completion output capacities in any production model — useful for very long structured outputs (codebases, books, large reports).
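The headline numbers above can be sanity-checked with back-of-envelope arithmetic. In the sketch below, only the ratios (~27% FLOPs, ~10% KV cache vs. V3.2) and the parameter counts come from this article; the V3.2 KV-cache baseline is a hypothetical placeholder, not a real figure.

```python
# Illustrative arithmetic for the V4-Pro efficiency claims.
# Stated figures: 1.6T total params, 49B activated per token,
# ~27% of V3.2's FLOPs and ~10% of its KV cache at 1M context.

V4_TOTAL_PARAMS = 1.6e12   # 1.6T total parameters (stated)
V4_ACTIVE_PARAMS = 49e9    # 49B activated per token (stated)

# Fraction of the MoE network touched per token.
activation_ratio = V4_ACTIVE_PARAMS / V4_TOTAL_PARAMS
print(f"activated per token: {activation_ratio:.1%}")  # ~3.1%

# Relative inference cost at 1M-token context vs. V3.2 (stated ratios).
FLOPS_RATIO = 0.27
KV_CACHE_RATIO = 0.10

# Hypothetical baseline: if V3.2's KV cache at 1M tokens were
# 100 GB (placeholder assumption), V4-Pro would need ~10% of that.
v32_kv_cache_gb = 100.0
v4_kv_cache_gb = v32_kv_cache_gb * KV_CACHE_RATIO
print(f"V4-Pro KV cache at 1M ctx: ~{v4_kv_cache_gb:.0f} GB "
      f"(if V3.2 used {v32_kv_cache_gb:.0f} GB)")
```

The ~3% activation ratio is what makes a 1.6T-parameter model's per-token compute comparable to a mid-sized dense model; the KV-cache reduction is what makes the 1M-token context affordable to serve.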
Geopolitical/Export Control Exposure: DeepSeek V4 development reportedly used NVIDIA Blackwell chips in alleged violation of U.S. export control restrictions (as covered in February 2026 reporting). DeepSeek reportedly withheld V4 release from U.S. chipmakers while granting early access to domestic Chinese semiconductor suppliers — a posture that could complicate enterprise procurement in jurisdictions sensitive to U.S. export-control compliance.
Training Data Allegations: Anthropic's [[Dario Amodei]] alleged in February 2026 that DeepSeek used fraudulent accounts to generate millions of Claude conversations for training data purposes. These allegations have not been independently verified but factor into how some Western enterprises evaluate DeepSeek models.
Self-Hosted Compute Cost: While the new hybrid attention mechanism dramatically reduces per-token cost, running a 1.6T-parameter MoE locally still requires substantial compute. The smaller V4-Flash (284B/13B activated) is the more practical self-hosted option for most teams.
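A rough weight-memory estimate makes the self-hosting gap concrete. The parameter counts (1.6T for V4-Pro, 284B total for V4-Flash) come from this article; the FP8 precision and the 1.2× runtime-overhead factor are assumptions for illustration only.

```python
# Approximate GPU memory needed to self-host each variant,
# assuming FP8 weights (1 byte/param, an assumption) plus a
# 1.2x overhead factor for KV cache and runtime buffers
# (also an assumption).

BYTES_PER_PARAM = 1.0  # FP8 quantization (assumed)
OVERHEAD = 1.2         # runtime buffers, KV cache, etc. (assumed)

def weight_memory_gb(total_params: float) -> float:
    """Rough memory footprint, in GB, to hold the model weights."""
    return total_params * BYTES_PER_PARAM * OVERHEAD / 1e9

v4_pro_gb = weight_memory_gb(1.6e12)    # 1.6T total params (stated)
v4_flash_gb = weight_memory_gb(284e9)   # 284B total params (stated)

print(f"V4-Pro:   ~{v4_pro_gb:,.0f} GB")   # ~1,920 GB
print(f"V4-Flash: ~{v4_flash_gb:,.0f} GB") # ~341 GB
```

Under these assumptions, V4-Pro needs on the order of two terabytes of accelerator memory just for weights, while V4-Flash fits in a single multi-GPU node — which is why the smaller variant is the practical self-hosted option for most teams.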
May 7, 2026