DeepSeek V4-Pro

Summary

DeepSeek V4-Pro is the flagship variant of DeepSeek's V4 generation, released April 24, 2026, alongside the smaller V4-Flash variant.

Overview

Released April 24, 2026, alongside the smaller V4-Flash, V4-Pro is the flagship of DeepSeek's V4 generation. Both models are available via the DeepSeek API and as open weights under the MIT license, continuing DeepSeek's pattern of releasing frontier-grade Chinese models under permissive open licensing. V4-Pro pushes DeepSeek's mixture-of-experts architecture to 1.6 trillion total parameters with 49 billion activated per token, while introducing a new hybrid attention mechanism that dramatically reduces inference cost.

The V4 release is a major efficiency story: at 1M-token context, V4-Pro requires only ~27% of V3.2's single-token inference FLOPs and ~10% of V3.2's KV cache footprint. Combined with the MIT license and DeepSeek's reputation for low API pricing, V4-Pro is the strongest argument yet that frontier-class capability is becoming commoditized through open-weight releases. Reporting from independent reviewers (including Simon Willison) describes V4 as "almost on the frontier, a fraction of the price."

Specifications

  • Developer: DeepSeek (subsidiary of High-Flyer hedge fund)
  • Release Date: April 24, 2026
  • Type: Mixture-of-Experts large language model
  • Architecture: Hybrid attention combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA)
  • Total Parameters: 1.6 trillion
  • Activated Parameters per Token: 49 billion
  • Context Window: 1,000,000 tokens
  • Max Output: 384,000 tokens
  • License: MIT (open weights, commercially permissive)
  • Access: DeepSeek API; open-weight downloads (Hugging Face)

Capabilities

1M Token Context with Efficient Attention: At 1M-token context, V4-Pro requires only ~27% of V3.2's single-token inference FLOPs and ~10% of its KV cache footprint — making 1M context economically practical at a price point closer to short-context models.
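To make the ~10% KV-cache figure concrete, the sketch below runs the arithmetic against a hypothetical baseline. The per-token KV size assumed for V3.2 is a placeholder, not a published number; only the ~10% ratio comes from the release claims.

```python
# Illustrative arithmetic only: BASELINE_KV_BYTES_PER_TOKEN is an
# assumed placeholder for a V3.2-style model, not an official figure.
BASELINE_KV_BYTES_PER_TOKEN = 70 * 1024  # assume ~70 KiB/token
CONTEXT_TOKENS = 1_000_000

v32_kv_gib = BASELINE_KV_BYTES_PER_TOKEN * CONTEXT_TOKENS / 2**30
v4_kv_gib = v32_kv_gib * 0.10  # V4-Pro claim: ~10% of V3.2's KV footprint

print(f"Baseline KV cache at 1M tokens: {v32_kv_gib:.1f} GiB")
print(f"V4-Pro at ~10%: {v4_kv_gib:.1f} GiB")
```

Under this assumed baseline, the full 1M-token cache drops from tens of GiB to single digits, which is what makes serving long contexts on ordinary accelerator memory budgets plausible.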

Math, STEM, and Coding: V4-Pro reportedly beats all current open models on math, STEM, and coding evaluations, and rivals top closed-source models in these categories at a fraction of their price.

Mixture-of-Experts at Frontier Scale: 1.6T total parameters with 49B activated per token — putting V4-Pro in the same parameter-count weight class as the largest publicly described frontier models (alongside xAI's announced 6T-parameter Grok 5).
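A common rule of thumb puts a transformer's per-token forward cost at roughly 2 × (activated parameters) FLOPs, which is what makes sparse MoE routing attractive at this scale. The sketch below applies that rule to the stated 1.6T/49B figures; the 2N estimate, and ignoring attention cost, are simplifying assumptions, not DeepSeek's own accounting.

```python
# Rough decode-cost sketch via the 2 * N_active FLOPs/token rule of
# thumb; long-context attention cost is deliberately ignored here.
TOTAL_PARAMS = 1.6e12   # 1.6T total
ACTIVE_PARAMS = 49e9    # 49B activated per token

flops_per_token = 2 * ACTIVE_PARAMS   # MoE forward pass estimate
dense_equiv = 2 * TOTAL_PARAMS        # cost if every expert fired

print(f"MoE per-token forward: {flops_per_token / 1e9:.0f} GFLOPs")
print(f"Dense 1.6T equivalent: {dense_equiv / 1e12:.1f} TFLOPs "
      f"(~{TOTAL_PARAMS / ACTIVE_PARAMS:.0f}x more)")
```

By this estimate, routing only 49B of 1.6T parameters per token buys roughly a 33x reduction in forward-pass compute versus an equivalently sized dense model.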

Open-Weight Distribution: MIT license with no acceptable-use clauses or commercial restrictions — the most permissive license among major frontier-class models.

384K Token Output: One of the largest single-completion output capacities among production models — useful for very long structured outputs (codebases, books, large reports).

Limitations

Geopolitical/Export-Control Exposure: DeepSeek V4 development reportedly used NVIDIA Blackwell chips in alleged violation of U.S. export controls (per February 2026 reporting). DeepSeek also reportedly withheld early V4 access from U.S. chipmakers while granting it to domestic Chinese semiconductor suppliers, a posture that could complicate enterprise procurement in jurisdictions sensitive to U.S. export-control compliance.

Training Data Allegations: Anthropic's [[Dario Amodei]] alleged in February 2026 that DeepSeek used fraudulent accounts to generate millions of Claude conversations for use as training data. These allegations have not been independently verified, but they factor into how some Western enterprises evaluate DeepSeek models.

Self-Hosted Compute Cost: While the new hybrid attention mechanism dramatically reduces per-token cost, running a 1.6T-parameter MoE locally still requires substantial compute. The smaller V4-Flash (284B/13B activated) is the more practical self-hosted option for most teams.
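For a rough sense of what "substantial compute" means, the sketch below estimates raw weight memory for both variants at two illustrative quantization widths. The byte-width choices are assumptions, not official deployment configs, and KV cache and activation memory are excluded.

```python
# Back-of-envelope weight-memory estimates. Quantization widths
# (8-bit, 4-bit) are illustrative assumptions, not official configs.
def weight_gib(params: float, bits: int) -> float:
    """Memory for raw weights only (no KV cache or activations)."""
    return params * bits / 8 / 2**30

for name, params in [("V4-Pro (1.6T)", 1.6e12), ("V4-Flash (284B)", 284e9)]:
    for bits in (8, 4):
        print(f"{name} @ {bits}-bit: {weight_gib(params, bits):,.0f} GiB")
```

Even at 4-bit, V4-Pro's weights alone land in the high hundreds of GiB, i.e. multi-node territory, while V4-Flash at 4-bit fits within a single large multi-GPU server — consistent with the claim that V4-Flash is the more practical self-hosted option.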

Recent Developments

  • April 24, 2026 Launch: V4-Pro released alongside V4-Flash, both as open weights under MIT and via the DeepSeek API. New hybrid attention mechanism (CSA + HCA) introduced as the architectural centerpiece.
  • Efficiency Headline: ~27% of V3.2's per-token inference FLOPs and ~10% of V3.2's KV cache at 1M context — making 1M-context inference economically viable at scale.
  • Industry Context: Released the same day as OpenAI's GPT-5.5 (April 24, 2026) and one week before Mistral Medium 3.5 (April 29, 2026), making late April 2026 the most concentrated open-weight and closed-frontier release window in recent memory.
  • Independent Reception: Reviewers including Simon Willison framed V4-Pro as "almost on the frontier, a fraction of the price" — an unusually consistent narrative across Western coverage of a Chinese-origin model.

Last Updated

May 7, 2026