DeepSeek V4-Flash

Summary

DeepSeek V4-Flash is DeepSeek's fast, cost-efficient frontier mixture-of-experts (MoE) model, released April 23, 2026. With 284B total parameters and just 13B activated per token, it has the smallest activation footprint among Tier-1 open-weight models.

Overview

DeepSeek V4-Flash is DeepSeek's fast, cost-efficient frontier MoE model, released April 23, 2026 — one day ahead of the heavier-weight [[DeepSeek/DeepSeek V4-Pro|DeepSeek V4-Pro]] preview. It is not a distillation of V4-Pro: V4-Flash is its own pretraining run (32T+ tokens) on the same V4 architecture family, sized for the "intelligence-per-parameter" sweet spot. With 284B total parameters and just 13B activated per token, V4-Flash has the smallest activation footprint among Tier-1 open-weight models.

V4-Flash extends DeepSeek's defining open-weight strategy: MIT-licensed, full weights on Hugging Face, and pricing that further compresses the cost gap with closed frontier models. A DeepSeek-V4-Flash-Max variant (also April 23, 2026) targets workflows that need a larger "thinking budget" while staying within the Flash family.

Specifications

  • Developer: DeepSeek
  • Release Date: April 23, 2026 (V4-Flash and V4-Flash-Max)
  • Type: Mixture-of-Experts (MoE) multimodal text-generation model
  • Total Parameters: 284B (13B activated per token)
  • Context Window: 1M tokens
  • Training Data: 32T+ tokens (independent pretraining run; not distilled from Pro)
  • License: MIT (full open weights on Hugging Face)
  • Throughput: ~83.6 tokens/sec; ~1.04s time-to-first-token on DeepSeek's hosted API (Artificial Analysis benchmarks)
  • Distribution: DeepSeek API, Hugging Face (deepseek-ai/DeepSeek-V4-Flash), self-hosted
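The reported serving figures above (~1.04s time-to-first-token, ~83.6 tokens/sec decode) can be turned into a rough end-to-end latency estimate. A minimal sketch, assuming a simple two-phase latency model (TTFT plus steady-state decode for the remaining tokens) and ignoring network variance:

```python
# End-to-end latency estimate from the reported hosted-API figures:
# ~1.04 s time-to-first-token and ~83.6 tokens/sec decode throughput.
TTFT_S = 1.04        # time to first token (seconds)
DECODE_TPS = 83.6    # steady-state decode throughput (tokens/sec)

def completion_latency_s(output_tokens: int) -> float:
    """TTFT plus decode time for the tokens after the first."""
    return TTFT_S + max(output_tokens - 1, 0) / DECODE_TPS

for n in (100, 500, 1000):
    print(f"{n:>5} output tokens: ~{completion_latency_s(n):.1f} s")
```

Under these assumptions a 500-token completion lands around seven seconds; actual numbers will vary with load, prompt length, and batching on the hosted API.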

Capabilities

Best-in-Class Activation Efficiency: At 13B active parameters per token, V4-Flash has the lowest activation footprint among Tier-1 open-weight models, which translates into lower per-token inference compute and lower hosting cost than peers at a comparable benchmark tier.
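The compute advantage follows directly from the active-parameter count. A back-of-envelope sketch, assuming the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token (the dense 70B comparison model is hypothetical, chosen only for illustration):

```python
# Back-of-envelope per-token inference compute for an MoE model.
# Assumption: forward pass ~2 FLOPs per ACTIVE parameter per token.
def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2.0 * active_params

v4_flash = flops_per_token(13e9)   # 13B active parameters per token
dense_70b = flops_per_token(70e9)  # hypothetical dense 70B peer

print(f"V4-Flash : {v4_flash:.2e} FLOPs/token")
print(f"Dense 70B: {dense_70b:.2e} FLOPs/token")
print(f"Ratio    : {dense_70b / v4_flash:.1f}x")
```

By this estimate, a dense 70B peer spends roughly 5x the per-token compute, which is the mechanism behind the lower hosting cost claimed above.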

1M-Token Context: A full 1M-token context window, at parity with Google's Gemini family and ahead of most open-weight peers.

Reasoning Performance Near Pro: With a larger thinking budget, V4-Flash-Max approaches V4-Pro reasoning performance; its smaller parameter scale means it still trails Pro on the most complex agentic workflows and pure knowledge tasks.

Open-Weight Math/STEM/Coding Leadership: The V4 family leads all current open-weight models in math, STEM, and coding, and rivals top closed-source models. V4-Pro scores 93.5 on LiveCodeBench and reaches a Codeforces Elo of 3206 (ahead of GPT-5.5 at 3168); V4-Flash inherits the family's strong coding profile at lower cost.

Fully MIT Licensed: Unlike MiniMax M2.7, V4-Flash is freely usable commercially — making it the most production-friendly open-weight frontier model in this generation.

Limitations

V4-Flash trails V4-Pro on the most complex agentic workflows and the hardest knowledge benchmarks; this is the explicit tradeoff for its lower activation footprint and cost. As with prior DeepSeek releases, U.S. enterprise deployment in regulated industries continues to face additional scrutiny around Chinese-origin model provenance and hosted-API data handling. Self-hosting requires substantial GPU memory even though only a small subset of parameters is activated per token, because all experts must remain resident in memory.
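The self-hosting footprint can be estimated from the total parameter count, since every expert's weights must be loaded even though only 13B are active per token. A rough sketch, assuming standard bytes-per-parameter for common precisions and ignoring KV cache, activations, and framework overhead:

```python
# Rough GPU memory needed just to hold the weights of a 284B-parameter
# MoE model at common precisions. Ignores KV cache, activations, and
# framework overhead, so real deployments need headroom beyond this.
TOTAL_PARAMS = 284e9  # all experts must be resident, not just the active 13B

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>9}: {weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB")
```

Even at aggressive 4-bit quantization this is on the order of 140 GB of weights, i.e. a multi-GPU node, which is the practical constraint behind the self-hosting caveat above.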

Recent Developments

  • April 23, 2026 — DeepSeek V4-Flash and V4-Flash-Max released on Hugging Face under MIT license.
  • April 24, 2026 — V4-Pro preview released, completing the V4 family launch.
  • Spring 2026 — V4-Flash is one of four Chinese open-weight coding models released within a 12-day window (alongside Z.ai GLM-5.1, MiniMax M2.7, Moonshot Kimi K2.6) — collectively reshaping the open-weight coding frontier.
  • Spring 2026 — Artificial Analysis clocks V4-Flash at 83.6 tokens/sec and ~1.04s TTFT on DeepSeek's hosted API, well above the median for open-weight models of comparable size.

Last Updated

May 9, 2026