DeepSeek

Overview

DeepSeek represents an unusual entry into the AI landscape, founded in July 2023 by Liang Wenfeng as an internal initiative of High-Flyer, a major Chinese quantitative hedge fund. Rather than pursuing traditional venture capital funding, DeepSeek is entirely capitalized through High-Flyer's internal resources (~$8B in assets under management), creating a unique corporate structure insulated from external investor pressures. The company's stated mission emphasizes efficiency and cost-effective training methodologies, claiming to achieve competitive performance with significantly lower computational overhead than Western competitors. DeepSeek's open-source release strategy and export control complexities place it at the center of emerging geopolitical tensions around AI development and technology access.

In April 2026, DeepSeek shipped DeepSeek V4 — its most ambitious model release to date — in two production-ready variants (V4-Pro and V4-Flash), both available via the DeepSeek API and as open weights under the MIT license. The V4 release introduces a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention), 1M-token context windows, and dramatic compute and KV-cache efficiency gains over V3.2 — putting DeepSeek within striking distance of frontier closed-source models on math, STEM, and coding benchmarks at a fraction of the price.

Key Details

  • Founded: July 2023
  • Founder: Liang Wenfeng (co-founder, High-Flyer hedge fund)
  • Headquarters: Hangzhou, China
  • Funding: Privately funded by High-Flyer hedge fund (~$8B AUM); no external venture capital rounds
  • Valuation: Not publicly disclosed
  • Website: https://www.deepseek.com

Current Models

  • [[DeepSeek V4-Pro]] — Flagship V4 model; 1.6T total parameters / 49B activated (MoE); 1M-token context, 384K max output; hybrid Compressed Sparse Attention + Heavily Compressed Attention; ~27% of single-token inference FLOPs and ~10% of KV cache vs. V3.2 at 1M context; MIT license; released April 24, 2026
  • [[DeepSeek V4-Flash]] — Efficient V4 variant; 284B total parameters / 13B activated (MoE), the smallest activation footprint among Tier-1 open-weight models; independent 32T+ token pretraining run (not a distillation of Pro); 1M-token context; MIT license; released April 23, 2026 alongside DeepSeek-V4-Flash-Max for larger-thinking-budget workflows
  • [[DeepSeek R1]] — Reasoning-focused model; MIT license; 671B total parameters / 37B activated (MoE); 128K context window; scored 79.8% on AIME 2024 (pass@1) and the 96.3rd percentile on Codeforces; released January 2025
  • [[DeepSeek V3.1]] — Previous-generation general-purpose model; MIT license; 37B active / 671B total; 128K context; $0.15/M input tokens; released August 2025
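The sparse-activation figures in the list above can be sanity-checked with simple arithmetic. A minimal sketch, using only the total/activated parameter counts quoted in this article (R1's 37B activated count follows its 671B V3-style MoE base):

```python
# Activation sparsity for the MoE models listed above.
# (total, activated) parameter counts in billions, as quoted in this article.
MODELS = {
    "DeepSeek V4-Pro":   (1600, 49),   # 1.6T total / 49B activated
    "DeepSeek V4-Flash": (284, 13),
    "DeepSeek R1":       (671, 37),    # 37B activated, per its V3-style MoE base
    "DeepSeek V3.1":     (671, 37),
}

def activation_ratio(total_b: float, active_b: float) -> float:
    """Fraction of parameters activated for a single token."""
    return active_b / total_b

for name, (total, active) in MODELS.items():
    print(f"{name}: {activation_ratio(total, active):.1%} of weights active per token")
```

V4-Pro activates only about 3.1% of its weights per token, and even the much smaller V4-Flash sits under 5%; this sparsity is what lets total parameter counts grow an order of magnitude faster than per-token inference cost.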

Key People

No individuals from the "Tracked People" list are affiliated with DeepSeek.

Recent Developments

  • DeepSeek V4 Launch (April 24, 2026): V4-Pro (1.6T total / 49B active) and V4-Flash (284B total / 13B active) released as open weights under MIT license. Both support 1M-token context and 384K-token output. The hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) yields ~27% of V3.2's single-token inference FLOPs and ~10% of its KV cache footprint at 1M-token context — a major efficiency lead. V4 reportedly beats all current open models on math, STEM, and coding benchmarks while approaching frontier closed-source performance at substantially lower price.
  • Export Control Allegations (February 2026): V4 development reportedly relied on Nvidia Blackwell chips in alleged violation of U.S. export controls. DeepSeek reportedly withheld early V4 access from U.S. chipmakers while granting it to domestic Chinese semiconductor suppliers.
  • Anthropic Allegations (February 2026): [[Dario Amodei]] alleged that DeepSeek used fraudulent accounts to generate millions of Claude conversations for training data.
  • MIT License Strategy: DeepSeek continues to release all production models under permissive MIT licensing, maintaining its reputation as the most capable open-weight provider from China.
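The "~10% of V3.2's KV cache" claim is easiest to appreciate in absolute terms. The sketch below estimates cache size at 1M-token context for a hypothetical dense-attention baseline — the layer count, KV-head count, and head dimension are illustrative assumptions, not published V4 specs — and then applies the article's ~10% figure:

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Standard KV-cache size: one key and one value vector
    per layer, per KV head, per cached token."""
    return 2 * tokens * layers * kv_heads * head_dim * dtype_bytes

# Hypothetical baseline config (illustrative only; not published V4 specs).
baseline = kv_cache_bytes(tokens=1_000_000, layers=61, kv_heads=128, head_dim=128)
hybrid = 0.10 * baseline  # the article's ~10%-of-V3.2 claim

GiB = 1024 ** 3
print(f"dense-style baseline at 1M tokens: {baseline / GiB:.0f} GiB")
print(f"at ~10% (hybrid-attention claim):  {hybrid / GiB:.0f} GiB")
```

Real V4 numbers will differ, but the arithmetic shows why the claim matters: at 1M-token context, even a 10x smaller cache is still a multi-hundred-GiB serving cost, so compressed-attention designs are what make such context windows economically viable at all.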

Why They Matter

DeepSeek challenges prevailing assumptions about AI scaling and cost efficiency, demonstrating that competitive capability can be achieved under tighter computational constraints and with different engineering approaches. The April 2026 V4 release is the strongest empirical case yet: a 1.6T-parameter open-weight model with frontier-class math, STEM, and coding performance at roughly 10% of V3.2's KV-cache cost, distributed under a permissive MIT license. The company's unusual financing structure, capitalized entirely by a hedge fund, insulates it from venture-investor timelines but can create opacity around its technical capabilities and development practices. Geopolitically, DeepSeek exemplifies the strategic importance of AI to Chinese technology policy and the emerging bifurcation between the U.S. and Chinese AI ecosystems. The training-data allegations and export control tensions make DeepSeek a flashpoint in broader debates about AI development ethics and international technology governance.

Last Updated

May 9, 2026