DeepSeek R1 is DeepSeek's reasoning-specialized model, released January 20, 2025 — part of the same release event that introduced the world to DeepSeek's extraordinarily cost-efficient approach to training frontier AI. R1 develops its reasoning capabilities primarily through reinforcement learning rather than supervised fine-tuning on human-generated reasoning chains, arriving at a model that matched OpenAI's o1 on AIME 2024 (79.8% vs. 79.2% pass@1) while being available at a fraction of the cost. The DeepSeek-R1 release, alongside V3, became one of the most discussed AI events of 2025.
R1's significance is partly technical — it demonstrated that strong reasoning can emerge from RL training without expensive human-labeled chain-of-thought data — and partly economic: OpenAI charged $60 per million output tokens for o1 at the time; R1 was available for a small fraction of that.
deepseek-reasoner (the API model name; maps to R1)
AIME 2024 — Math Reasoning: 79.8% pass@1 on AIME 2024 vs. OpenAI o1's 79.2% at launch — parity with o1 on one of the hardest public math benchmarks for AI.
MMLU: 90.8% on MMLU, outperforming most open-source competitors at release.
RL-Trained Reasoning: R1's chain-of-thought reasoning emerged largely from reinforcement learning with simple rule-based rewards — the model was rewarded for correct, well-formatted answers and learned to reason without large-scale human-generated reasoning traces (the companion R1-Zero model was trained with RL alone, with no supervised fine-tuning at all). This is a methodologically significant difference from OpenAI's o-series approach.
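The reward scheme described above can be illustrated with a toy sketch. This is not DeepSeek's training code; it assumes the rule-based setup described in the R1 report, where an accuracy reward scores only the final answer and a format reward checks that the chain of thought sits inside `<think>...</think>` tags:

```python
import re

# Toy illustration of outcome-based, rule-based rewards (not actual training code).
def format_reward(completion: str) -> float:
    """Reward keeping the reasoning inside <think>...</think> tags."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Reward depends only on the final answer, not on the reasoning trace.
    Assumes the final answer follows the closing </think> tag."""
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == gold else 0.0

completion = "<think>7 * 8 is 56.</think>56"
print(accuracy_reward(completion, "56") + format_reward(completion))  # 2.0
```

Because the reward never inspects the reasoning itself, the model is free to discover whatever chain-of-thought style maximizes final-answer accuracy, which is the core of the methodological difference noted above.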
Open-Weight: MIT licensed — fully permissive for commercial use, fine-tuning, and self-hosting. This made it immediately available for distillation into smaller models, and many R1-distilled variants (7B, 14B, 32B, 70B) followed quickly.
Cost: At $2.19 per million output tokens vs. $60 for OpenAI o1 at the time of release — roughly 27x cheaper for comparable reasoning performance.
The 128K context window is smaller than that of most frontier alternatives. R1 is subject to the same data sovereignty concerns as V3.1 for API usage (though self-hosted deployment of the open weights is fully under user control). R1 has higher latency than non-reasoning models because it generates an internal chain of thought before responding. For tasks that don't require step-by-step reasoning, DeepSeek V3.1 offers better speed and lower cost.
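For API usage, DeepSeek exposes an OpenAI-compatible endpoint. A minimal sketch of building a request payload follows, assuming the model name `deepseek-reasoner` and the `https://api.deepseek.com` base URL as published in DeepSeek's API docs (not verified here):

```python
import json

# Hedged sketch: build a chat-completion payload for DeepSeek's reasoning model.
# "deepseek-reasoner" is the API alias that maps to R1; the endpoint is
# OpenAI-compatible, so any OpenAI-style client can send this payload.
def build_reasoner_request(prompt: str) -> dict:
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": prompt}],
    }

print(json.dumps(build_reasoner_request("Prove that sqrt(2) is irrational.")))
```

With the `openai` Python client, this would be sent by pointing `base_url` at `https://api.deepseek.com`. Per DeepSeek's documentation, responses from the reasoning model carry the chain of thought in a separate `reasoning_content` field alongside the final `content`, which accounts for the extra latency described above.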
February 26, 2026