DeepSeek R1 is DeepSeek's reasoning-specialized model, released January 20, 2025 — part of the same release event that introduced the world to DeepSeek's extraordinarily cost-efficient approach to training frontier AI. R1 develops its reasoning capabilities primarily through reinforcement learning rather than supervised fine-tuning on human-generated reasoning chains, arriving at a model that matched OpenAI's o1 on AIME 2024 (79.8% vs. 79.2% pass@1) while being available at a fraction of the cost. The DeepSeek-R1 release, alongside V3, became one of the most discussed AI events of 2025.
R1's significance is partly technical — it demonstrated that strong reasoning can emerge from RL training without expensive human-labeled chain-of-thought data — and partly economic: OpenAI charged $60 per million output tokens for o1 at the time; R1 was available for a small fraction of that.
deepseek-reasoner (the API model name; maps to R1)
AIME 2024 — Math Reasoning: 79.8% pass@1 on AIME 2024 vs. OpenAI o1's 79.2% at launch — parity with o1 on one of the hardest public math benchmarks for AI.
MMLU: 90.8% on MMLU, outperforming most open-source competitors at release.
RL-Trained Reasoning: R1's chain-of-thought reasoning emerged largely from reinforcement learning with simple rule-based rewards — the model was rewarded for correct, well-formatted answers and learned to reason without large-scale human-generated reasoning traces (the companion R1-Zero model was trained with RL alone, with no supervised fine-tuning at all). This is a methodologically significant difference from OpenAI's o-series approach.
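The reward scheme described above can be illustrated with a toy sketch. This is not DeepSeek's training code; it assumes the rule-based setup described in the R1 report, where an accuracy reward scores only the final answer and a format reward checks that the chain of thought sits inside `<think>...</think>` tags:

```python
import re

# Toy illustration of outcome-based, rule-based rewards (not actual training code).
def format_reward(completion: str) -> float:
    """Reward keeping the reasoning inside <think>...</think> tags."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Reward depends only on the final answer, not on the reasoning trace.
    Assumes the final answer follows the closing </think> tag."""
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == gold else 0.0

completion = "<think>7 * 8 is 56.</think>56"
print(accuracy_reward(completion, "56") + format_reward(completion))  # 2.0
```

Because the reward never inspects the reasoning itself, the model is free to discover whatever chain-of-thought style maximizes final-answer accuracy, which is the core of the methodological difference noted above.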
Open-Weight: MIT licensed — fully permissive for commercial use, fine-tuning, and self-hosting. This made it immediately available for distillation into smaller models, and many R1-distilled variants (7B, 14B, 32B, 70B) followed quickly.
Cost: At $2.19 per million output tokens vs. $60 for OpenAI o1 at the time of release — roughly 27x cheaper for comparable reasoning performance.
The 128K context window is smaller than that of most frontier alternatives. R1 is subject to the same data sovereignty concerns as V3.1 for API usage (though self-hosted deployment of the open weights is fully under user control). R1 has higher latency than non-reasoning models because it generates an internal chain of thought before responding. For tasks that don't require step-by-step reasoning, DeepSeek V3.1 offers better speed and lower cost.
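For API usage, DeepSeek exposes an OpenAI-compatible endpoint. A minimal sketch of building a request payload follows, assuming the model name `deepseek-reasoner` and the `https://api.deepseek.com` base URL as published in DeepSeek's API docs (not verified here):

```python
import json

# Hedged sketch: build a chat-completion payload for DeepSeek's reasoning model.
# "deepseek-reasoner" is the API alias that maps to R1; the endpoint is
# OpenAI-compatible, so any OpenAI-style client can send this payload.
def build_reasoner_request(prompt: str) -> dict:
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": prompt}],
    }

print(json.dumps(build_reasoner_request("Prove that sqrt(2) is irrational.")))
```

With the `openai` Python client, this would be sent by pointing `base_url` at `https://api.deepseek.com`. Per DeepSeek's documentation, responses from the reasoning model carry the chain of thought in a separate `reasoning_content` field alongside the final `content`, which accounts for the extra latency described above.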
February 26, 2026