DeepSeek V3.1

Summary

DeepSeek V3.1 is DeepSeek's current general-purpose flagship model, released in August 2025 by the Chinese AI lab (backed by the High-Flyer hedge fund) whose original V3 release in late December 2024 set off an industry-wide shock in January 2025.

Overview

V3.1 combines the strengths of DeepSeek-V3 and DeepSeek-R1: it keeps the same 671-billion-parameter Mixture-of-Experts architecture (37 billion active per token) while incorporating the reinforcement learning training insights that made R1 a breakthrough. The result is a model that performs well on reasoning, coding, and tool-use tasks while remaining the most cost-efficient option among major AI models at $0.15 per million cached input tokens.
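
A back-of-the-envelope sketch of why that MoE split matters for serving economics: per-token compute scales with the 37B active parameters, not the 671B total. The 2-FLOPs-per-parameter rule of thumb below is a standard approximation, not a DeepSeek-published figure:

```python
# Back-of-the-envelope: per-token compute for an MoE model scales with
# ACTIVE parameters, while memory must hold ALL parameters.
total_params = 671e9   # all experts (from the spec below)
active_params = 37e9   # parameters actually routed per token

# Rule of thumb: a dense forward pass costs ~2 FLOPs per parameter per token.
flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params

print(f"MoE compute per token: {flops_per_token_moe:.2e} FLOPs")
print(f"Dense equivalent:      {flops_per_token_dense:.2e} FLOPs")
print(f"Compute fraction:      {active_params / total_params:.1%}")  # ~5.5%
```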

DeepSeek's significance goes beyond its models: the original V3, reportedly trained for approximately $5.6 million in GPU compute (a fraction of what US labs spend), demonstrated that frontier AI capability doesn't require billion-dollar training budgets. This changed the conversation about AI economics and raised serious questions about the moats of US AI labs.

Specifications

  • Developer: DeepSeek (Hangzhou, China; backed by High-Flyer Capital Management)
  • Model String: deepseek-chat (maps to V3.1 as of August 2025)
  • Release Date: August 2025
  • Type: General-purpose LLM, Mixture-of-Experts, Open-Weight
  • Architecture: MoE — 37B active parameters / 671B total parameters
  • Context Window: 128,000 tokens
  • License: MIT (open-weight, fully permissive)
  • Access: DeepSeek API (OpenAI-compatible; see the usage sketch after this list), Hugging Face, together.ai, AWS, Azure, and many third-party providers
  • Pricing: $0.15 per million input tokens (cached) — among the lowest of any major model
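
Because the DeepSeek API is OpenAI-compatible, the standard openai Python SDK works against it by overriding base_url. A minimal sketch; the endpoint and model string follow DeepSeek's public documentation as of August 2025, and error handling is omitted:

```python
# Minimal DeepSeek V3.1 call via the OpenAI-compatible API.
# Assumes DEEPSEEK_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # maps to V3.1 as of August 2025 (see spec above)
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the MoE architecture in one sentence."},
    ],
)
print(response.choices[0].message.content)
```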

Capabilities

Cost Efficiency: At $0.15 per million cached input tokens, DeepSeek V3.1 is dramatically cheaper than Western alternatives: roughly 20x cheaper than Claude Sonnet 4.6 and 12x cheaper than GPT-5.2 base. For high-volume deployments, this pricing is transformative.
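
To make those ratios concrete, a quick cost sketch. Only DeepSeek's $0.15/M cached-input rate comes from the spec above; the competitor rates are back-solved from the 20x and 12x multiples, not quoted price lists:

```python
# Illustrative monthly input-token cost at the rates implied above.
# Only the DeepSeek rate is from the spec; the other two are derived
# from the "20x" and "12x" ratios (assumptions, not published prices).
MONTHLY_INPUT_TOKENS = 2_000_000_000  # 2B tokens: a high-volume deployment

rates_per_million = {
    "DeepSeek V3.1 (cached input)": 0.15,
    "Claude Sonnet 4.6 (~20x)": 0.15 * 20,  # implied: $3.00/M
    "GPT-5.2 base (~12x)": 0.15 * 12,       # implied: $1.80/M
}

for model, rate in rates_per_million.items():
    cost = MONTHLY_INPUT_TOKENS / 1_000_000 * rate
    print(f"{model:32s} ${cost:>10,.2f}/month")
```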

Strong Coding: V3.1 incorporates R1's reinforcement learning advances; at launch it outperformed GPT-4.5 on math and coding evaluations.

Reasoning Improvements: The hybrid V3/R1 training approach gives V3.1 better multi-step reasoning than pure V3, without the full latency overhead of a dedicated reasoning model like R1.

Open-Weight MIT License: Fully open, commercially permissive — can be deployed, fine-tuned, and monetized without restriction.

Tool Use & Function Calling: Improved tool-use capabilities over earlier DeepSeek versions, enabling more reliable integration into agentic workflows.
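
In practice this means OpenAI-style function calling works against the same endpoint. A hedged sketch below; get_weather is a hypothetical tool invented for illustration, and a real agent loop would execute the tool and feed results back until the model stops requesting calls:

```python
# Sketch of OpenAI-style function calling against DeepSeek V3.1.
# `get_weather` is a hypothetical tool, not part of any real API.
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```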

Limitations

The 128K context window is smaller than the 1M+ token windows offered by Anthropic, OpenAI, and Google. DeepSeek is a Chinese company subject to Chinese law, which raises data sovereignty and regulatory concerns for enterprises in certain jurisdictions, notably the US government and regulated industries; some organizations have blocked DeepSeek API access on security grounds. Self-hosted deployment of the open weights avoids API data concerns but requires significant infrastructure for a 671B-parameter model.
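
A rough sense of that infrastructure burden from parameter count alone: although only 37B parameters are active per token, all 671B must stay resident in accelerator memory, since any expert may be routed to. A sketch assuming FP8 weights (1 byte per parameter) and ignoring KV cache and activation overhead; the 80 GB-per-GPU figure is an illustrative assumption:

```python
# Back-of-the-envelope memory footprint for self-hosting the 671B model.
# Assumes FP8 (1 byte/param) weights; double for BF16. KV cache and
# activations add more on top of this.
TOTAL_PARAMS = 671e9
BYTES_PER_PARAM = 1      # FP8 (assumption)
GPU_MEMORY_GB = 80       # e.g., one 80 GB accelerator (assumption)

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = -(-weights_gb // GPU_MEMORY_GB)  # ceiling division

print(f"Weights alone: ~{weights_gb:,.0f} GB")
print(f"Minimum 80 GB GPUs just for weights: {int(min_gpus)}")  # ~9, before overhead
```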

Recent Developments

  • January 2025 — V3 Shock: DeepSeek-V3, released in late December 2024 with a reported ~$5.6M training cost, sent shockwaves through the AI industry and triggered a significant January 2025 sell-off in AI-related equities.
  • August 2025 — V3.1: Combined V3's general capability with R1's reinforcement learning advances, becoming the current recommended general-purpose model from DeepSeek.
  • V3.2 in Development: Reports indicate DeepSeek-V3.2 is in development, with further improvements expected.

Last Updated

February 26, 2026