Gemini 2.5 Flash is Google DeepMind's previous-generation fast multimodal model, released June 17, 2025 — offering a 1M+ context window at $0.30/$2.50 per million tokens with an optional thinking mode. Now superseded by Gemini 3 Flash but still widely used in production.
It has since been superseded by Gemini 3 Flash (December 2025) as the recommended mid-tier Gemini option, but remains available and widely used in production integrations. At $0.30 per million input tokens and $2.50 per million output tokens, it offers a 1M-token context window at a price point substantially below most comparable models from Anthropic and OpenAI.
For teams already integrated with Gemini 2.5 Flash, it continues to perform well. New projects should evaluate Gemini 3 Flash first, which offers improved benchmark performance at a comparable price tier.
Model ID: gemini-2.5-flash (check Google AI docs for versioned string)

1M Context at Low Cost: One of the most cost-efficient ways to access a 1M token context window. At $0.30/$2.50 per million input/output tokens, it significantly undercuts comparable long-context options from Anthropic and OpenAI.
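At these rates, per-request cost is easy to estimate. A minimal sketch, assuming only the $0.30/$2.50 per-million-token rates quoted above (the constants and helper function are illustrative, not part of any Google SDK):

```python
# Illustrative cost estimator using the Gemini 2.5 Flash rates
# quoted above ($0.30 input / $2.50 output per million tokens).
# Rates are assumptions taken from this page, not fetched live.

INPUT_USD_PER_MTOK = 0.30   # USD per 1M input tokens
OUTPUT_USD_PER_MTOK = 2.50  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Example: a 200k-token context with a 2k-token answer.
cost = estimate_cost(200_000, 2_000)
print(f"${cost:.4f}")  # → $0.0650
```

Even a request that fills a fifth of the context window stays under a dime of input cost at these rates, which is the main appeal of the tier.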
Native Multimodality: Processes text, image, video, and audio inputs natively, making it well suited to document analysis, visual Q&A, video understanding, and audio transcription tasks.
Speed: Flash-tier latency — optimized for responsive, high-throughput deployments where time-to-first-token matters.
Thinking Mode: Includes optional thinking mode for tasks that benefit from extended reasoning, allowing developers to trade latency for reasoning depth on a per-request basis.
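That per-request trade-off is expressed through a thinking budget in the generation config. A hedged sketch of a REST-style `generateContent` request body, built as a plain Python dict so the shape is visible; the `thinkingConfig`/`thinkingBudget` field names reflect my reading of the Gemini API docs and should be verified there, and the budget values are illustrative:

```python
# Sketch of a generateContent request body that toggles thinking
# per request. Field names (generationConfig, thinkingConfig,
# thinkingBudget) are assumptions to verify against Google AI docs.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request payload; a budget of 0 disables thinking."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# Fast path: no thinking tokens, lowest latency.
fast = build_request("Summarize this ticket.", thinking_budget=0)

# Reasoning path: spend latency (and output tokens) on reasoning.
deep = build_request("Plan this schema migration.", thinking_budget=8192)
```

Because the budget sits in the request rather than the model choice, a single deployment can serve both latency-sensitive and reasoning-heavy traffic.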
Superseded by Gemini 3 Flash, which offers better benchmark performance at a similar price tier; new projects should strongly consider it instead. The maximum output of 65,536 tokens is also lower than some competing models.
February 26, 2026