Gemini 2.5 Flash

Summary

Gemini 2.5 Flash is Google DeepMind's previous-generation fast multimodal model, released June 17, 2025 — offering a 1M+ context window at $0.30/$2.50 per million tokens with an optional thinking mode. Now superseded by Gemini 3 Flash but still widely used in production.

Overview

Gemini 2.5 Flash is Google DeepMind's previous-generation fast model, released June 17, 2025. It has since been superseded by Gemini 3 Flash (December 2025) as the recommended mid-tier Gemini option, but remains available and widely used in production integrations. At $0.30 per million input tokens and $2.50 per million output tokens, it offers 1M token context at a price point substantially below most comparable models from Anthropic and OpenAI.

For teams already integrated with Gemini 2.5 Flash, it continues to perform well. New projects should evaluate Gemini 3 Flash first, which offers improved benchmark performance at a comparable price tier.

Specifications

  • Developer: Google DeepMind
  • Model String: gemini-2.5-flash (check Google AI docs for versioned string)
  • Release Date: June 17, 2025
  • Type: Natively Multimodal LLM (text, audio, images, video, code)
  • Context Window: 1,048,576 tokens (1M+)
  • Max Output: 65,536 tokens
  • Access: Google AI Studio, Gemini API, Vertex AI
  • Pricing: $0.30 per million input tokens / $2.50 per million output tokens
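The model string and access paths above can be sketched as a request. The following is a minimal, offline sketch that builds (but does not send) a generateContent request body; the endpoint path follows the public v1beta Gemini API convention, and the header name in the comment is my reading of the docs — verify both against the current API reference.

```python
import json

# Assumed endpoint shape for the Gemini API; check Google AI docs for the
# current version prefix and any versioned model string.
MODEL = "gemini-2.5-flash"
URL = (
    "https://generativelanguage.googleapis.com/"
    f"v1beta/models/{MODEL}:generateContent"
)

# Simplest possible text-only request body.
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize: Gemini 2.5 Flash is..."}]}
    ]
}
body = json.dumps(payload)
# To send: HTTP POST to URL with Content-Type: application/json and an
# "x-goog-api-key: <YOUR_KEY>" header (omitted here to stay offline).
```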

Capabilities

1M Context at Low Cost: One of the most cost-efficient ways to access a 1M-token context window. At $0.30/$2.50 per million input/output tokens, it significantly undercuts comparable long-context options from Anthropic and OpenAI.
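The per-token rates above translate directly into a back-of-the-envelope cost estimate. This helper hard-codes the rates listed on this page; confirm current pricing before budgeting.

```python
# Listed Gemini 2.5 Flash rates: $0.30 per 1M input tokens,
# $2.50 per 1M output tokens.
IN_RATE_PER_M = 0.30
OUT_RATE_PER_M = 2.50

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (
        input_tokens / 1_000_000 * IN_RATE_PER_M
        + output_tokens / 1_000_000 * OUT_RATE_PER_M
    )

# e.g. a full 1M-token prompt with the maximum 65,536-token response
print(round(estimate_cost_usd(1_000_000, 65_536), 4))  # → 0.4638
```

Even a maxed-out request stays under half a dollar, which is the practical meaning of the "1M context at low cost" claim.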

Native Multimodality: Processes text, images, video, and audio inputs natively — well suited to document analysis, visual Q&A, video understanding, and audio transcription.
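A multimodal request mixes text and media parts in one message. The sketch below builds such a body offline; the inlineData/mimeType field names follow the Gemini REST API's JSON casing as I understand it, and the image bytes are a placeholder rather than a real PNG.

```python
import base64
import json

# Placeholder bytes standing in for a real image file.
fake_png_bytes = b"\x89PNG\r\n\x1a\n placeholder"

# A single user turn carrying both a text part and an inline image part.
payload = {
    "contents": [{
        "role": "user",
        "parts": [
            {"text": "Describe this image in one sentence."},
            {"inlineData": {
                "mimeType": "image/png",
                "data": base64.b64encode(fake_png_bytes).decode("ascii"),
            }},
        ],
    }]
}
# Round-trips cleanly as JSON, so it can be POSTed as a request body.
body = json.dumps(payload)
```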

Speed: Flash-tier latency — optimized for responsive, high-throughput deployments where time-to-first-token matters.

Thinking Mode: Includes optional thinking mode for tasks that benefit from extended reasoning, allowing developers to trade latency for reasoning depth on a per-request basis.
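The per-request latency/depth trade-off can be expressed as a request-body option. In this sketch, the generationConfig.thinkingConfig.thinkingBudget field names and the convention that a budget of 0 disables thinking are my reading of the Gemini API docs for 2.5-series models; verify field names and supported ranges against the current API reference.

```python
def with_thinking(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent request body with an explicit thinking budget.

    thinking_budget: assumed to be a token budget for internal reasoning;
    0 is assumed to disable thinking for lowest latency.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# Low-latency request vs. a reasoning-heavy request, same model.
fast = with_thinking("Quick answer: 17 * 23?", thinking_budget=0)
deep = with_thinking("Prove the claim step by step.", thinking_budget=2048)
```

Keeping the budget in the request body (rather than a model switch) is what makes the trade-off adjustable per call.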

Limitations

Superseded by Gemini 3 Flash, which offers better benchmark performance at a similar price tier. Its maximum output of 65,536 tokens is lower than that of some competing models. New projects should strongly consider Gemini 3 Flash instead.

Recent Developments

  • June 17, 2025 Launch: Released as the successor to Gemini 2.0 Flash, with 1M context and thinking mode as headline features.
  • Superseded by Gemini 3 Flash (December 2025): Google now recommends Gemini 3 Flash for new integrations.

Last Updated

February 26, 2026