Gemini 2.5 Flash-Lite

Summary

Gemini 2.5 Flash-Lite is Google DeepMind's lowest-cost, lowest-latency model, released September 25, 2025. At $0.10 input / $0.40 output per million tokens with a 1M+ context window, it targets high-volume, latency-sensitive multimodal workloads.

Overview

Gemini 2.5 Flash-Lite is Google DeepMind's lowest-cost, lowest-latency model in the 2.5 family, released September 25, 2025. At $0.10 per million input tokens and $0.40 per million output tokens, it is one of the cheapest frontier-adjacent models available from any major lab — nearly 50x cheaper per input token than Anthropic's Claude Opus 4.6, and 17x cheaper than Claude Haiku 4.5. It is positioned as a cost-effective upgrade from the older Gemini 1.5 and 2.0 Flash models for high-volume, latency-sensitive applications where cost is the primary constraint.

Flash-Lite is the right choice for applications that need to process enormous volumes of requests — content classification, bulk document processing, real-time chat at massive scale, or sub-agent tasks in large multi-agent pipelines where the individual subtask doesn't warrant a stronger model.
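For the multi-agent case, routing can be as simple as a lookup that sends cheap subtasks to Flash-Lite and everything else to a stronger sibling. A minimal sketch — the task categories and the stronger model's string are illustrative assumptions, not part of any Gemini API:

```python
# Hypothetical router for a multi-agent pipeline: high-volume, low-stakes
# subtasks go to Flash-Lite; anything needing deeper reasoning goes to a
# stronger model. Task names and the fallback model are assumptions.
FLASH_LITE = "gemini-2.5-flash-lite"
FALLBACK = "gemini-2.5-pro"  # assumed stronger sibling; check current docs

# Subtask types considered cheap enough for Flash-Lite (illustrative).
LITE_TASKS = {"classification", "extraction", "summarization", "routing"}

def pick_model(task_type: str) -> str:
    """Route a subtask to the cheapest model expected to handle it."""
    return FLASH_LITE if task_type in LITE_TASKS else FALLBACK
```

In practice the routing criteria would be tuned per pipeline; the point is that per-subtask model selection is where Flash-Lite's pricing pays off.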

Specifications

  • Developer: Google DeepMind
  • Model String: gemini-2.5-flash-lite (check Google AI docs for versioned string)
  • Release Date: September 25, 2025
  • Type: Natively Multimodal LLM (text, images, video)
  • Context Window: 1,048,576 tokens (1M+)
  • Access: Google AI Studio, Gemini API, Vertex AI
  • Pricing: $0.10 per million input tokens / $0.40 per million output tokens
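Via the Gemini API's REST surface, a request to this model targets the `generateContent` method on the model resource. A minimal sketch of building such a request — the helper name is ours, and the unversioned model string is used as listed above (check the docs for the current versioned string):

```python
# Build the URL and JSON body for a Gemini API generateContent call.
# The helper is illustrative; the endpoint shape and payload structure
# follow the public Gemini API REST reference.
import json

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-2.5-flash-lite"

def build_request(prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a text-only generateContent request."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body
```

The request would be sent as a POST with an API key header; image and video parts slot into the same `parts` array for multimodal input.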

Capabilities

Exceptional Cost Efficiency: At $0.10/$0.40 per million tokens with a 1M context window, Flash-Lite offers a combination of context size and price currently unmatched among models from major labs.
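At these rates, workload cost is simple arithmetic. A back-of-the-envelope estimator using the published prices:

```python
# Cost estimator using the published Flash-Lite rates:
# $0.10 per 1M input tokens, $0.40 per 1M output tokens.
INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.40 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int, requests: int = 1) -> float:
    """Total dollar cost for `requests` calls of the given token sizes."""
    per_call = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    return per_call * requests
```

For example, one million classification calls at 500 input and 50 output tokens each come to $50 of input plus $20 of output, i.e. $70 total — the scale of workload this model is priced for.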

Low Latency: Tuned for the fastest response times in the 2.5 lineup, making it suitable for real-time, interactive applications and high-throughput pipelines.

1M Token Context: Despite being the cheapest model in the lineup, Flash-Lite retains the full 1M token context window of its siblings — making it practical for large-document tasks at minimal cost.

Multimodal: Handles text, image, and video inputs at the same price point.

Limitations

As the most cost-efficient model, Flash-Lite trades capability depth for price and speed. For complex reasoning, nuanced writing, or tasks requiring sustained analytical quality, Gemini 3 Flash or Gemini 3 Pro will produce better results. It is not a reasoning model. Benchmark performance is below the Gemini 3 series on all major evaluations.

Recent Developments

  • September 25, 2025 Launch: Introduced in preview as the lowest-cost option in the 2.5 family, targeting teams migrating from the older Gemini 1.5 and 2.0 Flash models.
  • Pricing Leader: At $0.10/$0.40 per million tokens, it set a new low for 1M context pricing among major AI labs at launch.

Last Updated

February 26, 2026