GPT-Realtime-2

Summary

GPT-Realtime-2 is OpenAI's first realtime voice model with GPT-5-class reasoning, released May 7, 2026 in the OpenAI Realtime API.

Overview

GPT-Realtime-2 is a major upgrade to the prior GPT-Realtime line, designed to handle harder requests and carry a conversation forward naturally: calling tools, handling corrections or interruptions, and responding in a way that fits the moment.

The model headlines OpenAI's "advancing voice intelligence" launch alongside [[GPT-Realtime-Translate]] (live speech translation) and GPT-Realtime-Whisper (low-latency transcription). The launch positions voice as the surface on which OpenAI aims to turn frontier reasoning into real-world agentic workflows.

Specifications

Developer: OpenAI
Model String: gpt-realtime-2 (Realtime API)
Release Date: May 7, 2026
Type: Realtime audio (voice-in / voice-out) reasoning model with tool use
Underlying Reasoning Class: GPT-5-class
Context Window: 128K tokens (expanded from 32K in the prior GPT-Realtime model)
Distribution: OpenAI Realtime API; also available via Microsoft Azure AI Foundry
Pricing: $32 / 1M audio input tokens (cached input: $0.40 / 1M) and $64 / 1M audio output tokens

Capabilities

GPT-5-Class Reasoning In-Voice: OpenAI's first voice model to inherit reasoning capability from the GPT-5 family, enabling multi-step tool use, plan adjustment, and live correction within a continuous voice conversation.

128K Context Window: 4× the prior generation's context, supporting longer conversations and more complex multi-turn workflows without losing earlier state.

Continuous Conversation: The model keeps the conversation moving while it reasons through a request, calls tools, handles corrections or interruptions, and produces responses that fit the conversational moment — closing the gap between "voice assistant" and "voice-native agent."

Tool Use During Voice Turn: Supports tool calls inline with speech generation, enabling agentic actions (lookups, transactions, MCP-style integrations) without breaking conversational flow.
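A minimal sketch of what a tool round-trip during a voice turn might look like on the client side. It assumes GPT-Realtime-2 follows the JSON event shapes of OpenAI's existing Realtime API (`session.update`, `response.function_call_arguments.done`, `conversation.item.create`, `response.create`); that is an assumption, not something this page confirms. The `lookup_order_status` tool and its schema are hypothetical, and the sketch only builds the payloads without opening a connection.

```python
import json


def lookup_order_status(order_id: str) -> dict:
    """Hypothetical local tool; a real one would query a backend."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping tool names (as advertised to the model) to callables.
TOOLS = {"lookup_order_status": lookup_order_status}


def build_session_update() -> dict:
    """Build the client event that advertises the tool to the session.

    Event and field names are assumed to match the existing Realtime API.
    """
    return {
        "type": "session.update",
        "session": {
            "tools": [
                {
                    "type": "function",
                    "name": "lookup_order_status",
                    "description": "Look up the shipping status of an order.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }


def handle_function_call(event: dict) -> list[dict]:
    """Run the requested tool and build the two follow-up client events.

    `event` is assumed to be a completed function-call event carrying
    `call_id`, `name`, and a JSON-encoded `arguments` string.
    """
    args = json.loads(event["arguments"])
    result = TOOLS[event["name"]](**args)
    return [
        # Attach the tool result to the conversation...
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        },
        # ...then ask the model to continue speaking with that result.
        {"type": "response.create"},
    ]
```

In a real client, each returned dict would be serialized with `json.dumps` and sent over the open Realtime connection; keeping payload construction separate from transport makes the tool logic easy to test offline.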

Limitations

Realtime audio pricing is materially higher than text pricing: at $32 per million audio input tokens and $64 per million audio output tokens, long sessions become expensive at scale.
Audio-token economics still favor short, structured voice tasks; production deployments often pair GPT-Realtime-2 with GPT-Realtime-Whisper for transcription and standard text models for back-office workflows.
Because the model is served through the Realtime API, deployment requires OpenAI Realtime SDK integration; existing Chat Completions integrations do not transfer without rework.
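The cost concern can be made concrete with a small estimator built from the listed rates ($32 and $0.40 per million uncached and cached audio input tokens, $64 per million audio output tokens). This is a rough sketch: the function name and the way tokens are split into uncached and cached input are assumptions for illustration, and the token counts in any call are inputs you supply, not figures from this page.

```python
# Rates in USD per 1M tokens, taken from the Specifications section.
AUDIO_INPUT_USD_PER_M = 32.00
CACHED_INPUT_USD_PER_M = 0.40
AUDIO_OUTPUT_USD_PER_M = 64.00


def estimate_session_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one session.

    `input_tokens` counts uncached audio input only; cached input is
    passed separately and billed at the cached rate (an assumption
    about how the two input categories combine).
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * AUDIO_INPUT_USD_PER_M
        + cached_input_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * AUDIO_OUTPUT_USD_PER_M
    )
    return total / 1_000_000
```

For example, `estimate_session_cost(1_000_000, 1_000_000)` combines the two headline rates, $32 + $64, for a total of $96. Multiplying a per-session estimate by expected session volume gives a first-order view of why short, structured voice tasks are cheaper to run.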

Recent Developments

May 7, 2026: GPT-Realtime-2 launched alongside GPT-Realtime-Translate and GPT-Realtime-Whisper as OpenAI's "Advancing voice intelligence with new models in the API" release. Microsoft Azure AI Foundry concurrently announced availability.
Industry Context: The release lands as voice-native AI becomes a primary differentiator. Competing surfaces include Anthropic's voice work, Google's Gemini Live, and a wave of voice-first agent startups (Vapi, Retell, Bland, etc.) whose unit economics depend on Realtime API pricing.

Last Updated

May 11, 2026
