GPT-Realtime-Translate

Summary

GPT-Realtime-Translate is OpenAI's purpose-built live speech translation model in the Realtime API, released May 7, 2026 alongside GPT-Realtime-2 and GPT-Realtime-Whisper.

Overview

GPT-Realtime-Translate is OpenAI's purpose-built live speech translation model in the Realtime API, released May 7, 2026 alongside GPT-Realtime-2 and GPT-Realtime-Whisper. It translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker, and it is intentionally constrained to the translation task: it will not respond conversationally and will not summarize.

The model was trained on thousands of hours of professional interpreter audio, which helps it (a) remain translation-only and (b) wait for enough context before producing speech, mimicking the rhythm of human simultaneous interpretation.

Specifications

  • Developer: OpenAI
  • Model String: gpt-realtime-translate (Realtime API)
  • Release Date: May 7, 2026
  • Type: Specialized speech-to-speech translation model
  • Input Languages: 70+
  • Output Languages: 13
  • Distribution: OpenAI Realtime API
  • Pricing: $0.034 per minute

Capabilities

Live Simultaneous Interpretation Pace: Trained on professional human interpreter audio so it waits for enough source-language context before speaking, closer to human conference interpretation than to lagged dub-style translation.

Translation-Only Behavior: Hard-constrained to translation. It will not respond conversationally to the speaker, summarize, or volunteer information. This is deliberate: production live-translation workflows require deterministic behavior.
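In the Realtime API's event protocol, a constraint like this would typically be applied at session setup. The sketch below shows what such a configuration payload might look like; the `session.update` envelope follows the general Realtime API pattern, but the translation-specific field (`output_language`) is an illustrative assumption, not a documented parameter:

```python
import json

def build_translation_session(output_language: str) -> str:
    """Build a hypothetical session.update payload for a translation-only
    session. The event envelope follows the general Realtime API pattern;
    the translation-specific fields are illustrative assumptions."""
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            # Assumed field: target language for the translated speech output.
            "output_language": output_language,
            # Speech in, speech out; no text turn-taking.
            "modalities": ["audio"],
        },
    }
    return json.dumps(event)

print(build_translation_session("es"))
```

Because the model refuses non-translation behavior by design, such a payload would carry no conversational instructions at all: the target language is the only meaningful knob.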

Per-Minute Pricing: At $0.034/minute, OpenAI is pricing aggressively against existing dedicated speech translation services (Deepgram, AssemblyAI, Microsoft, Google), particularly for bidirectional or multi-party meetings where prior solutions ran on per-character or per-second tariffs.
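The flat per-minute rate makes session costs easy to estimate. A back-of-the-envelope sketch, using the $0.034/minute figure quoted above (the `streams` parameter is an assumption for modeling concurrent translation directions):

```python
def session_cost(minutes: float, streams: int = 1,
                 rate_per_min: float = 0.034) -> float:
    """Estimated cost of a live-translation session in dollars.
    `streams` counts concurrent translation directions, e.g. 2 for a
    bidirectional meeting where both sides are translated."""
    return round(minutes * streams * rate_per_min, 2)

# A 60-minute bidirectional meeting: two translation streams.
print(session_cost(60, streams=2))  # → 4.08
```

At roughly $4 for an hour-long bidirectional meeting, the per-minute tariff is simple to forecast in a way per-character pricing is not.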

Limitations

  • The 13 output languages are far narrower than the 70+ input languages, which makes translation fundamentally asymmetric and English-centric; many bidirectional pairs simply aren't supported.
  • As a specialized model, it cannot fall back to general reasoning when translation context is ambiguous (jargon, domain-specific idiom). Production deployments may need a fallback to GPT-Realtime-2 for clarification.
  • Per-minute billing assumes continuous voice usage; bursty or long-pause sessions may incur charges out of proportion to actual translation volume.
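The billing caveat in the last bullet can be quantified: billed wall-clock minutes divided by minutes of actual speech gives the effective rate per translated minute. A quick sketch, using the $0.034/minute list price (the helper itself is illustrative):

```python
def effective_rate(session_minutes: float, spoken_minutes: float,
                   rate_per_min: float = 0.034) -> float:
    """Billed cost divided by minutes of actual speech, in dollars
    per spoken minute."""
    return round(session_minutes * rate_per_min / spoken_minutes, 4)

# A 30-minute session with only 10 minutes of speech pays 3x per spoken minute.
print(effective_rate(30, 10))  # → 0.102
```

For sessions dominated by silence, closing and reopening the connection around speech activity may be cheaper than holding one long session open.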

Recent Developments

  • May 7, 2026: Released as part of OpenAI's "Advancing voice intelligence with new models in the API" launch, accompanied by a Cookbook guide demonstrating how to build a bidirectional live-translation app.
  • Industry Context: Arrives as enterprise live translation moves from offline, post-event processing to inline, real-time interpretation, particularly in customer support, healthcare, courtroom, and field-operations contexts. Competing services include Deepgram Nova, AssemblyAI Universal, Microsoft Azure Speech Translation, and Google Cloud Speech-to-Speech.

Last Updated

May 11, 2026