GPT-5.5 is OpenAI's flagship agentic model, released April 24, 2026, with a 1-million-token context window, a 4x expansion over GPT-5.4. It is the first model to exceed the human baseline on the OSWorld-V agentic-desktop benchmark, scoring 75% against 72.4% for humans.
GPT-5.5 is OpenAI's flagship model, released April 24, 2026. Industry coverage widely cites it as the moment frontier LLMs credibly crossed from "chat tools" to "autonomous digital coworkers." The signature result is 75% on the OSWorld-V agentic-desktop benchmark, modestly above the cited 72.4% human baseline; the benchmark requires the model to autonomously execute multi-step workflows across desktop applications. Combined with a 1-million-token context window, a 4x expansion over GPT-5.4, that result makes GPT-5.5 OpenAI's bid to dominate the agentic productivity workflow space.
In early May 2026, OpenAI promoted GPT-5.5 Instant as the new default ChatGPT model, putting the new long-context, agentic-workflow capabilities in front of free and Plus tiers. A higher-capability variant, GPT-5.5 Pro, is also available at substantially higher pricing for the most demanding workloads.
1M Token Context (headline feature): ~750,000 words — equivalent to 6–8 full-length novels or codebases of tens of thousands of lines — held in a single context window.
Long-Context Retrieval: At 512K–1M tokens, GPT-5.5 scores 74.0% on long-context retrieval evaluations vs. 36.6% for GPT-5.4 — more than double, indicating that long-context performance is genuine rather than nominal.
Autonomous Desktop Workflows: Scores 75% on OSWorld-V, above the 72.4% human baseline. The benchmark measures autonomous multi-step execution across desktop software: file management, application use, and structured data manipulation. This is the result that drives the "frontier models are now digital coworkers" framing.
Agentic Reasoning: OpenAI describes GPT-5.5 as its smartest and most intuitive model — one that understands what a user is trying to do faster and carries more of the work itself rather than waiting for step-by-step instructions.
Multimodal: Full multimodal support — text, image, and structured input.
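To make the context-window figure above concrete, here is a rough capacity check in Python. It is a sketch only: the 0.75 words-per-token ratio is simply what "1M tokens is about 750,000 words" implies, the 100,000-word novel length is an illustrative assumption, and the function names are hypothetical. A real tokenizer will give different counts, especially for code and non-English text.

```python
import math

CONTEXT_TOKENS = 1_000_000  # context window size quoted in this article
WORDS_PER_TOKEN = 0.75      # assumption: implied by "1M tokens ~ 750,000 words"


def estimate_tokens(word_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough token estimate from a word count (rule of thumb, not a tokenizer)."""
    return math.ceil(word_count / words_per_token)


def fits_in_context(word_count: int, reserved_output_tokens: int = 0) -> bool:
    """True if the text plus a reserved output budget fits in the window."""
    return estimate_tokens(word_count) + reserved_output_tokens <= CONTEXT_TOKENS


# Assumed novel length of 100,000 words (illustrative only).
novel_tokens = estimate_tokens(100_000)
novels_that_fit = CONTEXT_TOKENS // novel_tokens
print(novel_tokens, novels_that_fit)
```

Under these assumptions about seven such novels fit, which is consistent with the 6–8 range quoted above. Reserving part of the window for the model's output reduces that budget accordingly.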
At $30 per million input tokens and $180 per million output tokens, GPT-5.5 Pro, the highest-capability variant, is one of the most expensive frontier models on the market. The 1M context window is real but expensive to use at scale, and OpenAI's reported revisions to its compute spending plans (cut from $1.4T to $600B through 2030) reflect the operational difficulty of serving long-context inference economically. OSWorld-V is a useful benchmark but not a full proxy for real-world desktop automation reliability: production agent deployments still require careful orchestration, human-in-the-loop fallbacks, and domain-specific tuning. Cybersecurity safeguards and refusal behaviors are evolving in response to the broader regulatory environment around frontier-model pre-release review.
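The pricing point above can be worked through with a small cost estimate. This is a minimal sketch, assuming the quoted GPT-5.5 Pro rates apply flat across the whole context window, with no caching discounts or long-context surcharges; the function name and the 10,000-token output size are hypothetical.

```python
from decimal import Decimal

# Rates quoted in this article for GPT-5.5 Pro (USD per million tokens).
INPUT_USD_PER_MTOK = Decimal("30")
OUTPUT_USD_PER_MTOK = Decimal("180")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request, assuming flat per-token pricing."""
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    return (input_cost + output_cost) / MILLION


# One request that fills the 1M-token window and returns 10,000 output tokens.
full_window = request_cost(1_000_000, 10_000)
print(full_window)         # 31.8
print(full_window * 1000)  # cost of 1,000 such requests
```

At these rates a single full-window request costs roughly $30 before any output is generated, so an agent loop that resends the whole context on every step multiplies that figure by the number of steps.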
May 7, 2026