OpenAI Launches GPT-5.5: Its First Fully Retrained Base Model Since GPT-4.5
April 23, 2026 – 6:26 pm
The model, codenamed “Spud,” is designed to complete complex multi-step tasks with minimal human direction. It sets new benchmarks in agentic coding, computer use, and knowledge work, while matching GPT-5.4’s per-token latency. API access is delayed pending additional safety work.
For months, the AI industry’s open secret has been that Anthropic’s Claude is winning the enterprise market. OpenAI has been in what internal sources described as a “Code Red” state since at least December 2025, watching Anthropic’s annual recurring revenue (ARR) sprint from $9 billion to $30 billion while its own B2B positioning eroded. On Thursday, OpenAI responded.
GPT-5.5
The company’s first fully retrained base model since GPT-4.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. The model is designed to complete work with limited human direction, operating across email, spreadsheets, calendars, and other applications.
The core thesis of GPT-5.5 is legibility. Where previous models required carefully structured prompts and multi-step supervision, OpenAI says 5.5 can take a “messy, multi-part task” and independently plan, use tools, check its work, navigate ambiguity, and keep going until the task is finished.
The gains are concentrated in four areas: agentic coding, computer use, knowledge work, and early scientific research. OpenAI describes these as domains “where progress depends on reasoning across context and taking action over time.”
Benchmark Numbers:
- GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, testing complex command-line workflows requiring planning, iteration, and tool coordination.
- On SWE-Bench Pro, it scores 58.6%, solving more tasks in a single pass than previous models.
- On GDPval, it scores 84.9%.
- On OSWorld-Verified, it reaches 78.7%.
- On Tau2-bench Telecom, it reaches 98.0% without prompt tuning.
Across all these benchmarks, OpenAI says GPT-5.5 improves on GPT-5.4’s scores while using fewer tokens.
Efficiency and Cost:
The efficiency claim is commercially significant. Larger, more capable models are typically slower to serve, which creates a cost-quality trade-off for enterprise customers. OpenAI says GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving, meaning it delivers a step up in intelligence without a corresponding increase in response time.
It also uses significantly fewer tokens to complete equivalent tasks in Codex, directly reducing the cost per task for enterprise deployments. GPT-5.5 is priced higher per token than GPT-5.4, but OpenAI says the net effect is better results for lower total cost in most workflows.
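The arithmetic behind that claim can be sketched as follows. Cost per task is roughly tokens used multiplied by price per token, so a higher per-token price still yields a cheaper task when token usage falls by more than the price rises. All prices and token counts below are hypothetical placeholders chosen for illustration; they are not OpenAI's published pricing or measured usage.

```python
def cost_per_task(tokens_per_task: int, price_per_million_tokens: float) -> float:
    """Approximate cost of one task: tokens used times the per-token price."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


def break_even_token_ratio(old_price: float, new_price: float) -> float:
    """Fraction of the old token count at which the new model costs the same.

    If the new model uses a smaller fraction than this, it is cheaper per task.
    """
    return old_price / new_price


# Hypothetical figures (assumptions, not real pricing or usage data).
OLD_PRICE = 10.0      # dollars per million tokens, older model
NEW_PRICE = 15.0      # dollars per million tokens, newer model (50% higher)
OLD_TOKENS = 120_000  # tokens the older model spends on a task
NEW_TOKENS = 70_000   # tokens the newer model spends on the same task

old_cost = cost_per_task(OLD_TOKENS, OLD_PRICE)
new_cost = cost_per_task(NEW_TOKENS, NEW_PRICE)
threshold = break_even_token_ratio(OLD_PRICE, NEW_PRICE)
actual_ratio = NEW_TOKENS / OLD_TOKENS

print(f"old cost per task: ${old_cost:.2f}")
print(f"new cost per task: ${new_cost:.2f}")
print(f"break-even token ratio: {threshold:.3f}, actual ratio: {actual_ratio:.3f}")
print("newer model is cheaper per task:", new_cost < old_cost)
```

Under these assumed numbers, a 50% higher per-token price is offset once the newer model uses fewer than about two-thirds of the older model's tokens. Whether a given workflow clears that threshold depends on its actual token usage.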
Safety Framing:
The safety framing is notably more cautious than in previous launches. OpenAI says it will take time to thoroughly test and validate GPT-5.5’s capabilities before expanding API access.