GPT-5.3 Codex Released: Full Benchmark Results and What’s New

OpenAI has released GPT-5.3 Codex, a new coding model designed to handle longer, more complex work across the full software lifecycle. It builds on GPT-5.2 Codex and GPT-5.2, combining stronger coding performance with deeper reasoning and professional knowledge. OpenAI says the model also runs about 25 percent faster, which matters for long tasks that involve research, tools, and repeated execution.

At its core, GPT-5.3 Codex shifts Codex from a code-writing assistant to a general computer-using agent. You can guide it while it works, ask questions mid-task, and change direction without losing context. OpenAI also revealed that early versions of the model helped debug its own training and deployment, speeding up internal development in ways the team did not expect.

Benchmark results

GPT-5.3 Codex sets new highs across several key evaluations used to measure real-world coding and agentic ability.

  • SWE-Bench Pro (Public): 56.8% accuracy
    This benchmark covers four programming languages and focuses on real software engineering tasks. GPT-5.3 Codex leads previous models while using fewer output tokens.
  • Terminal-Bench 2.0: 77.3% accuracy
    This measures how well an agent uses the command line. The gap over GPT-5.2 Codex is significant, showing stronger practical developer skills.
  • OSWorld-Verified: 64.7% accuracy
    Here, models complete visual desktop tasks. Human performance sits at around 72%, so GPT-5.3 Codex lands roughly 7 percentage points short of the human baseline on computer-use tasks.
  • GDPval: 70.9% wins or ties
    This evaluates professional knowledge work across 44 jobs, including slides, spreadsheets, and reports. GPT-5.3 Codex matches the strongest prior results.
  • Cybersecurity CTF challenges: 77.6%
    This reflects improved vulnerability detection, prompting OpenAI to classify the model as high capability for cybersecurity tasks.
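
To make the "close to human-level" claim concrete, the short Python sketch below tabulates the scores listed above and computes how far the OSWorld-Verified result sits from the quoted human baseline. The figures come from this article; the 72.0 baseline is the approximate "around 72%" value, and the helper name `gap_to_baseline` is purely illustrative.

```python
# Scores as reported in the list above (percent).
scores = {
    "SWE-Bench Pro (Public)": 56.8,
    "Terminal-Bench 2.0": 77.3,
    "OSWorld-Verified": 64.7,
    "GDPval (wins or ties)": 70.9,
    "Cybersecurity CTF": 77.6,
}

# Approximate human baseline quoted for OSWorld-Verified ("around 72%").
HUMAN_OSWORLD = 72.0


def gap_to_baseline(score, baseline):
    """Return (gap in percentage points, score as a fraction of the baseline)."""
    return round(baseline - score, 1), round(score / baseline, 3)


gap_points, fraction = gap_to_baseline(scores["OSWorld-Verified"], HUMAN_OSWORLD)
print(f"OSWorld-Verified gap to human baseline: {gap_points} points")
print(f"Model score as a share of the human baseline: {fraction:.1%}")
```

Because the human figure is only approximate, the computed gap should be read as a rough indication rather than a precise measurement.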

Beyond benchmarks, GPT-5.3 Codex shows clear gains in real use. It can build and iterate on full web apps and games over days, and it handles debugging, deployment, monitoring, and even non-code tasks such as documentation and analysis. OpenAI has paired these capabilities with stricter cybersecurity safeguards and limited access controls.

GPT-5.3 Codex is now available to paid ChatGPT users across the Codex app, CLI, IDE extensions, and web, with API access planned next.

OpenAI frames GPT-5.3 Codex as more than a benchmark bump. It represents a shift in how Codex behaves during long, complex work, especially when tasks require planning, tool use, and steady progress without constant supervision.

Interactive work style inside Codex

GPT-5.3 Codex provides more frequent progress updates while it runs, helping users track key decisions and intervene earlier. Instead of waiting for a final result, you can interact during execution, ask questions, discuss trade-offs, and steer the work without losing context.

  • Steering setting: In the app, enable steering under Settings > General > Follow-up behavior so you can redirect the model while it works.

Stronger default web output

For everyday web tasks, GPT-5.3 Codex tends to produce more complete results from simple or underspecified prompts. It defaults to sensible layouts, clearer pricing logic, and more finished components, giving developers a stronger starting point instead of a minimal scaffold.

A model that helped build itself

OpenAI describes GPT-5.3 Codex as the first model that materially assisted in its own development. Early versions were used to debug training runs, manage deployment, and diagnose evaluation results.

Engineers also relied on Codex to identify context-rendering bugs, investigate low cache-hit rates, and dynamically scale GPU clusters during traffic spikes to keep latency stable.

Expanded cybersecurity focus

GPT-5.3 Codex is the first OpenAI model classified as high capability for cybersecurity tasks. It was directly trained to identify software vulnerabilities, prompting the rollout of stronger safeguards and monitoring systems. Related measures include:

  • Trusted Access for Cyber, a pilot program focused on defensive research
  • An expanded private beta of Aardvark, OpenAI’s security research agent
  • Free vulnerability scanning for major open-source projects
  • $10 million in API credits dedicated to cybersecurity defense work

Availability and infrastructure

GPT-5.3 Codex is available on paid ChatGPT plans wherever Codex runs, including the app, CLI, IDE extensions, and web. API access is not live yet, but OpenAI says it is coming.

The 25 percent speed improvement comes from infrastructure and inference stack upgrades. OpenAI also confirms the model was trained and served on NVIDIA GB200 NVL72 systems.
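
"About 25 percent faster" can be read two ways, and the article does not say which one OpenAI means. The sketch below works through both readings for a hypothetical 10-minute task; the function names and the 600-second baseline are assumptions chosen only for illustration.

```python
def time_if_throughput_gain(baseline_seconds, gain):
    """Reading 1: the model does 25% more work per second, so time is divided by 1.25."""
    return baseline_seconds / (1.0 + gain)


def time_if_latency_cut(baseline_seconds, cut):
    """Reading 2: the same task takes 25% less wall-clock time."""
    return baseline_seconds * (1.0 - cut)


baseline = 600.0  # hypothetical 10-minute task, not a figure from OpenAI

throughput_reading = time_if_throughput_gain(baseline, 0.25)
latency_reading = time_if_latency_cut(baseline, 0.25)

print(f"25% higher throughput: {throughput_reading:.0f} s")
print(f"25% lower latency:     {latency_reading:.0f} s")
```

Under the first reading the task drops to 480 seconds, and under the second to 450 seconds, so the difference matters most on the long-running agentic tasks the model targets.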

All benchmark evaluations referenced for GPT-5.3 Codex were run using xhigh reasoning effort, so comparisons with other models are only meaningful when those models were evaluated at a comparable effort setting.