OpenAI/gpt-5.3-codex
OpenAI's agentic coding model, merging the engineering strength of prior Codex models with GPT-5.2's reasoning. It supports real-time mid-task steering and sets new highs on terminal and computer-use benchmarks for long-horizon autonomous development work.
Supported Functionality
| Item | Specification |
|---|---|
| Input | Text, Image |
| Output | Text |
| Context | 400,000 tokens |
| Max Output | 128,000 tokens |
| Vision | ✓ Supported |
| Function Calling | ✓ Supported (tool use, terminal operations, computer use, etc.) |
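As a rough illustration of how the context and output limits above interact, the sketch below computes how many input tokens remain once room for the response is reserved. It assumes that output tokens count against the same 400,000-token window, which the table does not state; the function name `max_input_tokens` is hypothetical.

```python
# Limits copied from the specification table above.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000


def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the input budget left after reserving space for the response.

    Assumption (not stated in the table): input and output tokens share
    one context window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


if __name__ == "__main__":
    print(max_input_tokens())        # full output reservation
    print(max_input_tokens(32_000))  # smaller reservation, larger prompt
```

Under that assumption, reserving the full 128,000-token output leaves 272,000 tokens for the prompt, tool results, and prior turns.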
Description
Released by OpenAI on February 5, 2026, GPT-5.3-Codex replaces both GPT-5.2 and GPT-5.2-Codex. It merges GPT-5.2-Codex's frontier software engineering capability with GPT-5.2's general reasoning and professional knowledge in a single model. It launched across all Codex surfaces (app, CLI, IDE extension, web) for paid ChatGPT users, with API access following in the weeks after. Its knowledge cutoff is August 2025.
The model's core design goal is to support long-horizon, tool-intensive work. It reports progress continuously and accepts real-time feedback and redirection during execution (steerability) without losing context, so working with it feels closer to collaborating with a colleague than to submitting a prompt and waiting for a final output. OpenAI also disclosed that GPT-5.3-Codex was the first model instrumental in creating itself: the Codex team used early versions to debug the model's training, manage its deployment, and diagnose test results, which substantially accelerated development.
Key Capabilities
- Frontier agentic coding: Scores 56.8% on SWE-Bench Pro (up slightly from GPT-5.2-Codex's 56.4%), using fewer output tokens than any prior model to reach that result.
- Terminal & command-line operations: Scores 77.3% on Terminal-Bench 2.0, a large gain over GPT-5.2-Codex, excelling at multi-step shell workflows and system administration.
- Computer-use capability: Reaches 64.7% on OSWorld-Verified, a substantial jump from the prior generation, enabling real graphical-interface operations.
- Professional knowledge work: Using custom skills similar to those behind its GDPval results, performs on par with GPT-5.2 on professional knowledge tasks such as presentations and spreadsheets.
- Real-time mid-task steering: Accepts user feedback and direction changes during execution without restarting or losing completed context.
- Cybersecurity capability: The first model classified as "High" cybersecurity capability under OpenAI's Preparedness Framework, with vulnerability-identification ability and additional deployed safeguards.
- Adjustable reasoning depth: Supports low, medium, high, and xhigh reasoning effort settings, balancing speed and depth by task difficulty.
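The adjustable reasoning depth listed above might be selected per request roughly as follows. This is a minimal sketch that only builds a request body and sends nothing. The field names follow the shape of OpenAI's Responses API (`model`, `input`, `reasoning.effort`); whether the endpoint serving this model accepts exactly this payload, and the model identifier string, are assumptions. `build_request` is a hypothetical helper.

```python
import json

# The four effort levels named in the capability list above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")

# Assumed identifier, taken from this page's title; a given provider may differ.
MODEL_ID = "OpenAI/gpt-5.3-codex"


def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a request body with an explicit reasoning effort (nothing is sent)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "model": MODEL_ID,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    # Quick edits can use a low setting; hard multi-file work a higher one.
    print(json.dumps(build_request("Rename this variable.", "low"), indent=2))
    print(json.dumps(build_request("Refactor the auth module.", "xhigh"), indent=2))
```

Validating the effort value locally keeps a typo from turning into a failed or silently defaulted request.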
Technical Strengths
| Feature | Benefit |
|---|---|
| 25% faster inference | Infrastructure and inference-stack optimizations deliver a meaningful speed improvement over prior Codex models |
| Higher token efficiency | Reaches equal or better SWE-Bench Pro results with fewer output tokens than any prior model, lowering usage cost |
| Combines two capability sets | A single model provides both GPT-5.2-Codex's coding strength and GPT-5.2's general reasoning and knowledge-work ability, no model switching required |
| Stable long-horizon execution | Improves failure patterns in multi-file, multi-step tasks, reducing unstable patch loops and premature "done" states |
| First high-cybersecurity-capability model | Classified "High" under the Preparedness Framework, paired with OpenAI's most comprehensive safety stack to date |
| Transparent deep diagnostics | Provides "deep diffs" for reasoning transparency, easing human review and debugging |
Capability Ratings
| Dimension | Rating | Notes |
|---|---|---|
| Reasoning | Strong | Scores 44 on the Artificial Analysis Intelligence Index, above average for comparable models but not purpose-built for general reasoning |
| Coding | Top-tier | Leads on SWE-Bench Pro, Terminal-Bench 2.0, and SWE-Lancer IC Diamond |
| Creative Writing | Moderate | Positioned around coding and professional knowledge work rather than creative writing |
| Multimodal | Strong | Supports text and image input, useful for screenshot-assisted debugging and interface understanding |
| Response Speed | Fast | About 25% faster inference than prior Codex models, suited to high-frequency iteration |
| Context Window | Medium | 400,000 tokens; smaller than the million-token windows of the contemporaneous GPT-5.4 line, but sufficient for most engineering tasks |
Use Cases
- Long-horizon autonomous software development: Independently handles requirements analysis, coding, testing, and debugging across multi-file, cross-module work.
- Terminal & systems administration: Executes complex multi-step shell workflows, scripting, and operational tasks.
- Interactive pair programming: Real-time steering lets developers inject feedback and redirect the model mid-task, closer to human collaboration.
- Vulnerability identification & security research: Leverages its "High" cybersecurity classification to help identify and fix software vulnerabilities, within appropriate compliance frameworks.
- Professional knowledge-work output: Produces presentations, spreadsheets, and other office documents, handling cross-domain operational research.
- Computer-use automation: Performs installation, configuration, and testing through real graphical-interface actions, reducing manual intervention.
- Web & app prototyping: For simple or underspecified prompts, defaults to more functional, sensibly designed websites and app prototypes.
Pricing
| Token Type | LinkAI Price | Official Price |
|---|---|---|
| input | $1.312500 / 1M tokens | $1.750000 / 1M tokens |
| output | $10.500000 / 1M tokens | $14.000000 / 1M tokens |
| reasoning_tokens | $10.500000 / 1M tokens | $14.000000 / 1M tokens |
| cache_read | $0.131250 / 1M tokens | $0.175000 / 1M tokens |
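To make the per-million-token rates above concrete, here is a small sketch that estimates the cost of one request under each price column. The rates are copied from the table; the usage dictionary and its numbers are made up for illustration, and the sketch assumes each token is billed under exactly one category (for example, cached input tokens are charged at the `cache_read` rate and not also at the `input` rate), which the table does not confirm.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "linkai": {
        "input": Decimal("1.3125"),
        "output": Decimal("10.5"),
        "reasoning_tokens": Decimal("10.5"),
        "cache_read": Decimal("0.13125"),
    },
    "official": {
        "input": Decimal("1.75"),
        "output": Decimal("14"),
        "reasoning_tokens": Decimal("14"),
        "cache_read": Decimal("0.175"),
    },
}


def estimate_cost(usage: dict, tier: str = "linkai") -> Decimal:
    """Return the estimated cost in USD for token counts keyed by token type."""
    rates = PRICES[tier]
    total = sum(Decimal(count) * rates[kind] for kind, count in usage.items())
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical usage for one long agentic request.
    usage = {
        "input": 200_000,
        "cache_read": 100_000,
        "output": 20_000,
        "reasoning_tokens": 30_000,
    }
    for tier in PRICES:
        print(tier, estimate_cost(usage, tier))
```

For this hypothetical usage the estimate is $0.800625 at the LinkAI rates and $1.0675 at the official rates. Every LinkAI rate in the table is 75% of the official one, so the totals differ by that same factor. `Decimal` is used rather than floats so the small per-token rates do not accumulate rounding error.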