OpenAI

OpenAI/gpt-5.4

1.1M context, from $0.1875 / 1M tokens

OpenAI's unified flagship, merging the Codex and GPT lines: the first mainline reasoning model with frontier coding capability. It features a million-token context window and native computer use, and leads knowledge-work and coding benchmarks for agentic automation.

Supported Functionality

| Item | Specification |
| --- | --- |
| Input | Text, Image |
| Output | Text |
| Context | 1,050,000 tokens (approx. 922,000 input + 128,000 output) |
| Max Output | 128,000 tokens |
| Vision | ✓ Supported |
| Function Calling | ✓ Supported (tool search, code interpreter, hosted shell, native computer use, MCP, and the full modern agent stack) |
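The context split in the specification above implies a simple budget check before a request is sent. The following is a minimal sketch, assuming token counts are already known; the function name `fits_context` and the constants are illustrative and not part of any official SDK. It also assumes, conservatively, that the input cap of roughly 922,000 tokens applies even when less output is requested.

```python
# Limits copied from the specification above.
CONTEXT_WINDOW = 1_050_000                 # total tokens
MAX_OUTPUT = 128_000                       # maximum output tokens
MAX_INPUT = CONTEXT_WINDOW - MAX_OUTPUT    # 922,000 input tokens


def fits_context(input_tokens: int, requested_output: int = MAX_OUTPUT) -> bool:
    """Return True if a request stays within the published limits.

    Assumption: the input cap is treated as fixed at MAX_INPUT, even if
    fewer than MAX_OUTPUT output tokens are requested.
    """
    if input_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens <= MAX_INPUT and requested_output <= MAX_OUTPUT
```

For example, `fits_context(922_000)` passes, while `fits_context(922_001)` does not.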

Description

Released by OpenAI on March 5, 2026, GPT-5.4 is the next-generation mainline reasoning model for complex professional work. Unlike earlier releases split by use case, GPT-5.4 unifies the Codex and GPT product lines into a single architecture. It is the first mainline reasoning model to absorb GPT-5.3-Codex's frontier coding capability, which gives it top-tier strength in coding, reasoning, computer use, and document processing at once rather than in just one domain. Its knowledge cutoff is August 2025.

GPT-5.4 ships in multiple tiers (Standard, Thinking, Pro, Mini, and Nano), spanning everything from budget-constrained prototyping to premium enterprise deployment. OpenAI's release materials repeatedly frame the model around real office work (spreadsheets, presentations, documents, legal analysis, and research) rather than centering purely on coding or abstract reasoning benchmarks.

Key Capabilities

- Knowledge-work automation: Scores 83.0% on GDPval (up sharply from GPT-5.2's 70.9%) and 87.3% on an internal investment-banking modeling eval (up from 68.4% for GPT-5.2).
- Frontier agentic coding: Scores 57.7% on SWE-Bench Pro, slightly ahead of GPT-5.3-Codex's 56.8%, making it the first mainline model to blend top-tier coding with general reasoning.
- Native computer use: Scores 75.0% on OSWorld-Verified, the first model to surpass the human-expert baseline (72.4%) OpenAI cites, directly controlling desktop apps, filling forms, and navigating browsers.
- Massive long-context processing: A 1.05M-token context window can analyze an entire codebase or large document collection in one pass, with output up to 128,000 tokens.
- Full agentic tool stack: Supports web search, file search, image generation, code interpreter, hosted shell, apply patch, skills, computer use, MCP, and tool search, which together form the complete modern agent stack.
- Tool search: Retrieves tool definitions on demand across large tool ecosystems, cutting prompt token overhead for large-scale tool integrations.
- Adjustable reasoning depth: Reasoning effort supports none (default), low, medium, high, and xhigh, balancing speed and accuracy by task complexity.
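The adjustable reasoning depth described above can be illustrated with a small request builder that validates the effort level before a call is made. This is a sketch only: the payload shape (the `input` and `reasoning.effort` fields) is an assumption modeled on common OpenAI-style request bodies and is not confirmed by this page, and `build_request` is a hypothetical helper name.

```python
# Effort levels listed above; "none" is the stated default.
REASONING_EFFORTS = ("none", "low", "medium", "high", "xhigh")


def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a request payload (hypothetical shape) with a validated effort level."""
    if effort not in REASONING_EFFORTS:
        raise ValueError(
            f"effort must be one of {REASONING_EFFORTS}, got {effort!r}"
        )
    return {
        "model": "OpenAI/gpt-5.4",        # model ID as shown on this page
        "input": prompt,                  # assumed field name
        "reasoning": {"effort": effort},  # assumed field name; check the provider's API docs
    }
```

A quick task might use `build_request("Summarize this memo", "low")`, while a hard refactoring job might pass `"xhigh"` and accept the extra latency.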

Technical Strengths

| Feature | Benefit |
| --- | --- |
| Unifies Codex and GPT lines | A single model provides top-tier coding, reasoning, and office-task handling without switching between specialized models |
| First past the human computer-use baseline | 75.0% on OSWorld-Verified surpasses the 72.4% human-expert baseline, the first model to cross that threshold |
| Million-token context window | Processes an entire codebase, legal contract, or hundreds of pages of documents in a single pass, eliminating manual chunking |
| Optimized for real office work | Release materials focus on spreadsheets, presentations, and legal analysis rather than a single abstract benchmark |
| Multiple tiers for every scenario | Standard, Thinking, Pro, Mini, and Nano span from edge deployment to premium enterprise use |
| Elevated cybersecurity posture | Classified as "High" cybersecurity capability under the Preparedness Framework, with monitoring, trusted access controls, and asynchronous blocking |

Capability Ratings

| Dimension | Rating | Notes |
| --- | --- | --- |
| Reasoning | Excellent | Absorbs prior Codex reasoning strength while continuing the strong reasoning of the GPT-5.2 line, across professional and abstract problems |
| Coding | Top-tier | 57.7% on SWE-Bench Pro, the first mainline reasoning model to integrate top-tier coding capability |
| Creative Writing | Strong | Official positioning centers on professional office work and agentic tasks; creative-writing metrics not prominently disclosed |
| Multimodal | Strong | Supports text and image input, enabling combined analysis of screenshots and documents |
| Response Speed | Medium | Somewhat higher latency than dedicated Codex models, but reasoning effort is adjustable to trade off speed |
| Context Window | Huge | 1.05M tokens, among the largest context windows of any OpenAI mainline model at the time |

Use Cases

- All-round professional office work: Produces spreadsheets, presentations, and legal analysis documents, covering the knowledge-work tasks measured across GDPval's 44 occupations.
- Enterprise agentic automation: Builds cross-application, cross-tool automation workflows using the full agent stack (shell, MCP, code interpreter, etc.).
- Large-scale codebase analysis: Loads an entire repository in one pass for code review, refactoring, migration, and vulnerability discovery.
- Computer-use automation: Directly operates desktop software through native computer use to perform installation, configuration, and testing tasks.
- Financial & investment-banking modeling: Builds three-statement models, leveraged buyout models, and other financial work products with proper formatting and citations.
- Long-document & legal analysis: Processes complete contracts, regulatory filings, and research reports to identify key clauses and risks.
- From prototype to enterprise deployment: Lightweight Mini and Nano tiers support budget-constrained scenarios, while Pro serves the most demanding enterprise needs.

Pricing

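Pricing is tiered by the number of input prompt tokens (including cached tokens): once a request exceeds the threshold, the entire request is billed at the long-context rate. Prices are in USD per 1M tokens; each cell shows the discounted price, followed by the list price and discount in parentheses.

| Token Type | Short context (≤271,987 tokens) | Long context (>271,987 tokens) |
| --- | --- | --- |
| input | $1.875 (list $2.5, -25%) | $7.5 (list $10, -25%) |
| output | $11.25 (list $15, -25%) | $33.75 (list $45, -25%) |
| reasoning_tokens | $10.5 (list $14, -25%) | $10.5 (list $14, -25%) |
| cache_read | $0.1875 (list $0.25, -25%) | $0.75 (list $1, -25%) |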
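The tiering rule above can be turned into a rough cost estimate. The sketch below makes several assumptions that this page does not confirm: prices are per 1M tokens (taken from the "From $0.1875 / 1M tokens" header), `input_tokens` counts only uncached prompt tokens, cached tokens are billed at the `cache_read` rate, and reasoning tokens are billed separately from output tokens. The helper name `estimate_cost` is illustrative.

```python
from decimal import Decimal

# Tier threshold on input prompt tokens, including cached tokens.
LONG_CONTEXT_THRESHOLD = 271_987
PER_MILLION = Decimal(1_000_000)

# Discounted USD prices, assumed to be per 1M tokens.
PRICES = {
    "short": {
        "input": Decimal("1.875"),
        "output": Decimal("11.25"),
        "reasoning": Decimal("10.5"),
        "cache_read": Decimal("0.1875"),
    },
    "long": {
        "input": Decimal("7.5"),
        "output": Decimal("33.75"),
        "reasoning": Decimal("10.5"),
        "cache_read": Decimal("0.75"),
    },
}


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int = 0,
    cached_tokens: int = 0,
) -> Decimal:
    """Estimate the USD cost of one request under the tiering rule.

    The whole request is billed at the long-context rates once
    input_tokens + cached_tokens exceeds the threshold.
    """
    prompt_total = input_tokens + cached_tokens
    tier = "long" if prompt_total > LONG_CONTEXT_THRESHOLD else "short"
    p = PRICES[tier]
    total = (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + reasoning_tokens * p["reasoning"]
        + cached_tokens * p["cache_read"]
    )
    return total / PER_MILLION
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens costs about $0.30, while crossing the threshold by a single prompt token moves every token in the request to the long-context rates.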