OpenAI

OpenAI/gpt-5.4-mini

400K contextFrom $0.05625 / 1M tokens

OpenAI's most capable mini model yet for coding, computer use, and subagents; supports text and image input, a 400K context window, and extensive tool calling, optimized for low-latency, high-volume production workloads.

Chat

More from OpenAI

README

OpenAI/gpt-5.4-mini

Supported Functionality

ItemSpecification
InputText, Image
OutputText
Context400,000 tokens
Max Output128,000 tokens
Vision✓ Supported (image input only)
Function Calling✓ Supported

Description

GPT-5.4 mini was released by OpenAI on March 17, 2026 (snapshot gpt-5.4-mini-2026-03-17) alongside GPT-5.4 nano, both part of the GPT-5.4 family that debuted on March 5, 2026. It carries the core strengths of the flagship GPT-5.4 into a faster, more cost-efficient small model built as the default choice for high-throughput, latency-sensitive workloads, with a knowledge cutoff of August 2025. As a reasoning model, it supports the reasoning_effort parameter across none (default), low, medium, high, and xhigh — the last being a higher reasoning tier unique to this generation.

Compared with the previous GPT-5 mini, GPT-5.4 mini improves significantly across coding, reasoning, multimodal understanding, and tool use while running more than 2x faster, and it approaches the larger GPT-5.4 on evaluations such as SWE-Bench Pro and OSWorld-Verified. OpenAI positions it as its strongest mini model yet, especially well suited to latency-sensitive scenarios like coding assistants, subagent orchestration, and computer use.

Key Capabilities

  • Agile Coding: Handles targeted edits, codebase navigation, front-end generation, and debug loops, reaching 54.4% on SWE-Bench Pro (Public, xhigh) while approaching flagship pass rates at low latency.
  • Subagent Orchestration: Serves as a fast parallel subagent under a larger planner model (e.g., GPT-5.4) for narrow tasks like codebase search, large-file review, and supporting-document processing.
  • Computer Use: Quickly interprets screenshots of dense UIs to complete tasks, scoring 72.1% on OSWorld-Verified (xhigh) — near GPT-5.4 and well ahead of GPT-5 mini.
  • Multimodal Understanding: Performs vision reasoning over image inputs, reaching 78.0% on MMMU-Pro (with Python), fit for real-time image-and-text applications.
  • Tool Calling: Natively supports web search, file search, code interpreter, hosted shell, apply patch, skills, MCP, and tool search, scoring 93.4% on τ2-bench (telecom).
  • Adjustable Reasoning Depth: Five reasoning_effort levels (including the unique xhigh) let developers balance accuracy, latency, and cost.
  • Frontier Knowledge Reasoning: Scores 88.0% on GPQA Diamond (xhigh), retaining strong science and reasoning ability for its price tier.

Technical Strengths

FeatureBenefit
Performance-per-latencyRuns 2x+ faster than GPT-5 mini while approaching GPT-5.4 pass rates, balancing quality and speed
Cost efficiency$0.75 input / $4.50 output per 1M tokens, cached input as low as $0.075, ideal for scale
Large context400K-token window holds long codebases, documents, and multi-step tool-call traces
Subagent-friendlyActs as a parallel executor under larger models; uses only ~30% of GPT-5.4 quota in Codex
Rich tool ecosystemNative support for 10+ tools and the Responses API enables complex agentic workflows
Elastic reasoning tiersMultiple reasoning_effort levels including xhigh let developers trade accuracy for cost

Capability Ratings

DimensionRatingNotes
ReasoningStrong88.0% on GPQA Diamond; excellent for its tier but below flagship GPT-5.4
CodingExcellent54.4% SWE-Bench Pro, 60.0% Terminal-Bench 2.0 — the model's core strength
Creative WritingStrongReliable text generation and instruction following for everyday content work
MultimodalStrongImage input supported; 72.1% OSWorld-Verified, 76.6% MMMU-Pro
Response SpeedVery FastOver 2x faster than GPT-5 mini, purpose-built for low-latency workloads
Context WindowLarge400K tokens, larger than most peers but smaller than flagship GPT-5.4's 1M+ window

Use Cases

  • Coding Assistants: Low-latency edits, debugging, and codebase navigation for IDEs and copilots that need instant feedback.
  • Subagent Systems: Parallel handling of retrieval, review, and document-parsing subtasks under a larger orchestrator to cut overall cost.
  • Computer-Use Agents: Interprets UI screenshots to drive automation, powering GUI-automation and RPA-style applications.
  • Real-Time Multimodal Apps: Reasons over images in real time for visual Q&A, document understanding, and image-text analysis.
  • High-Concurrency Chat: Stable instruction following and multi-step reasoning for chat and support at scale.
  • Data Extraction & Classification: Combines tool calls and structured output for RAG, information extraction, and pipeline tasks.
  • Cost-Sensitive Production: Serves large-volume API traffic at a lower price, balancing capability, latency, and budget.

Pricing

Token TypeLinkAI PriceOfficial Price
input$0.562500 / 1M tokens$0.750000 / 1M tokens
output$3.375000 / 1M tokens$4.500000 / 1M tokens
cache_read$0.056250 / 1M tokens$0.075000 / 1M tokens