OpenAI/gpt-5.4-mini
OpenAI's most capable mini model yet for coding, computer use, and subagents; supports text and image input, a 400K context window, and extensive tool calling, optimized for low-latency, high-volume production workloads.
More from OpenAI
README
OpenAI/gpt-5.4-mini
Supported Functionality
| Item | Specification |
|---|---|
| Input | Text, Image |
| Output | Text |
| Context | 400,000 tokens |
| Max Output | 128,000 tokens |
| Vision | ✓ Supported (image input only) |
| Function Calling | ✓ Supported |
Description
GPT-5.4 mini was released by OpenAI on March 17, 2026 (snapshot gpt-5.4-mini-2026-03-17) alongside GPT-5.4 nano, both part of the GPT-5.4 family that debuted on March 5, 2026. It carries the core strengths of the flagship GPT-5.4 into a faster, more cost-efficient small model built as the default choice for high-throughput, latency-sensitive workloads, with a knowledge cutoff of August 2025. As a reasoning model, it supports the reasoning_effort parameter across none (default), low, medium, high, and xhigh — the last being a higher reasoning tier unique to this generation.
Compared with the previous GPT-5 mini, GPT-5.4 mini improves significantly across coding, reasoning, multimodal understanding, and tool use while running more than 2x faster, and it approaches the larger GPT-5.4 on evaluations such as SWE-Bench Pro and OSWorld-Verified. OpenAI positions it as its strongest mini model yet, especially well suited to latency-sensitive scenarios like coding assistants, subagent orchestration, and computer use.
Key Capabilities
- Agile Coding: Handles targeted edits, codebase navigation, front-end generation, and debug loops, reaching 54.4% on SWE-Bench Pro (Public, xhigh) while approaching flagship pass rates at low latency.
- Subagent Orchestration: Serves as a fast parallel subagent under a larger planner model (e.g., GPT-5.4) for narrow tasks like codebase search, large-file review, and supporting-document processing.
- Computer Use: Quickly interprets screenshots of dense UIs to complete tasks, scoring 72.1% on OSWorld-Verified (xhigh) — near GPT-5.4 and well ahead of GPT-5 mini.
- Multimodal Understanding: Performs vision reasoning over image inputs, reaching 78.0% on MMMU-Pro (with Python), fit for real-time image-and-text applications.
- Tool Calling: Natively supports web search, file search, code interpreter, hosted shell, apply patch, skills, MCP, and tool search, scoring 93.4% on τ2-bench (telecom).
- Adjustable Reasoning Depth: Five
reasoning_effortlevels (including the unique xhigh) let developers balance accuracy, latency, and cost. - Frontier Knowledge Reasoning: Scores 88.0% on GPQA Diamond (xhigh), retaining strong science and reasoning ability for its price tier.
Technical Strengths
| Feature | Benefit |
|---|---|
| Performance-per-latency | Runs 2x+ faster than GPT-5 mini while approaching GPT-5.4 pass rates, balancing quality and speed |
| Cost efficiency | $0.75 input / $4.50 output per 1M tokens, cached input as low as $0.075, ideal for scale |
| Large context | 400K-token window holds long codebases, documents, and multi-step tool-call traces |
| Subagent-friendly | Acts as a parallel executor under larger models; uses only ~30% of GPT-5.4 quota in Codex |
| Rich tool ecosystem | Native support for 10+ tools and the Responses API enables complex agentic workflows |
| Elastic reasoning tiers | Multiple reasoning_effort levels including xhigh let developers trade accuracy for cost |
Capability Ratings
| Dimension | Rating | Notes |
|---|---|---|
| Reasoning | Strong | 88.0% on GPQA Diamond; excellent for its tier but below flagship GPT-5.4 |
| Coding | Excellent | 54.4% SWE-Bench Pro, 60.0% Terminal-Bench 2.0 — the model's core strength |
| Creative Writing | Strong | Reliable text generation and instruction following for everyday content work |
| Multimodal | Strong | Image input supported; 72.1% OSWorld-Verified, 76.6% MMMU-Pro |
| Response Speed | Very Fast | Over 2x faster than GPT-5 mini, purpose-built for low-latency workloads |
| Context Window | Large | 400K tokens, larger than most peers but smaller than flagship GPT-5.4's 1M+ window |
Use Cases
- Coding Assistants: Low-latency edits, debugging, and codebase navigation for IDEs and copilots that need instant feedback.
- Subagent Systems: Parallel handling of retrieval, review, and document-parsing subtasks under a larger orchestrator to cut overall cost.
- Computer-Use Agents: Interprets UI screenshots to drive automation, powering GUI-automation and RPA-style applications.
- Real-Time Multimodal Apps: Reasons over images in real time for visual Q&A, document understanding, and image-text analysis.
- High-Concurrency Chat: Stable instruction following and multi-step reasoning for chat and support at scale.
- Data Extraction & Classification: Combines tool calls and structured output for RAG, information extraction, and pipeline tasks.
- Cost-Sensitive Production: Serves large-volume API traffic at a lower price, balancing capability, latency, and budget.
Pricing
| Token Type | LinkAI Price | Official Price |
|---|---|---|
| input | $0.562500 / 1M tokens | $0.750000 / 1M tokens |
| output | $3.375000 / 1M tokens | $4.500000 / 1M tokens |
| cache_read | $0.056250 / 1M tokens | $0.075000 / 1M tokens |