gpt-5.4-mini - LinkModel

Supported Functionality

Item	Specification
Input	Text, Image
Output	Text
Context	400,000 tokens
Max Output	128,000 tokens
Vision	✓ Supported (image input only)
Function Calling	✓ Supported

Description

GPT-5.4 mini was released by OpenAI on March 17, 2026 (snapshot gpt-5.4-mini-2026-03-17) alongside GPT-5.4 nano, both part of the GPT-5.4 family that debuted on March 5, 2026. It carries the core strengths of the flagship GPT-5.4 into a faster, more cost-efficient small model built as the default choice for high-throughput, latency-sensitive workloads, with a knowledge cutoff of August 2025. As a reasoning model, it supports the reasoning_effort parameter across none (default), low, medium, high, and xhigh — the last being a higher reasoning tier unique to this generation.

Compared with the previous GPT-5 mini, GPT-5.4 mini improves significantly across coding, reasoning, multimodal understanding, and tool use while running more than 2x faster, and it approaches the larger GPT-5.4 on evaluations such as SWE-Bench Pro and OSWorld-Verified. OpenAI positions it as its strongest mini model yet, especially well suited to latency-sensitive scenarios like coding assistants, subagent orchestration, and computer use.

Key Capabilities

Agile Coding: Handles targeted edits, codebase navigation, front-end generation, and debug loops, reaching 54.4% on SWE-Bench Pro (Public, xhigh) while approaching flagship pass rates at low latency.
Subagent Orchestration: Serves as a fast parallel subagent under a larger planner model (e.g., GPT-5.4) for narrow tasks like codebase search, large-file review, and supporting-document processing.
Computer Use: Quickly interprets screenshots of dense UIs to complete tasks, scoring 72.1% on OSWorld-Verified (xhigh) — near GPT-5.4 and well ahead of GPT-5 mini.
Multimodal Understanding: Performs vision reasoning over image inputs, reaching 78.0% on MMMU-Pro (with Python), fit for real-time image-and-text applications.
Tool Calling: Natively supports web search, file search, code interpreter, hosted shell, apply patch, skills, MCP, and tool search, scoring 93.4% on τ2-bench (telecom).
Adjustable Reasoning Depth: Five reasoning_effort levels (including the unique xhigh) let developers balance accuracy, latency, and cost.
Frontier Knowledge Reasoning: Scores 88.0% on GPQA Diamond (xhigh), retaining strong science and reasoning ability for its price tier.

Technical Strengths

Feature	Benefit
Performance-per-latency	Runs 2x+ faster than GPT-5 mini while approaching GPT-5.4 pass rates, balancing quality and speed
Cost efficiency	$0.75 input / $4.50 output per 1M tokens, cached input as low as $0.075, ideal for scale
Large context	400K-token window holds long codebases, documents, and multi-step tool-call traces
Subagent-friendly	Acts as a parallel executor under larger models; uses only ~30% of GPT-5.4 quota in Codex
Rich tool ecosystem	Native support for 10+ tools and the Responses API enables complex agentic workflows
Elastic reasoning tiers	Multiple `reasoning_effort` levels including xhigh let developers trade accuracy for cost

Capability Ratings

Dimension	Rating	Notes
Reasoning	Strong	88.0% on GPQA Diamond; excellent for its tier but below flagship GPT-5.4
Coding	Excellent	54.4% SWE-Bench Pro, 60.0% Terminal-Bench 2.0 — the model's core strength
Creative Writing	Strong	Reliable text generation and instruction following for everyday content work
Multimodal	Strong	Image input supported; 72.1% OSWorld-Verified, 76.6% MMMU-Pro
Response Speed	Very Fast	Over 2x faster than GPT-5 mini, purpose-built for low-latency workloads
Context Window	Large	400K tokens, larger than most peers but smaller than flagship GPT-5.4's 1M+ window

Use Cases

Coding Assistants: Low-latency edits, debugging, and codebase navigation for IDEs and copilots that need instant feedback.
Subagent Systems: Parallel handling of retrieval, review, and document-parsing subtasks under a larger orchestrator to cut overall cost.
Computer-Use Agents: Interprets UI screenshots to drive automation, powering GUI-automation and RPA-style applications.
Real-Time Multimodal Apps: Reasons over images in real time for visual Q&A, document understanding, and image-text analysis.
High-Concurrency Chat: Stable instruction following and multi-step reasoning for chat and support at scale.
Data Extraction & Classification: Combines tool calls and structured output for RAG, information extraction, and pipeline tasks.
Cost-Sensitive Production: Serves large-volume API traffic at a lower price, balancing capability, latency, and budget.

Token Type	LinkAI Price	Official Price
input	$0.562500 / 1M tokens	$0.750000 / 1M tokens
output	$3.375000 / 1M tokens	$4.500000 / 1M tokens
cache_read	$0.056250 / 1M tokens	$0.075000 / 1M tokens

OpenAI/gpt-5.4-mini

More from OpenAI

README