OpenAI/gpt-5.2
OpenAI's flagship reasoning model for professional work and long-running agents, setting new highs in knowledge work, coding, and long-context understanding, offered in Instant, Thinking, and Pro modes for enterprise automation.
More from OpenAI
README
OpenAI/gpt-5.2
Supported Functionality
ItemSpecificationInputText, ImageOutputTextContext400,000 tokensMax Output128,000 tokensVision✓ SupportedFunction Calling✓ Supported
Description
Released by OpenAI on December 11, 2025, GPT-5.2 was described as "the most advanced frontier model for professional work and long-running agents." The launch came just 11 days after Google's Gemini 3 triggered an industry "code red," positioning GPT-5.2 as OpenAI's rapid response to competitive pressure from Gemini 3 and Claude Opus 4.5. GPT-5.2 ships in three configurations: Instant for everyday tasks, Thinking for complex work such as coding and document analysis, and Pro for enterprise-grade, high-difficulty problems and agentic workflows.
GPT-5.2 delivers significant gains in general intelligence, long-context understanding, agentic tool-calling, and vision. On GDPval, a benchmark spanning 44 occupations, GPT-5.2 Thinking beats or ties top industry professionals on 70.9% of comparisons — nearly double GPT-5.1's 38.8% win rate — making it the first model to reach human-expert level on this evaluation. Its hallucination rate also dropped sharply versus GPT-5.1: the error rate on real ChatGPT queries fell from 8.8% to 6.2%, a roughly 30% reduction.
Key Capabilities
Professional knowledge work: Scores 70.9% on GDPval (beating or tying human experts), producing outputs at roughly 11x expert speed and under 1% of the cost. Software engineering & code fixes: Scores 80.0% on SWE-bench Verified and 55.6% on SWE-Bench Pro (a four-language, more contamination-resistant evaluation), with notable gains in front-end development and complex UIs. Abstract reasoning: First model to cross 90% on ARC-AGI-1, and scores 52.9% on ARC-AGI-2, showing stronger abstract pattern recognition. Math & scientific reasoning: Reaches a perfect 100% on AIME 2025, 40.3% on FrontierMath (roughly 10 points above GPT-5.1), and 93.2% on GPQA Diamond. Long-context retrieval: The first model to achieve near-100% accuracy on the 4-needle MRCR variant out to 256K tokens, suited for long reports, contracts, and multi-file projects. Vision understanding: Roughly halves the error rate on chart reasoning and software interface understanding compared to the prior generation, more accurately interpreting dashboards and screenshots. Financial modeling: Scores on an internal investment-banking spreadsheet-modeling eval rose from GPT-5.1's 59.1% to 68.4%, a 9.3-point gain.
Technical Strengths
FeatureBenefitThree flexible modesInstant, Thinking, and Pro cover everything from everyday Q&A to the hardest enterprise-grade tasksSubstantially lower hallucination rateRoughly 30% fewer factual errors than GPT-5.1, improving answer trustworthinessReliable long-context retrievalNear-perfect MRCR long-context performance supports deep document analysis and multi-source synthesisEnterprise-validated agentic capabilityNotion, Box, Shopify, Harvey, and Zoom report state-of-the-art long-horizon reasoning and tool-callingStrong data science & document analysisDatabricks, Hex, and Triple Whale report exceptional performance on agentic data science tasksImproved visual reasoningHalved error rates on charts and interfaces make it well-suited to finance, operations, and engineering workflows
Capability Ratings
DimensionRatingNotesReasoningTop-tierPerfect AIME score, 93.2% on GPQA Diamond, and new highs across the ARC-AGI seriesCodingTop-tier80.0% on SWE-bench Verified and 55.6% on SWE-Bench Pro, with strong front-end and complex-UI gainsCreative WritingStrongOfficial release emphasized professional knowledge work and agentic tasks; creative-writing metrics not prominently disclosedMultimodalExcellentError rates on charts and interfaces roughly halved, indicating markedly stronger visual reasoningResponse SpeedFastThinking mode produces professional-grade output at roughly 11x expert human speedContext WindowMedium400,000 tokens — strong long-context retrieval accuracy, though smaller in raw size than the later GPT-5.4/5.5 line
Use Cases
Enterprise knowledge-work automation: Produces presentations, spreadsheets, and investment-banking models, cutting delivery time and cost significantly. Software engineering & code review: Handles debugging, feature implementation, and large-scale refactors, with particular strength in front-end development. Long-document & contract analysis: Uses near-perfect long-context retrieval to process reports, contracts, and research papers across multi-file projects. Agentic data science: Automates data analysis and document-processing workflows driven by agentic execution. Abstract reasoning & complex problem solving: Tackles high-difficulty problems requiring pattern recognition and multi-step logic. Visual content interpretation: Combines chart, dashboard, and interface-screenshot analysis for decision support. Long-horizon agentic workflows: Supports cross-tool, multi-step automation chains with reduced manual intervention and error rates.
Pricing
| Token Type | LinkAI Price | Official Price |
|---|---|---|
| input | $1.312500 / 1M tokens | $1.750000 / 1M tokens |
| output | $10.500000 / 1M tokens | $14.000000 / 1M tokens |
| reasoning_tokens | $10.500000 / 1M tokens | $14.000000 / 1M tokens |
| cache_read | $0.131250 / 1M tokens | $0.175000 / 1M tokens |