Gemini

Google/gemini-3.5-flash

1.0M context | From $0.0000 / 1M tokens
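
The "1.0M context" figure matches the 1,048,576-token (2^20) Context limit in the Supported Functionality table. That table also lists a 65,536-token Max Output. A client can check request sizes against both limits before sending anything. What follows is a minimal sketch and not part of any official SDK. It assumes the two limits are enforced separately and that token counts come from the provider's own tokenizer. `check_request` is a hypothetical helper.

```python
# Limits taken from the Supported Functionality table on this card.
MAX_INPUT_TOKENS = 1_048_576  # "Context"
MAX_OUTPUT_TOKENS = 65_536    # "Max Output"


def check_request(prompt_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems with a planned request; an empty list means it fits.

    Assumption: the input and output limits are enforced separately, and
    prompt_tokens was measured with the provider's tokenizer.
    """
    problems: list[str] = []
    if prompt_tokens < 0 or requested_output_tokens < 0:
        problems.append("token counts must be non-negative")
        return problems
    if prompt_tokens > MAX_INPUT_TOKENS:
        over = prompt_tokens - MAX_INPUT_TOKENS
        problems.append(f"prompt exceeds input limit by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds max output by {over} tokens")
    return problems


print(check_request(900_000, 8_192))     # → []
print(check_request(1_100_000, 70_000))  # → one problem per exceeded limit
```

Returning a list rather than raising on the first problem lets a batch job report every oversized dimension of a request at once.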

A fast, frontier-level model from Google DeepMind that natively accepts five input modalities (text, image, audio, video and PDF). It combines dynamic multi-level thinking, stable agent orchestration, flagship-class coding and a million-token context window, delivering frontier intelligence at half the cost. It is suited to enterprise-scale agent clusters, full-stack R&D, bulk multimedia processing, large-scale document governance and high-concurrency commercial API workloads.
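
The card's Description quotes a throughput of 289 tokens per second. A rough decode-time estimate is simply output tokens divided by that throughput. This is a back-of-the-envelope sketch. It assumes a steady rate and ignores time-to-first-token, network overhead and any thinking tokens, so real end-to-end latency will be higher. `generation_seconds` is a hypothetical helper.

```python
THROUGHPUT_TPS = 289  # tokens per second, as quoted in this card's Description


def generation_seconds(output_tokens: int, tps: float = THROUGHPUT_TPS) -> float:
    """Estimate pure decode time for output_tokens at a constant throughput.

    Optimistic by design: ignores time-to-first-token, network overhead and
    any thinking tokens the model may generate before the visible answer.
    """
    if tps <= 0:
        raise ValueError("throughput must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / tps


for n in (500, 8_192, 65_536):
    print(f"{n:>6} tokens ~ {generation_seconds(n):.1f} s")
```

At that rate a maximum-length 65,536-token response needs roughly 227 seconds of pure decoding, so very long outputs still take close to four minutes even at this throughput.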

Supported Functionality

| Item | Specification |
| --- | --- |
| Input | Text, Image, Audio, Video, PDF |
| Output | Text |
| Context | 1,048,576 tokens |
| Max Output | 65,536 tokens |
| Vision | ✓ Supported (high-resolution images, long videos, scanned PDFs and technical chart analysis) |
| Function Calling | ✓ Supported (batched serial/parallel calls, sandboxed code execution, web grounding, constrained structured JSON output, prompt caching) |

Description

Officially released at Google I/O 2026 on May 19, Gemini 3.5 Flash is the flagship commercial model of the Gemini 3.5 family. It is built on an optimized sparse mixture-of-experts (MoE) architecture and has a knowledge cutoff of January 2026. It outperforms Gemini 3.1 Pro on mainstream benchmarks including Terminal-Bench, MCP Atlas and CharXiv Reasoning, setting new records for reasoning, coding and agentic performance in the Flash line. It is positioned as a cost-effective production foundation for enterprise agent clusters, R&D teams and multimedia service providers.

A native dynamic thinking engine with adjustable reasoning levels and an optimized unified encoder for all five input modalities improve recall of information far back in the context and the stability of long chains of tool calls, with large gains in iterative code debugging and self-verification of complex business logic. The model reaches an inference throughput of 289 tokens per second (4× faster than peer frontier models) at 75% of flagship Pro pricing, and prompt caching cuts costs by up to 90% on repetitive workloads. The result combines low latency, top-tier capability and economical large-scale deployment.

Key Capabilities

● Dynamic Multi-Level Chain Reasoning: Adaptive thinking tiers with automatic multi-round self-verification on complex tasks. Scores 83.6% on the MCP Atlas agent benchmark and performs strongly on long-horizon business simulation, mathematical research and strategic decision-making, with substantially lower hallucination rates.
● Flagship Full-Stack Coding Performance: Scores 76.2% on Terminal-Bench 2.1, surpassing previous Pro generations. Supports whole-repository analysis of medium-sized codebases, full-stack development, vulnerability auditing, interactive web building and autonomous code debugging.
● Native Batch Five-Modal Processing: Accepts mixed text, image, audio, video and PDF input and can parse 10-hour audio recordings, long meeting videos and complex engineering drawings for automatic transcription, table extraction and cross-modal structured data generation.
● Million-Token Long Context: Recalls over 92% of key details in ultra-long documents, making it well suited to centralized analysis of contract collections, enterprise knowledge bases and extended multi-turn agent workflows.
● Enterprise-Grade Reliable Agent Orchestration: Supports serial and parallel chained tool calls, sandboxed code execution and real-time web grounding, with built-in parameter validation, automatic retries and result filtering that stabilize large-scale multi-agent automation pipelines.
● Prompt Caching Cost Optimization: Reuses static and conversational prompt content to cut token costs by up to 90% on recurring tasks, well suited to cyclical agent workflows and high-volume batch data processing.
● Global Multilingual Safety Alignment: Natively supports hundreds of languages, with layered risk controls, data desensitization and false-refusal suppression for stable deployment in highly regulated sectors such as legal, finance and healthcare.
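
How much the "up to 90%" caching figure saves in practice depends on what share of each request is actually reused. The sketch below assumes a simple pricing model. Cached input tokens are billed at a 90% discount, while uncached input and output tokens pay the full rate. The prices are placeholders, since the listing shows $0.0000, and `Pricing` and `request_cost` are hypothetical names rather than a provider API.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not published rates)."""
    input_per_m: float
    output_per_m: float
    cached_discount: float = 0.90  # assumed discount on cached input ("up to 90%")


def request_cost(p: Pricing, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost of one request when cached_tokens of its input_tokens hit the cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cached_rate = p.input_per_m * (1 - p.cached_discount)
    total = uncached * p.input_per_m + cached_tokens * cached_rate + output_tokens * p.output_per_m
    return total / 1_000_000


prices = Pricing(input_per_m=1.00, output_per_m=4.00)  # placeholder numbers
# Agent loop: a 200,000-token static prefix plus 2,000 new input tokens
# and 1,000 output tokens on every call.
without_cache = request_cost(prices, 202_000, 0, 1_000)
with_cache = request_cost(prices, 202_000, 200_000, 1_000)
print(f"${without_cache:.3f} vs ${with_cache:.3f} per call, "
      f"{1 - with_cache / without_cache:.0%} saved")
```

Here the per-call cost drops from about $0.206 to about $0.026, a saving of roughly 87%. That is close to the 90% ceiling but below it, because the new input tokens and the output tokens are still billed at full price.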

Technical Strengths

| Feature | Benefit |
| --- | --- |
| Load-Balanced Sparse MoE Architecture | Balanced expert routing prevents performance fluctuation under batch workloads, keeping latency stable at the millisecond level across millions of concurrent API requests while maximizing compute utilization and service reliability |
| Adaptive Dynamic Thinking Engine | Allocates lightweight inference to simple queries and more compute to complex reasoning, balancing latency against accuracy so resources are not wasted across diverse workloads |
| Lightweight Unified Five-Modal Encoder | Compresses redundant multimedia features to speed up bulk preprocessing, reducing bandwidth and server overhead in industrial-scale media-processing pipelines |
| Dual-Layer Long Context Attention Optimization | Mitigates forgetting of distant information within a million-token context, so critical clauses are not missed during bulk legal and financial document audits, reducing compliance and financial risk |
| Hard-Constrained Structured Output Enhancement | Native JSON Schema validation greatly reduces format-parsing errors in batch data extraction and agent tool interaction, improving the stability of downstream production systems |
| Low False Refusal Safety Fine-Tuning | Minimizes unnecessary blocking of legitimate business requests in regulated industries, reducing manual intervention and speeding up enterprise deployment |

Capability Ratings

| Dimension | Rating | Notes |
| --- | --- | --- |
| Reasoning | Top-tier | Adaptive multi-level thinking outperforms previous Pro models on agent, mathematical and business reasoning benchmarks, covering most high-complexity industrial decision scenarios |
| Coding | Top-tier | The strongest coding performance in the Flash series, excelling at full-stack development, code auditing and refactoring in medium-to-large software projects |
| Creative Writing | Excellent | Produces rigorous, well-formatted academic papers, commercial tenders and multilingual official documents with strong consistency across long-form formal content |
| Multimodal | Top-tier | Full five-modal input, with leading performance among lightweight commercial models on batch long-video, scanned-document and complex chart analysis |
| Response Speed | Very Fast | 289 tokens per second throughput, 4× faster than mainstream frontier models, giving low latency for real-time consumer applications and high-frequency enterprise APIs |
| Context Window | Huge | Million-token input supports one-pass analysis of large document corpora and codebases, built for knowledge-base governance, contract auditing and long-running agent workflows |

Use Cases

● Enterprise Large-Scale Agent Cluster Deployment: Build collaborating agents of different types for data reporting, workflow approval and customer operations, achieving end-to-end business automation and lower costs through stable tool calling and prompt caching.
● Full-Lifecycle Software R&D Acceleration: Analyze medium-sized code repositories, generate business-logic code, detect security vulnerabilities and batch-produce test suites and API documentation to shorten development iteration cycles.
● Industrial Multimedia Asset Digitization: Automatically transcribe, summarize and archive meeting recordings, engineering blueprints and scanned paper documents to digitize offline enterprise assets.
● Enterprise Document Compliance & Risk Governance: Batch-audit commercial contracts, financial statements and industry policies to surface hidden compliance risks, compare document versions and auto-tag files, reducing manual legal and financial workload.
● Global Cross-Border Business Expansion: Provide precise professional translation, terminology normalization and compliance checks for international contracts, technical manuals and overseas marketing materials to support cross-border e-commerce.
● STEM Academic Research Assistance: Parse large batches of research papers, experimental charts and lab recordings to generate literature reviews, validate research data and build simulation code, speeding up academic research.
● High-Concurrency Consumer AI Services: Power intelligent customer service, content creation and online education platforms with fast responses and cost-efficient scaling that balance user experience against operating costs.
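
Several of these use cases depend on the model returning well-formed structured data, such as batch contract audits and agent clusters that rely on stable tool calling. Even with native JSON Schema support, a production pipeline may still want a client-side check. The following is a minimal validate-and-retry sketch that uses only the standard library. `call_model` is a hypothetical callable standing in for whatever client sends a prompt and returns text. The field names in `SCHEMA` are invented for illustration.

```python
import json
from typing import Callable

# Hypothetical extraction schema for a contract audit: field name -> expected type.
SCHEMA: dict[str, type] = {"party": str, "effective_date": str, "risk_flags": list}


def validate(record: object, schema: dict[str, type]) -> list[str]:
    """Return validation errors for one decoded JSON value (empty list = valid)."""
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]
    errors = [f"missing field: {key}" for key in schema if key not in record]
    errors += [
        f"field {key} should be {expected.__name__}"
        for key, expected in schema.items()
        if key in record and not isinstance(record[key], expected)
    ]
    return errors


def extract_with_retry(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Ask for JSON, validate it, and retry with the errors appended to the prompt.

    call_model is a hypothetical stand-in for a real client call; it takes a
    prompt string and returns the model's text reply.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = ""
    errors: list[str] = []
    for _ in range(max_attempts):
        reply = call_model(prompt + feedback)
        try:
            record = json.loads(reply)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            errors = validate(record, SCHEMA)
            if not errors:
                return record
        # Tell the model exactly what was wrong so the next attempt can fix it.
        feedback = "\nFix these problems and reply with JSON only: " + "; ".join(errors)
    raise ValueError(f"no valid record after {max_attempts} attempts: {errors}")


# Usage with a fake model that succeeds on the second try (made-up data).
replies = iter([
    '{"party": "Acme"}',
    '{"party": "Acme", "effective_date": "2025-07-01", "risk_flags": []}',
])
print(extract_with_retry(lambda _prompt: next(replies), "Extract the contract fields as JSON."))
```

Feeding the validation errors back into the next prompt gives the model a concrete target for its correction. Capping the number of attempts keeps one malformed request from stalling a batch pipeline.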