Gemini

Google/gemini-3-flash-preview

100K context | From $0.0000 / 1M tokens

A cost-efficient, ultra-fast multimodal large model developed by Google DeepMind. It offers dynamically adjustable deep reasoning, native four-modal parsing, a million-token long context and stable batch tool calling, and is optimized for low-latency, high-concurrency scenarios including enterprise agents, high-frequency API services, coding assistance, bulk audio-video processing and consumer chat applications.

README

Supported Functionality

| Item | Specification |
|---|---|
| Input | Text, Image, Audio, Video, PDF |
| Output | Text |
| Context | 1,048,576 tokens |
| Max Output | 65,536 tokens |
| Vision | ✓ Supported (Image, Long Video, Scanned PDF Analysis) |
| Function Calling | ✓ Supported (Batch Calling, Code Execution, Search Grounding, Structured JSON Output) |

Description

Released by Google DeepMind on December 17, 2025, Gemini 3 Flash Preview is the mainstream lightweight commercial preview model in the Gemini 3 series. It is positioned as a high-throughput, low-cost general foundation model built on an optimized sparse MoE architecture, with a knowledge cutoff of January 2025. It outperforms Gemini 2.5 Pro on mainstream benchmarks including GPQA, MMMU and SWE-Bench, and its ultra-low API pricing and millisecond-level latency make it a preferred choice for consumer applications and large-scale enterprise agent deployments.

Upgraded from previous Flash generations, it introduces a newly designed, dynamically adjustable thinking architecture with four reasoning levels. More efficient unified multimodal encoding cuts token consumption by 30% on typical tasks, alongside improved long-context recall, better fault tolerance for batch tool calling and stable constrained structured output. It delivers 3× faster inference at one quarter of the cost of flagship Pro models, balancing top-tier reasoning capability, ultra-low latency and commercial cost-effectiveness.

Key Capabilities

● Dynamic Multi-Level Reasoning: Offers four configurable reasoning modes ranging from minimal to high intensity and achieves 90.4% accuracy on the GPQA benchmark, enabling instant responses for simple tasks and self-verified multi-step deduction for complex mathematical and business scenarios.
● Efficient Lightweight Coding Assistance: Excels at script development, API programming, code debugging, vulnerability scanning and unit test writing; optimized for small-to-medium repository analysis and high-frequency developer assistance workflows.
● Native Batch Four-Modal Processing: Supports mixed text, image, audio, video and PDF input in a single request, and can parse hundreds of images and hour-long videos in batches with automatic transcription and structured data extraction.
● Million-Token Lossless Long Context: Delivers over 90% key detail recall for ultra-long documents, ideal for centralized analysis of knowledge bases, contract clusters and extended multi-turn conversations.
● Reliable Batch Agent Orchestration: Enables serial and parallel chained tool invocation, sandbox code execution and real-time web grounding, with parameter validation and automatic retry to stabilize large-scale automated business workflows.
● Token-Efficient Inference: Consumes 30% fewer tokens than prior generations for identical tasks; can be paired with API context caching to further reduce operational costs for high-volume batch API workloads.
● Multilingual Enterprise Safety Alignment: Natively supports over 100 languages with layered content risk control, data desensitization and hallucination mitigation; a free trial tier lowers adoption barriers for SMEs and regulated enterprises.
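
The "parameter validation and automatic retry" behavior described under Reliable Batch Agent Orchestration can be approximated on the client side as well. The following is a minimal sketch, not the model's or any SDK's actual mechanism: the tool name `get_invoice`, the spec `INVOICE_SPEC`, and the helpers `validate_args`, `call_with_retry` and `make_flaky_tool` are all hypothetical names introduced for illustration. It checks tool-call arguments against a simple type spec before execution, then retries transient failures with exponential backoff.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical parameter spec for an illustrative "get_invoice" tool:
# argument name -> required Python type.
INVOICE_SPEC: Dict[str, type] = {"invoice_id": str, "year": int}


def validate_args(args: Dict[str, Any], spec: Dict[str, type]) -> None:
    """Raise ValueError if arguments are missing, unexpected, or wrongly typed."""
    missing = sorted(set(spec) - set(args))
    extra = sorted(set(args) - set(spec))
    if missing or extra:
        raise ValueError(f"missing={missing} unexpected={extra}")
    for name, expected in spec.items():
        if not isinstance(args[name], expected):
            raise ValueError(f"{name!r} must be {expected.__name__}")


def call_with_retry(tool: Callable[..., Any], args: Dict[str, Any],
                    spec: Dict[str, type], max_attempts: int = 3,
                    base_delay: float = 0.0) -> Any:
    """Validate once, then call the tool, retrying transient connection errors."""
    validate_args(args, spec)  # reject bad parameters before any call is made
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**args)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


def make_flaky_tool(failures: int) -> Callable[..., Dict[str, Any]]:
    """Build a stand-in tool that fails `failures` times before succeeding."""
    state = {"left": failures}

    def get_invoice(invoice_id: str, year: int) -> Dict[str, Any]:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("transient failure")
        return {"invoice_id": invoice_id, "year": year, "status": "paid"}

    return get_invoice


if __name__ == "__main__":
    tool = make_flaky_tool(failures=2)
    print(call_with_retry(tool, {"invoice_id": "A-17", "year": 2025}, INVOICE_SPEC))
```

Validating before the first call keeps malformed arguments from consuming retry attempts; only errors that are plausibly transient are retried.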

Technical Strengths

| Feature | Benefit |
|---|---|
| Dynamic MoE Expert Activation | Intelligently activates a small subset of expert modules based on task complexity, cutting redundant computation by 60% to enable millisecond latency and stable million-scale concurrent requests |
| 4-Tier Adjustable Dynamic Thinking Engine | Dynamically balances response speed and reasoning depth for lightweight instant interaction or complex logical deduction, eliminating unnecessary compute waste across diverse business scenarios |
| Lightweight Unified Multimodal Encoder | Optimized multimedia feature compression drastically reduces preprocessing overhead, accelerating bulk media analysis and lowering server resource consumption for multimedia applications |
| Dual-Layer Long Context Memory Mechanism | Suppresses distant information forgetting within million-token context, preventing omission of critical clauses during bulk legal and financial document auditing to mitigate compliance risks |
| Strict Structured Output Enforcement | Native JSON Schema validation minimizes parsing failures in downstream systems, significantly improving production stability for enterprise automated agent pipelines |
| Context Caching + Cost-Effective Pricing | Conversation context reuse plus competitive token pricing drastically reduces total AI operational expenditure for large-scale commercial deployments |

Capability Ratings

| Dimension | Rating | Notes |
|---|---|---|
| Reasoning | Excellent | Equipped with tunable deep thinking, outperforms previous flagship models on scientific and business reasoning benchmarks with flexible latency-accuracy tradeoff |
| Coding | Strong | Delivers reliable script and small full-stack development with outstanding cost performance for daily high-frequency developer auxiliary scenarios |
| Creative Writing | Strong | Generates well-formatted commercial, educational and multilingual content at high throughput, ideal for bulk content production pipelines |
| Multimodal | Excellent | Top-tier lightweight model for batch image, long-video and scanned document parsing with stable cross-modal structured data extraction |
| Response Speed | Very Fast | 3× faster inference than conventional large models, enabling real-time interactive experience for consumer-facing apps and live customer service |
| Context Window | Huge | Million-token input capacity supports one-shot global analysis of massive document corpora for knowledge base and contract governance use cases |

Use Cases

● Consumer-Facing Real-Time Chat Applications: Powers high-concurrency intelligent customer service and conversational AI for mobile apps and mini-programs with ultra-low latency and affordable scaling cost.
● Enterprise Agent Cluster Deployment: Automates data analytics, report generation and internal workflow orchestration via tool calling to streamline end-to-end business operations.
● High-Frequency Developer Coding Assistant: Accelerates software iteration through code completion, vulnerability auditing and batch technical documentation generation for engineering teams.
● Bulk Multimedia Asset Processing: Automatically transcribes, summarizes and archives meeting recordings, scanned paper documents and product image datasets for enterprise digital transformation.
● Mass Document Governance & Risk Screening: Rapidly extracts key information and identifies preliminary compliance risks across bulk contracts, policy files and industry reports to cut manual review costs.
● Online Educational AI Tutoring: Delivers automated exercise generation, problem solving and homework grading with affordable large-scale deployment for K12 and higher education platforms.
● Cross-Border Multilingual Content Localization: Enables batch translation and proofreading for foreign trade materials, product manuals and short-video subtitles to support global business expansion.
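
For the bulk document use cases, a corpus larger than the 1,048,576-token context listed in the specification has to be split across several requests. The sketch below shows one simple way to plan that split, under stated assumptions: the per-document token counts are supplied by the caller (no tokenizer is shown), the 2,000-token prompt reserve is an arbitrary illustrative value, and the helper name `plan_batches` is hypothetical. Whether output tokens count against the input window is not stated in this card, so the sketch reserves room only for the prompt.

```python
from typing import Dict, List

# Input context size taken from the specification in this card.
CONTEXT_LIMIT = 1_048_576


def plan_batches(doc_tokens: Dict[str, int], prompt_tokens: int = 2_000,
                 limit: int = CONTEXT_LIMIT) -> List[List[str]]:
    """Greedily group documents, in order, so each request fits the limit."""
    budget = limit - prompt_tokens  # tokens left for documents per request
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, tokens in doc_tokens.items():
        if tokens > budget:
            # A single oversized document must be chunked before batching.
            raise ValueError(f"{name} alone exceeds the budget of {budget} tokens")
        if used + tokens > budget:
            batches.append(current)  # close the full batch and start a new one
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical token counts for three contract bundles.
    corpus = {"contracts_a": 600_000, "contracts_b": 500_000, "contracts_c": 400_000}
    print(plan_batches(corpus))
```

Greedy in-order packing keeps related documents adjacent, at the cost of occasionally leaving unused room in a batch; a bin-packing heuristic would use fewer requests when ordering does not matter.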