Google Launches Gemini 1.5 Flash, Pro Featuring 2M Tokens for Public Use

Google Cloud has made two models in its Gemini 1.5 family, Flash and Pro, publicly available. Gemini 1.5 Flash, first unveiled in May at Google I/O, is a compact multimodal model with a 1 million token context window, designed for narrow, high-frequency tasks. Gemini 1.5 Pro, initially launched in February, is a far more powerful model and now supports a 2 million token context window. Both models are accessible to all developers.

The launch aims to show how Google's AI technologies help businesses build advanced AI solutions. During a press briefing, Google Cloud CEO Thomas Kurian highlighted the company's significant progress in generative AI, noting that organizations such as Accenture, Airbus, Anthropic, Box, Broadcom, Cognizant, Confluent, Databricks, Deloitte, Equifax, Estée Lauder Companies, Ford, GitLab, GM, the Golden State Warriors, Goldman Sachs, Hugging Face, IHG Hotels and Resorts, Lufthansa Group, Moody's, and Samsung, among others, are using its platform. Kurian attributed this broad adoption to the capabilities of Google's models and the versatility of its Vertex AI platform, and promised continued rapid advances in both areas.

Google is also introducing context caching and provisioned throughput, features designed to enhance the developer experience.

Gemini 1.5 Flash

Gemini 1.5 Flash gives developers lower latency, more affordable pricing, and a context window suited to retail chat agents, document processing, and bots that can synthesize entire repositories. According to Google, Gemini 1.5 Flash is on average 40 percent faster than GPT-3.5 Turbo when given a 10,000-character input, and its input price is one-fourth that of OpenAI's model, with context caching available for inputs larger than 32,000 characters.
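For developers getting started, a minimal sketch of a Flash call through the Vertex AI Python SDK looks like the following; the project ID, region, and version-pinned model name are placeholders to adapt:

```python
# Minimal sketch: calling Gemini 1.5 Flash via the Vertex AI Python SDK
# (pip install google-cloud-aiplatform). Project and region are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# A short, high-frequency task of the kind Flash is positioned for.
response = model.generate_content(
    "Summarize this support ticket in one sentence: "
    "Customer reports the checkout page times out after applying a coupon code."
)
print(response.text)
```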

Gemini 1.5 Pro

Gemini 1.5 Pro offers developers a much larger context window of 2 million tokens, a limit no other prominent AI model currently matches. The model can take in very large inputs before generating a response: it can process two hours of high-definition video as a single unit, without splitting it into chunks, and can likewise handle an entire day's worth of audio, several hours of video, more than 60,000 lines of code, or upwards of 1.5 million words in one request.

Kurian outlined the difference between Gemini 1.5 Flash and Pro in terms of use cases: if you need to process a lengthy two-hour video in one pass, Gemini 1.5 Pro's large context window makes it the right choice; for tasks that demand low latency and predictable processing times, Gemini 1.5 Flash is the better fit.
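As a rough illustration of the long-video use case, a single file can be passed to Gemini 1.5 Pro as one part of a request through the Vertex AI SDK; the Cloud Storage path below is a placeholder:

```python
# Sketch: sending one long video to Gemini 1.5 Pro in a single request,
# relying on its 2M-token context window. Bucket path is a placeholder.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-001")

video = Part.from_uri(
    uri="gs://your-bucket/two-hour-keynote.mp4",  # placeholder path
    mime_type="video/mp4",
)

response = model.generate_content(
    [video, "List the main product announcements in this video with timestamps."]
)
print(response.text)
```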

Context Caching for Gemini Models

To help developers make the most of Gemini’s various context windows, Google is rolling out context caching in public preview. This feature allows models to store and reuse previously processed information, making them more efficient and reducing compute costs by up to 75 percent. This capability is crucial for handling long documents or extended conversations as context windows expand.
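In the Vertex AI Python SDK this surfaces as a preview caching module. A minimal sketch, assuming the vertexai.preview.caching API and a placeholder Cloud Storage document, might look like this:

```python
# Sketch of the context-caching preview: cache a large document once, then
# issue multiple queries that reuse the cached prefix instead of re-processing
# it. Class and method names follow the preview API; the file path is a placeholder.
import datetime

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

# Cache a long document (caching applies above a minimum input size).
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=[
        Part.from_uri(
            "gs://your-bucket/annual-report.pdf",  # placeholder document
            mime_type="application/pdf",
        ),
    ],
    ttl=datetime.timedelta(hours=1),  # how long the cache lives
)

# Subsequent requests run against the cached tokens.
model = GenerativeModel.from_cached_content(cached_content=cached_content)
print(model.generate_content("What were the key risk factors?").text)
print(model.generate_content("Summarize the revenue outlook.").text)
```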

Provisioned Throughput for Gemini Models

Provisioned throughput enables developers to scale their use of Google's Gemini models more reliably by reserving capacity in advance: a fixed number of queries or texts a model will handle over a given time frame. Previously, developers were limited to a pay-as-you-go model; with provisioned throughput, production workloads get more predictable performance.

Kurian explained that provisioned throughput lets customers reserve inference capacity for large-scale events, preventing service-level issues. The feature, now generally available behind an allowlist, comes with assurances on response time and system availability, a significant step forward in service reliability.
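Provisioned throughput itself is purchased and assigned through Google Cloud rather than switched on in code, but individual requests can be routed to (or away from) the reserved capacity. The sketch below uses the plain REST endpoint and the X-Vertex-AI-LLM-Request-Type header documented for this feature; treat the header name, its values, and all identifiers as assumptions to verify against current documentation:

```python
# Sketch: routing a request to reserved (provisioned) capacity via the REST API.
# The X-Vertex-AI-LLM-Request-Type header is an assumption to verify; "dedicated"
# targets reserved capacity only, while "shared" uses pay-as-you-go quota.
import google.auth
import google.auth.transport.requests
import requests

PROJECT = "your-project-id"  # placeholder
LOCATION = "us-central1"
MODEL = "gemini-1.5-flash-001"

# Obtain an access token from application-default credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{LOCATION}/publishers/google/models/{MODEL}:generateContent"
)

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {creds.token}",
        "X-Vertex-AI-LLM-Request-Type": "dedicated",  # assumed header, see docs
    },
    json={"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]},
)
print(resp.json())
```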
