Google is bringing its latest text-to-image model, Imagen 3, to the Vertex AI platform. The model will soon be available in preview to select customers, and Google says it offers faster image generation, better prompt understanding, more realistic images of people, and improved text rendering within images than its predecessor.
First introduced at Google I/O in May, Imagen 3 was initially available to a limited group of creators in a private preview via ImageFX. Google had previously announced plans to bring it to Vertex AI.
“Imagen 3 is our most advanced image generation model yet,” explained Douglas Eck, senior research director at Google DeepMind. “It delivers more photorealistic images with richer details and fewer visual distortions. It also comprehends prompts written colloquially—the more detailed and creative the prompt, the better the results. Imagen 3 excels at incorporating small details in longer prompts and is our best model yet for rendering text, which has been a challenge for previous models.”
With its rollout on Vertex AI, Imagen 3 includes multi-language support, safety features like Google DeepMind’s SynthID digital watermarking, and support for multiple aspect ratios.
Shutterstock is one of the companies using this model. “Since integrating Imagen into our AI image generator, users have created millions of images with it,” said Justin Hiza, vice president of data services at Shutterstock. “We’re thrilled about the enhancements Imagen 3 offers, enabling faster execution of ideas without compromising quality. We also appreciate the built-in safety features and Google Cloud’s indemnification for generative AI, which aligns with our commitment to ethically-sourced AI image generation.”
Despite the advancements with Imagen, Google has not said when its Gemini AI will resume image generation, following criticism of its accuracy. When asked, Google Cloud CEO Thomas Kurian said that Imagen and Gemini are distinct models with different functions: “Gemini is a multimodal model, capable of processing various types of input and reasoning across images, video, and audio. Imagen, on the other hand, is a diffusion model designed for generating high-fidelity text-to-image outputs. They serve different purposes and are not interchangeable.”
Further questions about when Gemini’s image generation will be reinstated went unanswered.