Creating images with AI from a simple text prompt is becoming dramatically faster, thanks to a new method adopted by Stability AI, the developer of the popular Stable Diffusion model. Until now, users had to wait seconds or even minutes for AI to generate images from their prompts. With the introduction of the new SDXL Turbo model, Stability AI is making real-time image generation broadly accessible.
The improvement comes from a massive reduction in the number of generation steps: what previously required 50 steps now takes just one, sharply cutting the computational load. Stability AI claims that SDXL Turbo can produce a 512×512 image in just 207 milliseconds on an A100 GPU, a substantial speedup over earlier diffusion models.
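For context, the sketch below shows what single-step generation looks like with the publicly released weights, using Hugging Face's diffusers library. The model ID and settings follow the sdxl-turbo model card; the prompt and output filename are arbitrary examples, and guidance is disabled because the model is trained to work without it.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the publicly released SDXL Turbo weights from Hugging Face.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# A single denoising step at 512x512; SDXL Turbo is trained without
# classifier-free guidance, so guidance_scale is set to 0.0.
image = pipe(
    prompt="a photo of a red fox in a snowy forest",
    num_inference_steps=1,
    guidance_scale=0.0,
    height=512,
    width=512,
).images[0]
image.save("fox.png")
```

Raising num_inference_steps to two, three, or four trades a little of that speed for iterative refinement of the sample.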
Using SDXL Turbo feels a lot like the predictive typing features now built into search engines and browsers, except applied to image generation: images update almost as fast as you can type. Notably, the speedup doesn't come from better hardware. The acceleration is driven by a new technique Stability AI has been researching, called Adversarial Diffusion Distillation (ADD).
The SDXL base model, first announced by Stability AI in July, was positioned from the outset to serve as a robust foundation for other models. Stable Diffusion competes with several text-to-image generation models, including OpenAI’s DALL-E and Midjourney.
Among the notable capabilities of the original SDXL base model is support for ControlNets, which give users finer control over image composition (sketched below). The base model’s 3.5 billion parameters also allow it to capture more concepts, delivering better accuracy.
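As an illustration of that kind of conditioning, the sketch below pairs the SDXL base model with a publicly available Canny-edge ControlNet in diffusers. The model IDs, prompt, and toy conditioning image are illustrative assumptions for demonstration, not part of Stability AI's announcement; in practice the edge map would be extracted from a reference photo.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Assumed model IDs: the public SDXL base weights plus a Canny ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Toy conditioning image: a single vertical edge. A real workflow would run a
# Canny edge detector over a reference image instead.
edges = np.zeros((1024, 1024, 3), dtype=np.uint8)
edges[:, 508:516] = 255
canny_image = Image.fromarray(edges)

# The edge map steers where structure appears in the generated image.
image = pipe(
    prompt="a lighthouse on a cliff at sunset",
    image=canny_image,
    controlnet_conditioning_scale=0.5,
).images[0]
image.save("controlled.png")
```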
SDXL Turbo builds on the achievements of the SDXL base model and accelerates image generation. Stability AI is following a now-common generative AI development path: first build the most accurate model possible, then optimize it for speed. OpenAI took a similar approach with GPT-3.5 Turbo and the more recent GPT-4 Turbo.
Speeding up generative AI models usually involves a trade-off in quality and accuracy. With SDXL Turbo, however, the compromise is minimal: results remain highly detailed and are only slightly less precise than those produced by the non-accelerated version of SDXL.
Adversarial Diffusion Distillation (ADD) is designed to combine the high sample quality of diffusion models (DMs) with the speed of generative adversarial networks (GANs). It pairs adversarial training with score distillation to transfer knowledge from a pretrained image diffusion model into a fast student, offering rapid sampling while preserving high fidelity, retaining the ability to refine outputs iteratively, and benefiting from the teacher model’s pretraining.
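At a high level, the ADD student is trained on two signals at once: an adversarial loss from a discriminator and a distillation loss against the frozen teacher. The toy sketch below illustrates that combined objective in PyTorch. The stand-in networks, shapes, hinge-style generator loss, and weighting constant are all illustrative assumptions, not Stability AI's training code (which, among other details, re-noises the student's output before querying the teacher).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUNet(nn.Module):
    """Stand-in for a denoiser: predicts a clean image from a noisy one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class ToyDiscriminator(nn.Module):
    """Stand-in discriminator scoring the realism of an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(1),
        )
    def forward(self, x):
        return self.net(x)

student, teacher, disc = ToyUNet(), ToyUNet(), ToyDiscriminator()
teacher.requires_grad_(False)  # the pretrained diffusion teacher stays frozen
LAMBDA = 2.5                   # distillation weight (illustrative value)

x_real = torch.randn(4, 3, 32, 32)           # a batch of training images
x_noisy = x_real + torch.randn_like(x_real)  # forward-diffused input

# Student: one-step generation from the noised input.
x_student = student(x_noisy)

# Adversarial loss: push the discriminator to score student samples as real.
adv_loss = -disc(x_student).mean()  # hinge-style generator loss

# Distillation loss: match the frozen teacher's denoised prediction.
with torch.no_grad():
    x_teacher = teacher(x_noisy)
distill_loss = F.mse_loss(x_student, x_teacher)

loss = adv_loss + LAMBDA * distill_loss
loss.backward()
# In a real loop, one optimizer updates only the student here, and the
# discriminator is trained with its own opposing loss in a separate step.
```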
Experiments by Stability AI researchers show that ADD significantly outperforms GANs, Latent Consistency Models, and other diffusion distillation methods when sampling in just one to four steps. Although Stability AI does not yet consider SDXL Turbo ready for commercial use, the model is available for preview on the company’s Clipdrop web service.
Limited tests by VentureBeat confirmed that image generation was indeed fast, though the Clipdrop beta currently lacks some of the advanced parameter options for controlling image style. Stability AI has also released the code and model weights on Hugging Face under a non-commercial research license.