Bigger isn’t always better, especially when running generative AI models on everyday hardware. Stability AI is putting that principle into practice with today’s release of Stable Diffusion 3 Medium. Stable Diffusion is the company’s flagship model family for generating images from text. The first version of Stable Diffusion 3 was previewed on February 22 and became publicly available through an API on April 17.
The new Stable Diffusion 3 Medium is a smaller but still powerful model designed to run on consumer-grade GPUs. The mid-sized model makes Stable Diffusion 3 a more appealing option for users and organizations that have limited resources but still need advanced image generation technology.
Starting today, you can try Stable Diffusion 3 Medium via the API or through the Stable Artisan service on Discord. Additionally, the model weights will be available for non-commercial use on Hugging Face.
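For those planning to pull the weights from Hugging Face, loading the model with the diffusers library should look roughly like the sketch below. Treat it as a minimal, hedged example: the repository id, sampler settings, and prompt are assumptions rather than details confirmed by Stability AI, so check the official model card before relying on them.

```python
# Minimal sketch: loading SD3 Medium from Hugging Face with the diffusers library.
# The repo id below is an assumption; confirm it against Stability AI's model card
# and accept the license/gated-access terms before downloading.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = pipe(
    prompt="A red bicycle leaning against a blue door, morning light",
    num_inference_steps=28,   # assumed defaults, tune to taste
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium_sample.png")
```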
With this release, the original version of Stable Diffusion 3 is now called Stable Diffusion 3 (SD3) Large. According to Christian Laforte, co-CEO of Stability AI, SD3 Large has 8 billion parameters, whereas SD3 Medium has only 2 billion. Laforte highlighted that the smaller SD3 Medium is designed to run efficiently on consumer hardware.
Running SD3 Medium requires as little as 5GB of GPU VRAM, which puts it within reach of many consumer PCs and high-end laptops. Stability AI recommends 16GB of GPU VRAM for optimal performance, however, a bar most laptops won’t clear but one that remains reasonable for desktop GPUs.
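On cards closer to that 5GB floor, standard diffusers memory-saving options can help close the gap. The sketch below is an illustration under stated assumptions rather than official guidance: the repository id is assumed, and dropping the optional T5 text encoder is a commonly used trade-off that lowers memory use at some cost to prompt following.

```python
# Sketch of memory-saving options for lower-VRAM setups (assumed repo id).
# Skipping the T5 text encoder and offloading idle submodules to the CPU are
# standard diffusers techniques that shrink the GPU footprint.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id
    text_encoder_3=None,   # skip the large T5 encoder to save several GB
    tokenizer_3=None,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # requires `accelerate`; keeps idle parts on CPU

image = pipe("A watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("sd3_medium_lowvram.png")
```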
Despite its smaller size, Stability AI says SD3 Medium delivers quality comparable to SD3 Large in many areas. Laforte noted that SD3 Medium excels in photorealism, prompt adherence, typography, resource efficiency, and fine-tuning, putting it on par with the current SD3 Large API. He added that users can expect highly realistic images from SD3 Medium, which provides more detail per megapixel than previous models thanks to its 16-channel VAE (Variational Autoencoder).
When it comes to understanding prompts, SD3 Medium interprets natural language instructions well, including the spatial relationships between image elements. The smaller model also responds well to fine-tuning, efficiently absorbing details from specialized datasets.
One of the major enhancements across the SD3 lineup is improved typography, a feature SD3 Medium also incorporates. The standout quality, however, is its resource efficiency. Laforte emphasized that the smaller size and modularity of the 2-billion-parameter model reduce computational requirements without sacrificing performance, making SD3 Medium a strong fit for environments where resource management and efficiency are crucial.