Stability AI, best known for its Stable Diffusion text-to-image generator, has made its image-to-video model, Stable Video Diffusion (SVD), available through its developer platform and API, letting third-party developers integrate it into their apps, websites, software, and services.
The company shared in a blog post that this new addition offers programmatic access to an advanced video model designed for various industries. The goal is to provide developers with a smooth way to add sophisticated video generation features to their products.
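As an illustration of what that programmatic access could look like, here is a minimal Python sketch of an image-to-video request. The endpoint path, field names, and response shape are assumptions modeled on typical hosted-inference APIs, not details confirmed in the announcement; consult Stability's API reference for the actual interface.

```python
import requests

API_KEY = "sk-..."  # hypothetical key from the Stability developer platform
# Endpoint path is an assumption; check the official API reference.
ENDPOINT = "https://api.stability.ai/v2alpha/generation/image-to-video"

# Submit a still image (JPG or PNG) as the prompt for a short video.
with open("prompt.png", "rb") as f:
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},
    )
response.raise_for_status()

# Video generation is typically asynchronous in hosted APIs; the response
# is assumed to carry an id that can be polled for the finished MP4.
generation_id = response.json()["id"]
print(f"Generation started: {generation_id}")
```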
While the release may benefit enterprises looking to create AI-generated video, it arrives amid controversy: Stability AI has faced criticism for training its models on the open-source LAION-5B dataset, which was recently found to contain child sexual abuse material and has been taken offline.
For developers who want to offer generative video inside their apps, Stability's new SVD API is among the stronger options on quality. According to a LinkedIn post by Stability AI, the service can generate two seconds of video, comprising 25 generated frames and 24 frames of FILM interpolation (49 frames in total, an effective rate of roughly 24.5 fps), in about 41 seconds. That is too short for extensive video projects, but it is well suited to short-form content such as messaging GIFs and memes.
Stability AI’s offering competes with video generation models from Runway and Pika Labs. Pika Labs recently raised $55 million from Lightspeed Venture Partners and launched a platform for generating and editing videos. However, neither Runway nor Pika Labs has made its video-generating AI models available through an API, limiting external developers’ ability to build apps on top of those models.
Stability AI also plans to introduce a user-facing web experience for its video generator, with a waitlist available for early access.
Stable Video Diffusion was announced nearly a month ago as a research preview. The model creates MP4 videos from still-image prompts, such as JPGs and PNGs. Judging from the samples the company has shared, it performs well but produces clips of only up to two seconds, shorter than the four-second clips from research-centric models. Multiple short clips can, however, be combined to create longer videos.
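Because each generation caps out at about two seconds, stitching several outputs together is the practical route to longer footage. A minimal sketch using the moviepy library, with placeholder file names standing in for individual SVD outputs:

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Placeholder paths to several short SVD-generated clips.
paths = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]
clips = [VideoFileClip(p) for p in paths]

# Join the clips end to end into one longer video.
combined = concatenate_videoclips(clips)
combined.write_videofile("combined.mp4")
```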
Stability AI suggests the model can be useful in sectors such as advertising, marketing, TV, film, and gaming. The latest version generates video at several resolutions and aspect ratios, including 1024×576 (landscape), 768×768 (square), and 576×1024 (portrait). It also offers motion strength control and seed-based control for repeatable or random generation.
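In request terms, those controls would most likely surface as parameters sent alongside the input image. The field names below are assumptions: motion_bucket_id follows the naming used in the SVD research release for motion strength, while width, height, and seed are generic placeholders.

```python
# Hypothetical parameter set for a repeatable landscape (1024x576) generation.
params = {
    "width": 1024,            # assumed field name for output width
    "height": 576,            # assumed field name for output height
    "motion_bucket_id": 127,  # motion strength; higher values mean more motion
    "seed": 42,               # fixed seed for repeatable output; randomize for variety
}

# These would accompany the image upload in the POST request sketched earlier,
# e.g. requests.post(ENDPOINT, headers=headers, files={"image": f}, data=params)
```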
Despite these controversies, Stability AI is pressing to lead the market. The Stanford Internet Observatory recently reported that the LAION-5B dataset, used to train popular AI models such as Stable Diffusion 1.5, included child sexual abuse material, prompting its publisher to take it offline. Earlier this year, Stability AI was also named in a class-action lawsuit alleging the unauthorized use of billions of copyrighted images to train Stable Diffusion.
Stability’s API currently offers access to all of the company’s models, from the Stable Diffusion XL text-to-image generator to the new SVD model. Stability also provides a membership service for hosting the models locally.