ElevenLabs Introduces Open-Source Tool for Enhancing Videos with Sound Effects


A few weeks after AI voice startup ElevenLabs introduced Sound Effects, its text-to-sound AI service, the company has released an open-source app that demonstrates what the service can do. The app analyzes an uploaded video clip and, in about 15 seconds, returns several sound effect samples for creators to choose from.

Developers can access the app’s code on GitHub, and ElevenLabs has also launched a website where the public can experiment with its Sound Effects API.

When you upload a video to the Video to Sound Effects app, it captures four frames at one-second intervals on the client side. These frames are sent, along with a prompt, to OpenAI’s GPT-4o, which writes a custom text-to-sound-effects prompt. That prompt is then passed to ElevenLabs’ Sound Effects API to generate the sound effect. Finally, the video and audio are combined on the client side into a single downloadable file, which can be up to 22 seconds long.
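The frame-sampling step described above can be sketched in a few lines of Python. This is only an illustration, not the app's actual code: the function name `frame_timestamps` is hypothetical, and the sketch assumes sampling starts at the first second of the clip and simply stops early if the clip is shorter than the sampling window.

```python
def frame_timestamps(duration_s: float, count: int = 4, interval_s: float = 1.0) -> list[float]:
    """Return the times (in seconds) at which frames would be captured.

    Hypothetical helper: assumes sampling starts at 0 s and that frames
    falling past the end of the clip are skipped.
    """
    if duration_s <= 0 or count <= 0:
        return []
    stamps = []
    for i in range(count):
        t = i * interval_s
        if t >= duration_s:
            break  # the clip ended before all frames could be taken
        stamps.append(t)
    return stamps


if __name__ == "__main__":
    print(frame_timestamps(10.0))  # four timestamps, one second apart
    print(frame_timestamps(2.5))   # a short clip yields fewer frames
```

In a browser, each timestamp would then be used to seek the video and copy the frame to an image before sending it to the language model.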

Ammaar Reshi, ElevenLabs’ design lead, explained that the tool is a proof of concept that shows what users can achieve with the company’s SFX API. Many AI video creators are constantly searching for the perfect sound effect, and the tool aims to streamline that process by analyzing video frames and suggesting suitable sounds. Reshi expressed excitement about the dynamic experiences that could be built on the API, such as immersive video games where sounds are generated in response to player interactions.

The API enables developers to create fully custom AI sound effects from a short text description. ElevenLabs bills usage in characters: a generation costs 100 characters when no duration is specified, or 25 characters per second when the developer sets a duration.
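To make the pricing concrete, here is a minimal sketch of the character arithmetic using the two rates quoted above. The function name `sfx_character_cost` is hypothetical, and rounding fractional results up to a whole character is an assumption of this sketch, not something the article states.

```python
import math
from typing import Optional

CHARS_PER_GENERATION = 100  # flat rate quoted for a generation with no set duration
CHARS_PER_SECOND = 25       # rate quoted when a duration is set


def sfx_character_cost(duration_s: Optional[float] = None) -> int:
    """Estimate the character cost of one sound effect generation.

    Hypothetical helper: rounding up to a whole character is an assumption.
    """
    if duration_s is None:
        return CHARS_PER_GENERATION
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(duration_s * CHARS_PER_SECOND)


if __name__ == "__main__":
    print(sfx_character_cost())      # no duration set: 100
    print(sfx_character_cost(10.0))  # 10 seconds: 250
    print(sfx_character_cost(22.0))  # 22 seconds: 550
```

Under these rates, any set duration longer than four seconds costs more than a generation with no duration specified.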

A brief test of the Video to Sound Effects app showed how simple it is to use. After a silent video of a vehicle driving over rough terrain was uploaded, the AI generated four different sound options, all resembling a car on a gravel road. While adding sound effects to individual clips can be entertaining, the tool’s real potential lies in its integration into larger systems.

As the AI video generation field becomes more competitive, ElevenLabs aims to stay ahead by developing audio tools that it believes will be highly sought after by developers, filmmakers, and creators.