Meta Introduces a Wave of Innovative AI Models for Audio, Text, and Watermarking Solutions

Meta’s Fundamental AI Research (FAIR) team is unveiling several new AI models and tools for researchers. They span audio generation, multimodal text-and-image understanding, and watermarking.

By sharing their early research work publicly, Meta aims to inspire further iterations and advance AI development responsibly. Today marks a significant step forward for open science.

Meta FAIR is introducing four new AI models and additional research artifacts to the public, encouraging innovation in the AI community.

Audio Creation Model JASCO and Watermarking Tools

Meta is launching a new AI model named JASCO (Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation). JASCO can accept inputs beyond text, such as chords or beats, to shape the final AI-generated sound. According to a paper from FAIR’s researchers, JASCO lets users control features of the generated music, such as chords, drums, and melodies, in addition to the text prompt.

FAIR plans to release the JASCO inference code as part of its AudioCraft AI audio model library under an MIT license, and the pre-trained model will be available under a non-commercial Creative Commons license.

Additionally, Meta is introducing AudioSeal, a tool that adds watermarks to AI-generated speech. AudioSeal is designed to pinpoint AI-generated segments within longer audio clips, rather than classifying a clip as a whole. Meta claims this localized approach makes detection faster and more efficient, up to 485 times faster than comparable methods. Unlike JASCO, AudioSeal will be released under a commercial license.
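To make the idea of localized watermark detection concrete, here is a minimal toy sketch, not AudioSeal's actual method: a low-amplitude pseudo-random key is added to one segment of a signal, and a window-by-window correlation detector flags which windows carry it. Every name, parameter, and threshold below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Secret pseudo-random watermark key (illustrative; real systems use learned marks)
KEY = rng.choice([-1.0, 1.0], size=64)

def embed(audio, start, stop, amp=0.05):
    """Add a tiled, low-amplitude copy of KEY to audio[start:stop]."""
    marked = audio.copy()
    marked[start:stop] += amp * np.resize(KEY, stop - start)
    return marked

def detect(audio, thresh=0.02):
    """One True/False flag per KEY-sized window: does it correlate with KEY?"""
    n = len(KEY)
    return np.array([audio[i:i + n] @ KEY / n > thresh
                     for i in range(0, len(audio) - n + 1, n)])

audio = 0.01 * rng.normal(size=1024)   # stand-in for ordinary, unmarked audio
marked = embed(audio, 256, 512)        # watermark only the second quarter
flags = detect(marked)
print(flags.nonzero()[0])              # → [4 5 6 7]: only watermarked windows
```

The point of the sketch is the detector's output granularity: it returns a verdict per window, so a partially AI-generated clip can be flagged segment by segment instead of all-or-nothing.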

Chameleon Model Released to the Public

FAIR will also make two sizes of its multimodal text model, Chameleon, available to the public under a research-only license. Chameleon 7B and 34B can handle tasks requiring both visual and textual understanding, such as image captioning. Although Meta is not releasing Chameleon's image generation model at this time, the text-focused models will be accessible to researchers.

Moreover, Meta will give researchers access to its multi-token prediction approach. This technique trains language models to predict several future words at each step rather than only the next one, and it will be available under a non-commercial, research-only license.
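As a rough illustration of the idea (not Meta's implementation), the NumPy sketch below shows the core of multi-token prediction: a shared representation of the context feeds several output heads, where head k predicts the (k+1)-th future token, and the training loss sums cross-entropy across all heads. All names, sizes, and weights are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, HEADS = 50, 16, 4   # HEADS = how many future tokens are predicted per step

# Shared "trunk" embedding plus one output matrix per future-token head
embed = rng.normal(size=(VOCAB, DIM)) * 0.1
heads = [rng.normal(size=(DIM, VOCAB)) * 0.1 for _ in range(HEADS)]

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_token_loss(tokens):
    """Average cross-entropy over HEADS future positions for each context token."""
    total, count = 0.0, 0
    for t in range(len(tokens) - HEADS):
        h = embed[tokens[t]]                 # shared trunk representation
        for k in range(HEADS):               # head k scores token t + 1 + k
            probs = softmax(h @ heads[k])
            total -= np.log(probs[tokens[t + 1 + k]] + 1e-12)
            count += 1
    return total / count

seq = rng.integers(0, VOCAB, size=32)
print(round(multi_token_loss(seq), 3))
```

In ordinary next-token training only the k = 0 head would exist; the extra heads give the model denser supervision per sequence position, which is the efficiency argument usually made for this technique.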