Meta’s Fundamental AI Research (FAIR) team has unveiled several new AI models and tools focusing on audio generation, multimodal text-and-image modeling, and watermarking. These releases aim to inspire further research and advance AI responsibly.
JASCO: Text-to-Music Generation
Meta introduced JASCO (Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation), an AI model that improves control over AI-generated music by accepting conditioning inputs such as chords or beats alongside text. Users can adjust characteristics of the output such as chords, drums, and melodies through these conditions. The JASCO inference code will be released as part of the AudioCraft AI audio model library under an MIT license, while the pre-trained model will be available under a non-commercial Creative Commons license.
Listen to samples of its work here.
AudioSeal: AI-Generated Speech Watermarking
AudioSeal is another tool from Meta, designed to add watermarks to AI-generated speech so that AI-generated content can be identified. Its detection is localized: it can pinpoint AI-generated segments within longer audio clips, and it makes detection up to 485 times faster than previous methods. AudioSeal will be released with a commercial license.
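The idea of localized detection can be illustrated with a toy sketch (this is not AudioSeal's actual method; the fixed pseudo-random pattern, window size, and correlation threshold below are all illustrative stand-ins for AudioSeal's learned generator and detector): a watermark is added only to some windows of a clip, and the detector scores each window independently, so it reports *which* segments carry the mark rather than a single yes/no for the whole clip.

```python
import numpy as np

WINDOW = 1000  # samples per detection window (illustrative)

def pattern(seed=42):
    # Fixed pseudo-random pattern standing in for a learned watermark signal.
    return np.random.default_rng(seed).standard_normal(WINDOW) * 0.5

def embed(audio, start_win, end_win):
    # Add the watermark only to windows [start_win, end_win).
    out = audio.copy()
    pat = pattern()
    for w in range(start_win, end_win):
        out[w * WINDOW:(w + 1) * WINDOW] += pat
    return out

def detect(audio, threshold=0.5):
    # Correlate each window with the known pattern; the per-window flags
    # localize the watermark instead of judging the clip as a whole.
    pat = pattern()
    energy = pat @ pat
    n_win = len(audio) // WINDOW
    return [bool((audio[w * WINDOW:(w + 1) * WINDOW] @ pat) / energy > threshold)
            for w in range(n_win)]

t = np.arange(10 * WINDOW)
clip = np.sin(2 * np.pi * t / 200)   # clean "speech" stand-in
marked = embed(clip, 3, 6)           # watermark only windows 3..5
flags = detect(marked)               # True exactly for windows 3, 4, and 5
```

Because each window is scored on its own, a real detector built this way can also skip ahead cheaply, which is the kind of structure that makes fast, localized detection possible.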
Get the model here.
Chameleon: Multimodal Text Model
Meta Chameleon is a family of models that can take any combination of text and images as input and output any combination of text and images, with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for both text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are wide-ranging: imagine generating creative captions for images, or using a mix of text prompts and images to create an entirely new scene.
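The key consequence of tokenizing both modalities is that text and image tokens can live in one shared vocabulary and be modeled as a single flat sequence. The sketch below shows that idea only; the vocabulary sizes, the offset scheme, and the `BOI`/`EOI` delimiter tokens are hypothetical, not Chameleon's actual tokenizer:

```python
# Toy early-fusion token stream: text and image share one id space,
# so a single decoder can attend over (and emit) both modalities.
TEXT_VOCAB = 50_000   # hypothetical text vocabulary size
IMAGE_CODES = 8_192   # hypothetical image-codebook size (e.g. VQ tokens)
BOI = TEXT_VOCAB + IMAGE_CODES        # begin-of-image delimiter (hypothetical)
EOI = TEXT_VOCAB + IMAGE_CODES + 1    # end-of-image delimiter (hypothetical)

def image_token(code):
    # Shift image-codebook entries past the text range so ids never collide.
    assert 0 <= code < IMAGE_CODES
    return TEXT_VOCAB + code

def build_sequence(text_ids, image_codes):
    # "A photo of <image>" becomes one flat id sequence the model trains on.
    return text_ids + [BOI] + [image_token(c) for c in image_codes] + [EOI]

seq = build_sequence([11, 42, 7], [0, 513, 8191])
```

Once everything is ids in one sequence, the same next-token objective covers captioning, text-conditioned image generation, and interleaved outputs, which is what makes the unified design easier to scale.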
FAIR will release two sizes of its multimodal text model, Chameleon 7B and 34B, under a research-only license. These models are designed for tasks requiring visual and textual understanding, such as image captioning. However, the Chameleon image generation model will not be released at this time.
Request access to the model here.
Multi-Token Prediction Approach
Meta will also provide researchers access to its multi-token prediction approach, which trains language models to predict multiple future words at once rather than one at a time. This will be available under a non-commercial, research-only license.
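The core of the approach can be sketched as a shared trunk with one output head per future offset, with the training loss summed over all of them. This is a minimal numerical illustration, not Meta's implementation; the vocabulary size, hidden dimension, number of heads, and random weights below are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_FUTURE = 100, 16, 4   # hypothetical sizes

# Shared trunk output for one position, plus one linear head per future offset.
hidden = rng.standard_normal(DIM)
heads = rng.standard_normal((N_FUTURE, VOCAB, DIM))

def cross_entropy(logits, target):
    # Standard softmax cross-entropy for a single target token.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]

def multi_token_loss(hidden, targets):
    # Sum the loss over predicting tokens t+1 .. t+N_FUTURE from the same
    # hidden state, instead of only the single next token.
    return sum(cross_entropy(heads[k] @ hidden, targets[k])
               for k in range(N_FUTURE))

targets = [3, 17, 56, 99]  # the next four tokens after position t
loss = multi_token_loss(hidden, targets)
```

Each training position thus supplies several prediction targets instead of one, which is the source of the approach's denser learning signal.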
Get the model on Hugging Face here.