Meta’s Fundamental AI Research (FAIR) team has unveiled several new AI models and tools focusing on audio generation, multimodal text-and-image modeling, and watermarking. These releases aim to inspire further research and advance AI responsibly.
JASCO: Text-to-Music Generation
Meta introduced JASCO (Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation), an AI model that improves control over AI-generated music by accepting conditioning inputs such as chords or beats alongside text. Users can adjust characteristics of the output such as chords, drums, and melodies through these conditions. The JASCO inference code will be released as part of the AudioCraft AI audio model library under an MIT license, while the pre-trained model will be available under a non-commercial Creative Commons license.
Listen to samples of its work here.
AudioSeal: AI-Generated Speech Watermarking
AudioSeal is another tool from Meta, designed to add watermarks to AI-generated speech so that AI-generated content can be identified. Its detection is localized: it can pinpoint AI-generated segments within longer audio clips, and it makes detection up to 485 times faster than previous methods. AudioSeal will be released with a commercial license.
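The idea of localized detection can be illustrated with a toy sketch (this is not AudioSeal's actual method; the fixed pseudo-random pattern, window size, and correlation threshold below are all illustrative stand-ins for AudioSeal's learned generator and detector): a watermark is added only to some windows of a clip, and the detector scores each window independently, so it reports *which* segments carry the mark rather than a single yes/no for the whole clip.

```python
import numpy as np

WINDOW = 1000  # samples per detection window (illustrative)

def pattern(seed=42):
    # Fixed pseudo-random pattern standing in for a learned watermark signal.
    return np.random.default_rng(seed).standard_normal(WINDOW) * 0.5

def embed(audio, start_win, end_win):
    # Add the watermark only to windows [start_win, end_win).
    out = audio.copy()
    pat = pattern()
    for w in range(start_win, end_win):
        out[w * WINDOW:(w + 1) * WINDOW] += pat
    return out

def detect(audio, threshold=0.5):
    # Correlate each window with the known pattern; the per-window flags
    # localize the watermark instead of judging the clip as a whole.
    pat = pattern()
    energy = pat @ pat
    n_win = len(audio) // WINDOW
    return [bool((audio[w * WINDOW:(w + 1) * WINDOW] @ pat) / energy > threshold)
            for w in range(n_win)]

t = np.arange(10 * WINDOW)
clip = np.sin(2 * np.pi * t / 200)   # clean "speech" stand-in
marked = embed(clip, 3, 6)           # watermark only windows 3..5
flags = detect(marked)               # True exactly for windows 3, 4, and 5
```

Because each window is scored on its own, a real detector built this way can also skip ahead cheaply, which is the kind of structure that makes fast, localized detection possible.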
Get the model here.
Chameleon: Multimodal Text Model
Meta Chameleon is a family of models that can take any combination of text and images as input and output any combination of text and images, with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for both text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are wide-ranging: imagine generating creative captions for images, or using a mix of text prompts and images to create an entirely new scene.
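The key consequence of tokenizing both modalities is that text and image tokens can live in one shared vocabulary and be modeled as a single flat sequence. The sketch below shows that idea only; the vocabulary sizes, the offset scheme, and the `BOI`/`EOI` delimiter tokens are hypothetical, not Chameleon's actual tokenizer:

```python
# Toy early-fusion token stream: text and image share one id space,
# so a single decoder can attend over (and emit) both modalities.
TEXT_VOCAB = 50_000   # hypothetical text vocabulary size
IMAGE_CODES = 8_192   # hypothetical image-codebook size (e.g. VQ tokens)
BOI = TEXT_VOCAB + IMAGE_CODES        # begin-of-image delimiter (hypothetical)
EOI = TEXT_VOCAB + IMAGE_CODES + 1    # end-of-image delimiter (hypothetical)

def image_token(code):
    # Shift image-codebook entries past the text range so ids never collide.
    assert 0 <= code < IMAGE_CODES
    return TEXT_VOCAB + code

def build_sequence(text_ids, image_codes):
    # "A photo of <image>" becomes one flat id sequence the model trains on.
    return text_ids + [BOI] + [image_token(c) for c in image_codes] + [EOI]

seq = build_sequence([11, 42, 7], [0, 513, 8191])
```

Once everything is ids in one sequence, the same next-token objective covers captioning, text-conditioned image generation, and interleaved outputs, which is what makes the unified design easier to scale.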
FAIR will release two sizes of its multimodal text model, Chameleon 7B and 34B, under a research-only license. These models are designed for tasks requiring visual and textual understanding, such as image captioning. However, the Chameleon image generation model will not be released at this time.
Request access to the model here.
Multi-Token Prediction Approach
Meta will also provide researchers access to its multi-token prediction approach, which trains language models to predict multiple future words at once rather than one at a time. This will be available under a non-commercial, research-only license.
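The core of the approach can be sketched as a shared trunk with one output head per future offset, with the training loss summed over all of them. This is a minimal numerical illustration, not Meta's implementation; the vocabulary size, hidden dimension, number of heads, and random weights below are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_FUTURE = 100, 16, 4   # hypothetical sizes

# Shared trunk output for one position, plus one linear head per future offset.
hidden = rng.standard_normal(DIM)
heads = rng.standard_normal((N_FUTURE, VOCAB, DIM))

def cross_entropy(logits, target):
    # Standard softmax cross-entropy for a single target token.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]

def multi_token_loss(hidden, targets):
    # Sum the loss over predicting tokens t+1 .. t+N_FUTURE from the same
    # hidden state, instead of only the single next token.
    return sum(cross_entropy(heads[k] @ hidden, targets[k])
               for k in range(N_FUTURE))

targets = [3, 17, 56, 99]  # the next four tokens after position t
loss = multi_token_loss(hidden, targets)
```

Each training position thus supplies several prediction targets instead of one, which is the source of the approach's denser learning signal.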
Get the model on Hugging Face here.