Meta is introducing new AI models and datasets, including Ego-Exo4D, Audiobox and Seamless Communication, with advances in combining first-person and external views, audio generation and language translation
FinTech BizNews Service
Mumbai, December 3, 2023: Meta celebrated the 10-year anniversary of its Fundamental AI Research (FAIR) team on November 30, 2023. For the last decade, FAIR has been the source of many AI breakthroughs and a beacon for doing research in an open and responsible way. It has been a decade of advancing the state of the art in AI through open research.
Meta’s announcement reads as follows:
We are committed to open science and sharing our work, whether it be papers, code, models, demos or responsible use guides.
Generating Voices and Sound Effects With Audiobox
Earlier this year, we introduced Voicebox, a generative AI model that can help with audio editing, sampling and styling. Now Audiobox, its successor, advances generative AI for audio even further. With Audiobox, you can use voice prompts or text descriptions to describe sounds or types of speech you’d like to generate. For example, you could create a soundtrack with a prompt like, “a running river and birds chirping.” You can even generate a voice by saying, “a young woman speaks with a high pitch and fast pace.” Audiobox makes it easy to create custom audio for all of your projects.
Unlocking Seamless Language Translation
Building on our work with SeamlessM4T, we’re now introducing Seamless Communication: a suite of AI translation models that better preserve expression across languages and translate while the speaker is still talking to improve speed.
Earlier versions of language translation services often struggle to capture tone of voice, pauses and emphasis, missing important signals that help us share emotions and intent. SeamlessExpressive is the first publicly available system that unlocks expressive cross-lingual communication. It uses a model that preserves the speaker’s emotion and style, and addresses the rate and rhythm of speech. The model currently works for English, Spanish, German, French, Italian and Chinese.
SeamlessStreaming unlocks real-time conversations with someone who speaks a different language. In contrast to conventional systems, which translate only after the speaker has finished a sentence, SeamlessStreaming translates while the speaker is still talking, so the listener hears the translation sooner.
Meta is uniquely poised to solve AI’s biggest challenges. Our investments in software, hardware and infrastructure allow us to weave learnings from our research into products that can benefit billions of people.
FAIR is critical to Meta’s success, and one of the few groups in the world with all the requirements to deliver true breakthroughs: some of the brightest minds in the industry, a culture of openness, and most importantly, the freedom to conduct exploratory research. This freedom has helped us stay agile and contribute to building the future of social connection.
Responsible AI Research
We value responsible AI research and openness because sharing thoughtful work through the scrutiny of peers pushes us towards excellence and builds trust in our advances. It also allows us to collaborate with the wider community, which brings faster progress and a more diverse set of contributors. Learn more about how we’re conducting AI research responsibly.