Microsoft’s AI app VASA-1 makes faces in pictures talk and sing: How does it work? (2024)

The AI system has been developed by a team of researchers from Microsoft Research Asia. (Image: Microsoft)

Not long ago, some apps could bring photographs to life with GIF-like motions. Now, there is an AI system that can make photographs talk and sing. A team of AI researchers at Microsoft Research Asia has created an AI application that converts still images of people, together with audio tracks, into animation. It is not merely animation: reportedly, the output accurately shows the people in the images speaking or singing to the audio track, with apt facial expressions.

The latest application, VASA, is a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS) from a single static image and a speech audio clip. “Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness,” the researchers wrote in a paper describing the framework.

According to the team, the core innovations are a holistic model that generates facial dynamics and head movement in a face latent space, and an expressive, disentangled face latent space learned from videos. The team said that extensive experiments and evaluations on a set of new metrics show their method significantly outperforms previous methods along various dimensions.

“Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512×512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” the researchers wrote.
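
To put the researchers' 40 FPS figure in perspective, the short Python sketch below works out how much time a generator has for each frame when streaming in real time. It is plain arithmetic, not a description of how VASA-1 is implemented; the helper names `frame_budget_ms` and `keeps_up` are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce each frame at `fps` frames per second."""
    return 1000.0 / fps


def keeps_up(per_frame_ms: float, fps: float) -> bool:
    """True if a generator needing `per_frame_ms` per frame can sustain `fps` live."""
    return per_frame_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    # 40 FPS is the online-generation rate quoted by the researchers.
    print(f"40 FPS -> {frame_budget_ms(40):.1f} ms per frame")  # 25.0 ms
    # A 512x512 frame contains 512 * 512 = 262144 pixels.
    print(f"512x512 frame -> {512 * 512} pixels")
```

In other words, sustaining 40 FPS means every 512×512 frame must be ready within 25 ms, which is why "negligible starting latency" matters for live conversation.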

What is VASA-1?

The Microsoft researchers claim that their new method not only produces lip movements synchronised with the audio but also creates a large spectrum of expressive facial nuances and natural head movements. “It can handle arbitrary-length audio and stably output seamless talking face videos,” they wrote.
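
The article does not explain how VASA-1 copes with audio of arbitrary length. One common approach in sequence generation, shown here purely as an assumption and not as Microsoft's method, is to generate a long clip in fixed-size chunks, with each chunk also seeing a few already-generated frames before it so that motion stays continuous across boundaries. The hypothetical `sliding_windows` helper below only plans those chunks.

```python
from typing import List, Tuple


def sliding_windows(n_frames: int, window: int, context: int) -> List[Tuple[int, int, int]]:
    """Plan generation of an arbitrarily long clip in fixed-size chunks.

    Each chunk produces `window` new frames (fewer for the last chunk) and is
    conditioned on up to `context` previously generated frames just before it.
    Returns (context_start, new_start, new_end) index triples, end exclusive.
    """
    if window <= 0 or context < 0:
        raise ValueError("window must be positive and context non-negative")
    plan = []
    for new_start in range(0, n_frames, window):
        new_end = min(new_start + window, n_frames)
        context_start = max(0, new_start - context)
        plan.append((context_start, new_start, new_end))
    return plan


if __name__ == "__main__":
    # A 10-frame clip generated 4 frames at a time, each chunk seeing 2 prior frames.
    for chunk in sliding_windows(10, window=4, context=2):
        print(chunk)  # prints (0, 0, 4), then (2, 4, 8), then (6, 8, 10)
```

Because each chunk has a fixed maximum size, memory use stays bounded no matter how long the audio track is.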

The researchers behind VASA-1 set themselves an ambitious task: bringing static images to life, making them talk, sing, and express emotions in sync with any audio track. VASA-1 is the outcome of their efforts; the AI system transforms motionless visuals, whether photographs, drawings, or paintings, into synchronised animations. On control, the researchers said their diffusion model can accept optional signals as conditions, such as main eye gaze direction, head distance, and emotion offsets.
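
To illustrate what "optional" conditioning signals can look like in practice, here is a minimal Python sketch of a container for the three controls the article lists. The class name `ControlSignals`, its field names, units, and the value-plus-mask encoding are all hypothetical assumptions; VASA-1 has no public API, and the emotion offset is simplified to a single number.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ControlSignals:
    """Hypothetical bundle of the optional conditions mentioned in the article.

    Field names, units and encodings are illustrative assumptions only.
    """
    gaze_direction: Optional[Tuple[float, float]] = None  # assumed (yaw, pitch) angles
    head_distance: Optional[float] = None                 # assumed distance to camera
    emotion_offset: Optional[float] = None                # simplified to one scalar

    def to_vector(self) -> Tuple[List[float], List[int]]:
        """Flatten into a fixed-length (values, mask) pair.

        Missing signals become 0.0 with mask 0, so a downstream model could
        tell "not provided" apart from a genuine value of zero.
        """
        values: List[float] = []
        mask: List[int] = []
        if self.gaze_direction is None:
            values += [0.0, 0.0]
            mask += [0, 0]
        else:
            values += [float(self.gaze_direction[0]), float(self.gaze_direction[1])]
            mask += [1, 1]
        for signal in (self.head_distance, self.emotion_offset):
            if signal is None:
                values.append(0.0)
                mask.append(0)
            else:
                values.append(float(signal))
                mask.append(1)
        return values, mask
```

For example, `ControlSignals(head_distance=1.5).to_vector()` returns `([0.0, 0.0, 1.5, 0.0], [0, 0, 1, 0])`. Masking instead of dropping absent signals keeps the input a fixed length; diffusion models often handle optional conditions with masked or "null" inputs in a similar spirit, though the article does not say how VASA-1 does it.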

Alongside the research paper, the team showcased the capabilities of VASA-1 through a host of video clips. In one, a cartoon version of the Mona Lisa springs to life and breaks into a rap song, her expressions and lip movements aligning with the lyrics. Another shows a photograph of a woman transformed into a singing performance. In a third, a drawn portrait of a man delivers a speech, and his expressions shift naturally to emphasise the spoken words.

How was VASA-1 created?

According to the research paper, VASA-1's breakthrough came from an extensive training process. Reportedly, the system was exposed to thousands of images portraying a wide range of facial expressions, and this vast dataset allowed it to learn and accurately recreate the nuances of human emotions along with speech patterns. The current iteration of VASA-1 generates 512×512 video at 45 fps in offline batch mode (the 40 FPS figure quoted earlier applies to online streaming), which makes the motion appear smooth. Reportedly, rendering these realistic animations takes an average of two minutes using the computational power of a desktop-grade Nvidia RTX 4090 GPU.

The research paper does not mention a release date, but it says VASA-1 brings the team closer to a future where AI avatars can engage in natural interactions, suggesting it is a research prototype for now. Although VASA-1's potential use cases are wide-ranging, the researchers have acknowledged its potential for misuse and, as a proactive measure, have reportedly decided to withhold public access to it. They have also acknowledged the need for responsible stewardship of such advanced technology to mitigate unintended consequences or exploitation.

Although these animations seamlessly combine visuals and audio and have a lifelike charm, the researchers have said that on closer examination one can notice subtle flaws and telltale signs typical of AI-generated content. Nevertheless, the examples shared showcase the technical excellence of the team behind VASA-1.

FAQs

Microsoft’s AI app VASA-1 makes faces in pictures talk and sing: How does it work?

“The VASA framework (short for "Visual Affective Skills Animator") uses machine learning to analyze a static image along with a speech audio clip. It is then able to generate a realistic video with precise facial expressions, head movements, and lip-syncing to the audio.”

How does Microsoft's VASA-1 bring still photos to life?

VASA-1 is a technological tool that brings static images to life. It achieves this by using AI to animate facial expressions and head movements on a picture. In the process, the software makes it appear as if the person in the image is speaking.

Is VASA-1 available to the public?

The researchers have recognized that VASA-1 can be misused despite its numerous applications, so they made the proactive decision not to give the public access to it. They acknowledged that such cutting-edge technology must be carefully stewarded to prevent unforeseen effects or misuse.

What is VASA-1 AI?

In the words of the research paper: “We introduce VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip.”

What is Microsoft's new image-to-video AI model VASA-1?

VASA-1, created by Microsoft, is an innovative AI tool. It can transform a single photo into a short video featuring a talking face. The AI analyzes the image and a provided audio clip to generate realistic lip movements and even subtle expressions that match the speaker's tone.

What does Microsoft's VASA stand for?

VASA-1, which stands for "Visual Affective Skills Animator," is an artificial intelligence (AI) tool that can turn a single picture into a short video. In this video, the face in the picture appears to talk in sync with a given audio clip.

Article information

Author: Aron Pacocha
