Watch and Cry (or Smile): Synthesia’s AI Video Avatars Now Show Emotions | TechCrunch

Generative AI has captured the public imagination by creating complex, plausibly real text and images from verbal prompts. But the catch – and there often is one – is that upon closer inspection the results are frequently anything but perfect.

People have strange-looking fingers, floor tiles slip out of alignment, and math problems are just that: sometimes they simply don't add up.

Now Synthesia – one of the more ambitious AI startups in the video space, building custom avatars that business users can use to create promotional, training and other corporate video content – is releasing an update that it hopes will help it tackle some of those challenges in its particular field. The latest version features avatars based on real people recorded in its studio, offering more emotion, better lip tracking and what the company says are more expressive, natural, human-like movements when fed text to create videos.

The release follows some impressive progress the company has made to date. Unlike generative AI providers like OpenAI, which has pursued a two-pronged strategy – building wide public awareness with consumer tools like ChatGPT while developing a B2B offering whose APIs are used by independent developers and giant corporations alike – Synthesia is leaning into the approach that some other prominent AI startups have taken: specialization.

Similar to how Perplexity focuses on really nailing generative AI search, Synthesia is focused on creating the most human-like generative video avatars possible. More specifically, it is doing so solely for the business market and for use cases such as training and marketing.

This focus has helped Synthesia stand out in what has become a very crowded AI market, one at risk of commoditization as hype gives way to longer-term concerns such as ARR, unit economics and the operational costs of AI deployments.

Synthesia describes its new Expressive Avatars, released today, as a first of their kind: "the world's first avatars generated entirely with AI." Built on large, pre-trained models, Synthesia says its breakthrough lies in how those models are combined to produce multimodal outputs that more closely mimic the way actual people speak.

These are generated on the fly, Synthesia says, which is meant to bring them closer to the experience of how we speak and react in real life. That contrasts with how many avatar-based AI video tools work today: typically they are, in effect, many short video clips quickly stitched together to produce facial reactions that more or less match the scripts fed to them. The aim here is to appear less robotic and more lifelike.

Previous version:

New version:

As you can see from the two examples here – one from the older version of Synthesia's product, the other from the version released today – there is still plenty of work to be done, as CEO Victor Riparbelli himself admits.

“Of course it’s not 100% there yet, but it will be very, very soon, by the end of the year. It’s going to be so mind-blowing,” he told TechCrunch. “I think you can also see that the AI part here is very subtle. With humans, so much information is contained in the smallest details, the smallest movements of our facial muscles. I don’t think we could ever sit down and describe, ‘Yes, that’s how you smile when you’re happy, and that’s how you don’t,’ right? It’s so complex that humans could never fully describe it, but it can be [captured by] deep learning networks. They’re actually able to figure out the pattern and then reproduce it in a predictable way.” The next thing to work on is the hands, he added.

“Hands are super hard,” he added.

The focus on B2B also helps Synthesia align its messaging and products more closely with “safe” AI use. That is particularly important given the high level of concern around deepfakes and the use of AI for malicious purposes such as misinformation and fraud. Still, Synthesia hasn’t managed to avoid controversy on this front entirely. As we have previously reported, Synthesia’s technology has been misused in the past to produce propaganda in Venezuela and hoaxes spread by pro-China social media accounts.

The company announced today that it has taken further steps to stop such misuse. Last month, it said, it updated its policies to limit the kind of content people can create, investing in earlier detection of malicious actors, expanding the teams working on AI safety, and experimenting with content provenance technology such as C2PA.

Despite these challenges, the company has continued to grow.

Synthesia was last valued at $1 billion, when the company raised $90 million. Notably, that fundraise happened almost a year ago, in June 2023.

Riparbelli (pictured above right, with fellow co-founders Steffen Tjerrild, Professor Lourdes Agapito and Professor Matthias Niessner) said in an interview earlier this month that there are currently no plans to raise more money, although that doesn’t really answer the question of whether investors are approaching Synthesia proactively. (Note: we are very excited that the real, human Riparbelli will be speaking at one of our events in London in May, where I will definitely ask about this again. Please stop by if you’re in town.)

What we know for sure is that AI costs a lot of money to develop and operate, and Synthesia has been doing a lot of both.

Prior to today’s release, approximately 200,000 people had created more than 18 million video presentations in some 130 languages using Synthesia’s 225 legacy avatars, the company said. (It doesn’t say exactly how many users are on paid tiers, but there are plenty of big-name paying customers, including Zoom, the BBC, DuPont and more.) The startup’s hope, of course, is that with today’s release those numbers will climb even higher.
