Even without the Governing Mayor of Berlin, Franziska Giffey, falling victim to a fake Vitali Klitschko in a deepfake video call in June, the number of deceptions using the technology is rising rapidly. We may still find a fake Tom Cruise tripping over on TikTok amusing, but when a computer-generated voice cracks a bank account’s security controls, the potential threat becomes palpable. The World Economic Forum recently named deepfakes as one of the biggest cybercrime threats. So: do “synthetic” media also have positive potential? ERGO CDO Mark Klein offers an outlook.
When James Cameron, director of a string of blockbusters, brought the first “Avatar” film to cinemas in 2009, the world was fascinated by what was called “digital cinema”. All the characters on the planet Pandora spoke and moved just as you would expect them to. The movements were fluid, the flights over ravines seemed real – even though everything came from a computer. A sensation: Avatar became the most successful film of all time – and one of the most expensive!
Today, avatars can be created at a bargain price. For the artificial alter ego that I use in our virtual ERGO meeting room, I only had to submit one photo; my avatar was created within ten minutes. My copy doesn’t have arms and legs, though. I can turn and move as if on rails, but it all looks rather awkward. There’s no way anyone could mistake my avatar for me.
With my digital twin created with software from Synthesia, however, the situation is not so clear-cut. You see me talking to you in the video in a mid shot (see screenshot). The lips move in sync, sometimes my eyebrows go up and down, my facial expression works. But it’s artificial – the video is generated by computer code. I (or someone else) type in a text, which my avatar then speaks with a synthetic voice.
In an experiment, we presented various ERGO avatars. People who know me well and experienced my avatar were not taken in: the facial expression was different, it lacked vitality, the furrow on my forehead didn’t work, the eye colour was wrong. But it was different for those who know me but hadn’t been in direct contact with me for some time. They fell for the fake, even though they sensed something disconcerting.
The Synthesia avatar is still relatively expensive. But the price will quickly fall, just as the quality of synthetic videos will rapidly increase. Even imitations that make us appear real from head to toe will soon be possible at an affordable price. A horror scenario?
With deepfakes, we think primarily of the damage they can cause – above all, gaining access to sensitive data through social-engineering attacks. Fraud cases are already being reported in which thieves use fake voices, complete with tonality and accent, to impersonate people in phone calls. A bank account at a United Arab Emirates bank that was hacked via voice recognition is just one example that appeared in the media. In future there will also be perfidious cases of bogus kidnappings in which the supposed kidnap victim calls home.
Or think ahead to the next generation of grandchild scams. With voice alone (no video), computers can now impersonate people so well that you can’t tell the difference. We tried this out at ERGO with an employee’s voice: even colleagues who work with her every day were unable to tell the real voice from the fake one. Law enforcement authorities will have to prepare for a new level of fraud.
Another type of damage, but one that is no less malicious, is fake news on social media platforms. This involves nothing less than influencing public opinion. With written text, we now no longer know what originates from trolls and disinformation campaigns that are spread in certain echo chambers. Disseminated as video, these fakes will reach a new level.
All sorts of videos are now circulating with supposed interviews of politicians that never took place. The fake video in which Ukrainian President Volodymyr Zelensky asks his troops to lay down their weapons appeared immediately after the start of the Russian attack. (Incidentally, the Klitschko impersonators apparently also worked at RuTube, the Russian counterpart of YouTube, which was part of the Gazprom Group.)
What we’re now seeing is an arms race – the good guys versus the bad guys. That was how a Munich Re manager recently described it when presenting the new Tech Trend Radar, an annual analysis of future technologies for the insurance market. The outcome is open.
Today, the chances of identifying a fake photo still stand at a miserable 48.2%. The scalability of semantic deepfake detection techniques for data-rich systems such as social media is still in its infancy. But researchers are pinning their hopes on so-called inconsistency detectors to mitigate the risk of misuse of synthetic media. Counter-arming in the area of deepfake defence has really gathered pace!
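One family of such inconsistency detectors looks for physiological signals that generators reproduce poorly – early deepfakes, for instance, were notorious for blinking far too rarely. Purely as an illustration (the function names, thresholds and input format below are my own assumptions, not any real detector’s API), a toy version of this idea can be sketched in a few lines of Python: given per-frame eye-openness scores, as a face-landmark tracker might produce, it flags clips whose subject blinks implausibly seldom.

```python
import statistics

def blink_intervals(eye_openness, threshold=0.2):
    """Return the gaps (in frames) between detected blinks.

    eye_openness: per-frame eye-openness scores in [0, 1]
    (hypothetical input from a landmark tracker). A blink is
    a frame where the score drops below `threshold` after
    having been above it.
    """
    intervals, last_blink, above = [], None, True
    for frame, score in enumerate(eye_openness):
        if above and score < threshold:  # falling edge = blink onset
            if last_blink is not None:
                intervals.append(frame - last_blink)
            last_blink = frame
            above = False
        elif score >= threshold:
            above = True
    return intervals

def looks_synthetic(eye_openness, fps=25, max_interval_s=10.0):
    """Toy heuristic: humans blink roughly every 2-10 seconds,
    so a clip with no blinks, or very long gaps between them,
    is suspicious."""
    intervals = blink_intervals(eye_openness)
    if not intervals:  # no blinks at all in the clip
        return True
    return statistics.mean(intervals) / fps > max_interval_s
```

A real detector would of course learn such cues from data rather than hard-code them, and today’s generators have largely fixed the blinking flaw – which is exactly why the arms race described above never stands still.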
But even without technologies, we can already do a lot for our own defence and protection. In several US states, distributing deepfakes of celebrities is prohibited until 40 years after their death, and violations are punishable. Not only Tom Cruise but many other celebrities are suffering attacks of this kind. In some places, deepfakes of politicians around election campaigns have also already been criminalised.
All the rest of us who are neither celebrities nor politicians must develop a watchful eye and a sensitivity to everything that could be fake. We must train ourselves and be trained. Today’s software is still quite brittle, especially when it comes to facial expressions. With a bit of training, you can learn to spot from unnatural frowning or blinking that something is not right. But it is also true that fakes are getting better and better.
With so much damage potential, can anything positive at all be gained from synthetic media? I believe so! Like any other technology, they are per se neither good nor bad. It all comes down to how they are used.
For example, inexpensive synthetic media can replace expensive video productions. The video medium is becoming ever more important for disseminating information, and compared with conventional productions the costs of avatar speakers are manageable. For our users, watching is possibly easier than having to read a text themselves. What’s more, text-to-video conversion is child’s play – anyone can do it. And in future, why shouldn’t living-room directors also create high-quality audiovisual content that is currently the preserve of the big film studios?
I could also have my avatar speak in different languages – at no great effort or expense. There are already test videos of my avatar speaking several languages. My avatar could thus become a digital twin for me, giving talks on my behalf using content I have previously released.
An avatar as one’s own digital twin seems strange – at least to the generation I belong to. For much younger people, from Gen Z to Gen Alpha, having one’s own avatar is already much closer to everyday life. Anyone watching the dedication with which children create avatars in computer games and give them identities will get some idea of how normal a hybrid life with real and fake personalities could one day become.
But we don’t only have to look at future generations. Who would have thought that elderly people would return full of enthusiasm from a concert in London at which not people but holograms with a seventies look stood on stage? Concert-goers said the ABBA avatar concert looked real and was an incredible experience. To create it, the real musicians had to pre-produce the entire concert, recorded by 160 cameras!
We should get used to avatars, to synthetic “people” and media. It seems very likely that, with ever-better technology, they will find their way into every part of our daily lives.
As for my avatar assistant, I was recently shaken by an article featuring the American technology analyst Rob Enderle. The key question for him: who does my avatar belong to? If it belonged to my employer, for example, could the company send it to deliver a keynote address without me having any say?
We not only have to deal with deepfake defence and a better sense of what is fake, but must also grapple with ethical standards. Transparency is particularly important here. For example, the ERGO voice bots that take customer calls immediately say that they are artificial. Other players are thinking about a kind of safety standards authority or seal of quality that can be used to confirm the authenticity of images or videos.
We have to prepare ourselves and engage with the subject. But first of all, I’m looking forward to this winter, when James Cameron’s “Avatar 2” comes to our cinemas!
Text: Mark Klein, CDO ERGO Group