- Voice impersonation and fake video: fooling humans as well as biometric authentication systems
- Fake news is as old as genuine news – but this time it’s different
- Video faking is destroying trust at every possible level
- In the future, there’ll be video ‘evidence’ of whatever you want (others) to believe
- How can we make sure footage is real (or not)?
We’re becoming increasingly accustomed to seeing and even sharing manipulated content. Think photoshopped supermodels or virtual and augmented reality filters in social media apps like Snapchat. The doctored content we consume daily is increasingly realistic-looking, to the extent that we often struggle to distinguish the fake from the real. And as if this isn’t problematic enough – for various, obvious reasons – new types of audio and video manipulation tools are emerging, driven by high-end computer graphics and advanced artificial intelligence. These enable the creation – literally, from scratch – of hyper-realistic footage of (prominent) figures saying, well, just about anything. And figuring out whether what you hear and see is real is becoming a serious challenge.
Voice impersonation and fake video: fooling humans as well as biometric authentication systems
A team of researchers at the University of Alabama at Birmingham has been working on voice impersonation. With less than five minutes of audio of someone’s voice – whether it’s a live sample or a recording taken from radio or YouTube – a synthesised likeness can be created that sounds authentic enough to fool biometric authentication systems as well as humans. Such synthesised voices could be used, for example, to turn textbooks into audio versions narrated by famous people. Although this sounds like a relatively harmless application of the technology, imagine combining it with face-morphing tech, which would open a Pandora’s box of not-so-kosher possibilities, like well-known people making false statements. A good example is the Synthesizing Obama project at the University of Washington, in which the audio from one of Obama’s speeches was lip-synced onto a completely different video of the former US president. This was accomplished by using hours and hours of footage to train a recurrent neural network.
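For the technically curious, the core of that lip-syncing step boils down to a sequence model that maps audio features to mouth shapes. The snippet below is a minimal sketch of that idea, not the project’s actual code – the feature sizes, landmark count and layer choices are illustrative assumptions.

```python
# A minimal sketch of the idea behind the 'Synthesizing Obama' study: a
# recurrent network trained on many hours of footage learns to map audio
# features to mouth-shape parameters. All dimensions here are illustrative
# assumptions, not the project's actual values.
import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    """Maps a sequence of per-frame audio features to mouth landmarks."""
    def __init__(self, audio_dim=28, hidden_dim=128, landmark_dim=36):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, landmark_dim)  # 18 (x, y) mouth points

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, audio_dim), e.g. MFCCs per video frame
        hidden, _ = self.rnn(audio_feats)
        return self.out(hidden)  # (batch, frames, landmark_dim)

model = AudioToMouth()
speech = torch.rand(1, 250, 28)  # ~10 s of audio features at 25 fps
mouth_tracks = model(speech)     # predicted mouth shape per frame
print(mouth_tracks.shape)        # torch.Size([1, 250, 36])
```

The hard part that the sketch leaves out is what the researchers did next: texturing those predicted mouth shapes and compositing them seamlessly into the frames of the target video.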
At Stanford University, researchers led by Dr Matthias Niessner have developed software called Face2Face, which can be used to manipulate video so that a second person can put words in the speaker’s mouth – in real time. It’s essentially a more sophisticated version of Snapchat’s face-swap. Using a webcam, the research team captures the second person’s facial expressions, after which the footage is morphed – in real time – onto the face of the person in the original video. The technology was demonstrated with videos of Vladimir Putin and Donald Trump, and a live demonstration of the result was broadcast during a late-night talk show.
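Conceptually, the loop is simple, even though the real engine does dense, GPU-accelerated fitting of a parametric face model. Here’s a deliberately loose sketch of that loop, in which every function and dimension is a hypothetical placeholder for Face2Face’s actual machinery:

```python
# A loose sketch of a Face2Face-style reenactment loop. Every function and
# dimension below is a hypothetical placeholder, kept only to show the
# shape of the pipeline.
import numpy as np

N_EXPR = 76  # assumed number of expression (blendshape) coefficients

def fit_expression(frame: np.ndarray) -> np.ndarray:
    """Placeholder: estimate expression coefficients from one video frame."""
    return np.zeros(N_EXPR)

def render_face(identity: np.ndarray, expression: np.ndarray,
                background: np.ndarray) -> np.ndarray:
    """Placeholder: re-render the target's face with new expression
    parameters and composite it back into the original frame."""
    return background

def reenact(webcam_frame, target_frame, target_identity):
    # 1. Track the actor's facial expression from the webcam feed.
    actor_expression = fit_expression(webcam_frame)
    # 2. Keep the target's identity, pose and lighting, but swap in the
    #    actor's expression coefficients - the 'words in their mouth' step.
    # 3. Re-render and composite, producing the manipulated output frame.
    return render_face(target_identity, actor_expression, target_frame)
```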
Fake news is as old as genuine news – but this time it’s different
Pablo Boczkowski, a professor in the School of Communication at Northwestern University, tells the media company Seeker that “Fake news is as old as true news. There’s always been misinformation. What we have now is an information infrastructure that is very different, with a scale and a scope that we haven’t seen before”. Video and sound recordings have long seemed incorruptible, yet they’ve been used to mislead for decades – film propaganda dates back to World War I. In those days, however, the process was expensive and time-consuming, and it wasn’t difficult to distinguish between real actors and animations.
But with current technology, fake or dead people can appear in films alongside actors who are very much alive. And they can be made to say and do virtually anything. Until very recently, this type of ‘magic’ required sophisticated and very costly computing. But that’s all changed. Some of these audio and video manipulation programs, created with machine learning algorithms, can even be downloaded for free – if you know what to look for. Then, using a simple home computer and open source code, anyone with a working knowledge of deep learning algorithms and a bit of imagination can create a very convincing video in which one person’s face is morphed onto another person’s body.
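To make that concrete: the open source face-swap tools that circulated online were typically built around an autoencoder with one shared encoder and a separate decoder per identity. The sketch below captures that architecture; the layer sizes and image resolution are illustrative assumptions, not any particular tool’s code.

```python
# A minimal sketch of the shared-encoder/dual-decoder autoencoder behind
# early open-source face-swap tools. Architecture details are illustrative.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses an aligned 64x64 face crop into a shared latent code."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs a face from the latent code; one decoder per identity."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

# One shared encoder, one decoder per person: decoder_a trains on faces of
# person A, decoder_b on faces of person B, each reconstructing its own
# identity. The shared encoder is thereby forced to learn identity-agnostic
# features such as pose, expression and lighting.
encoder, decoder_a, decoder_b = Encoder(), Decoder(), Decoder()

# The swap: encode a frame of person A, decode with B's decoder, so B's
# face appears with A's pose and expression.
frame_a = torch.rand(1, 3, 64, 64)  # placeholder for an aligned face crop
swapped = decoder_b(encoder(frame_a))
```

It’s precisely that shared encoder that makes the swap possible: because it must serve both decoders, it learns what the two faces have in common rather than who they are.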
Video faking is destroying trust at every possible level
The proliferation of these easy and cheap techniques for creating fake video not only has the potential to damage people’s reputations and integrity, it also destroys trust at every level. Face-swapping has already been used in porn videos, transposing the faces of public figures such as pop stars, actors, and other celebrities. The practice can even be used to inflame community or racial tensions or to commit crimes like extortion. And even though this type of morphing technology is still far from perfect, given time, it will be able to recreate someone’s voice or appearance so convincingly that it will be virtually impossible to tell the difference between authentic and fabricated footage. Shared on social media, footage like this could go viral in an instant and have disastrous consequences. People demonstrably share clearly fake articles and photos far and wide – believing they’re true – without thinking twice. The ramifications of this new type of doctored video footage could be far worse.
In the future, there’ll be video ‘evidence’ of whatever you want (others) to believe
Already there’s an abundance of news stories, blogs and articles supporting what we (want others to) believe – whatever our prejudices. In the future, there’ll be video ‘evidence’ for each and every thing you want corroborated as well: Trump confessing to the many allegations against him, footage of a senator involved in a misdeed, the Pope getting it on with a pop star. Footage of a politician making a dirty deal or a businessman engaging in fraudulent practices will be increasingly difficult to authenticate. What happens when every recording of a corporate misstep, every video of a cheating spouse, even children’s baby pictures lose their credibility as evidence? And what about doctored video used to smear political opponents – an increasingly likely possibility as this technology becomes more widely available? It won’t really matter whether – or for how long – the footage holds up to scrutiny, because once it goes viral, it will leave an impression on millions of people that even a thorough debunking won’t completely erase.
Donald Trump has reportedly already claimed that the Access Hollywood “grab ’em by the p***y” comment wasn’t made by him. In two years’ time, he might have a plausible (albeit untrue) argument that it was fabricated.
These developments can only lead to a breakdown in trust in society and a steady increase in the general feelings of suspicion many of us already have. Are we inching closer to a world in which truth and lies coexist in such a way that we are unable to distinguish between the two? Will people eventually believe whatever they want, regardless of the actual truth?
How can we make sure the footage is real (or not)?
Because of the extensive distribution of hyper-realistic fake news content on social media and the resulting erosion of trust, scrutinising content that looks and sounds authentic will become increasingly important. And this will take more than advertising bans by social media giants or automated fact-checking tools that can identify fake news before it gets a chance to spread. Consumers of social news – which is practically all of us – will have to do their bit as well.
According to Mandy Jenkins of Storyful, a social news agency specialising in the verification of news content, people will need to check whether the audio is synced properly, take a close look at shadows and lighting and corroborate these with the time of day the video was supposedly shot, and ascertain whether the sizing of all the elements in the video is correct. Other things to look out for are the location where the footage was created, whether weather conditions match the weather records for that day, and which other people were present at the event.
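Some of these checks can even be partially automated. As a simple illustration – assuming the file still carries its metadata, which a forger can of course strip or alter – here’s how one might read a video’s embedded creation timestamp with FFmpeg’s ffprobe tool and compare it against the time of day the footage supposedly shows:

```python
# A sketch of one automatable check from the list above: reading a video's
# embedded creation timestamp. Uses the real ffprobe tool that ships with
# FFmpeg; metadata can be stripped or forged, so a matching timestamp is a
# clue, never proof.
import json
import subprocess
from typing import Optional

def creation_time(path: str) -> Optional[str]:
    """Return the container-level creation_time tag, if present."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    tags = json.loads(result.stdout).get("format", {}).get("tags", {})
    return tags.get("creation_time")  # e.g. '2017-10-03T14:22:07.000000Z'

if __name__ == "__main__":
    stamp = creation_time("suspect_clip.mp4")  # hypothetical file name
    print(stamp or "No creation_time tag found - absence proves nothing.")
```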
“What needs to change is the culture of interpretation,” says Boczkowski. “When a story about Pope Francis endorsing the candidacy of Donald Trump gets more than one million shares, that tells us that we have a culture of critique that’s not ready to distinguish this kind of misinformation”.
Hopefully, the explosion of fake news and audio/video manipulation technology and the potential damage it can cause will make us realise that we need to take a much closer look at content on social media before we hit the ‘share’ button.