Every child knows the fable of the shepherd boy and the wolf. The boy is supposed to protect the flock of sheep from hungry wolves. Nothing happens for days. The boy gets bored, and at some point he calls out "Wolf!". The adults rush over, but there is no wolf. It is false information, spread by the boy. The next day he calls out again: "Wolf!" Again, no wolf is anywhere to be seen. When a wolf really does appear and the boy calls out, he gets no help. No one believes him any more. So the wolf eats the flock of sheep, and in some versions of the story the boy as well. How could this have been prevented? The supposed answer: detection software.
If the adults could always check through a telescope whether a wolf is really threatening the flock, they would not depend on the boy's claims. Or, in deepfake terms: if someone with the right tools can analyse whether the Wigald Boning video is a deepfake, that should help us as a society in the fight for the truth. At first glance, this sounds like an ideal solution. But the parable has one big flaw. Malicious deepfakes are not wolves that always go straight for the sheep. They are more like genetically mutated wolves that perfect their hunt from week to week.
Accordingly, an all-encompassing detection solution for deepfakes does not exist. Experts currently name three different techniques for dealing with deepfakes: forensic analysis, digital signatures and digital watermarks.
Each of these three techniques targets a different point in the deepfake life cycle. Digital signatures take effect at the moment of production, for example while filming with a mobile phone app. Digital watermarks can be embedded during synthesis, the process in which artificial intelligence creates the deepfake. Forensic analysis becomes useful once a fake video already exists and needs to be checked for authenticity. So if a Wigald Boning fake video suddenly appears that carries neither a digital signature nor a watermark, forensic analysis comes into play.
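The idea behind signing at the moment of production can be sketched in a few lines. Real provenance systems use asymmetric public-key signatures; as a simplified stand-in using only Python's standard library, an HMAC tag over the raw footage illustrates the principle: any change to the bytes after capture invalidates the tag. The key name and data here are purely hypothetical.

```python
import hashlib
import hmac

def sign_capture(video_bytes: bytes, device_key: bytes) -> str:
    # At recording time, the camera app computes a tag over the raw bytes.
    return hmac.new(device_key, video_bytes, hashlib.sha256).hexdigest()

def verify_capture(video_bytes: bytes, tag: str, device_key: bytes) -> bool:
    # Later, anyone holding the key can check the footage was not altered.
    expected = hmac.new(device_key, video_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

key = b"hypothetical-device-key"
original = b"\x00\x01\x02 raw video frames"
tag = sign_capture(original, key)

print(verify_capture(original, tag, key))         # True: untouched footage
print(verify_capture(original + b"!", tag, key))  # False: any edit breaks the tag
```

The point is not the specific algorithm but the timing: the proof of authenticity is created at capture, before any manipulation can happen, so a later fake simply has no valid tag.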
The results of our test are sobering, and they reveal a major obstacle in forensic analysis. Many detection tools are so-called low-level solutions. This means: if certain pixels in an image behave conspicuously, the software raises a flag. Both platforms we tested are low-level applications. High-level applications, by contrast, look at the person as a whole instead of at individual pixels. How does the head move? Do the movements match what the person is saying?
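A deliberately simplified sketch of the low-level idea, assuming a grayscale image as a list of pixel rows: real detectors are trained neural networks, but the underlying principle is the same, namely flagging images whose pixel statistics look implausible. The threshold and the rule here (synthetic regions are often unnaturally smooth) are illustrative assumptions, not a production method.

```python
def residual_energy(img):
    """Mean squared difference between horizontally adjacent pixels."""
    total, count = 0, 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += (a - b) ** 2
            count += 1
    return total / count

def looks_synthetic(img, threshold=1.0):
    # Toy rule: flag images whose pixel-to-pixel variation is
    # implausibly low, i.e. smoother than real camera noise allows.
    return residual_energy(img) < threshold

# A "natural" patch with sensor-like noise vs. an unnaturally smooth patch.
noisy  = [[10, 13, 9, 14], [12, 8, 15, 11]]
smooth = [[10, 10, 10, 10], [10, 10, 10, 10]]

print(looks_synthetic(noisy))   # False
print(looks_synthetic(smooth))  # True
```

This also makes the weakness visible: a detector built on one statistical cue fails as soon as generators learn to reproduce that cue, which is exactly the cat-and-mouse dynamic described below.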
A good example of a high-level approach comes from the widely known Tom Cruise deepfake. Researchers led by Hany Farid at UC Berkeley focused on the ears. Ears have very individual shapes, and since deepfakes usually do not replace the whole head but only the face, they provide a good indication of a video's authenticity. And not only that: the way ears move while a person speaks is also difficult to imitate.
Both low-level and high-level solutions come with their own problems. For low-level approaches in particular, the training data are crucial: the software can only recognise as fake what it has learned to be fake. High-level methods, on the other hand, are often more elaborate than their counterparts. And for both, once deepfake producers know what gives them away, they can fix it. It is a cat-and-mouse game between detectors and creators.
So what happens next in the field? Dominik Kovacs, chief technology officer at Defudger, a company that specialises in detecting deepfakes, believes the future lies with watermarks and digital signatures. "Hopefully it will help us create a safer internet and a safer place for content of any kind," Kovacs explains. In the short term, he says, forensic capabilities are important and useful, but in the long term this cat-and-mouse game can only be lost.
Hao Li, CEO of Pinscreen, a company that uses deepfake technology to create virtual avatars, also sees difficulties in relying on low-level forensic analysis. Li is working with the US Defense Advanced Research Projects Agency (DARPA), a division of the US Department of Defense, on effective methods to detect deepfakes. "This will soon stop working," Li says of detecting pixel anomalies in deepfake videos. High-level approaches, on the other hand, remain effective, he says, and are a method for the future: gesticulation, movement, all of that is difficult to fake.
Finally, there is always the question of how far detection software works not only technically but also socially. What if it is irrelevant to viewers what is true and what is not? We talked about this in an episode of our podcast. Feel free to listen in.