The Terrifying Truth About The Evolution Of Deep Fakes

In 2016, during a press event, Adobe announced new software called VoCo, which promised to do for audio what Photoshop did for still images (via Vice). The software needs only about 20 minutes’ worth of sample audio to build a model of a person’s voice. Public figures have countless hours of recorded speeches, interviews, and on-screen performances, not to mention radio and podcasts, so anyone could stitch together enough sample audio. In fact, many of us have put more than 20 minutes’ worth of our own voices online, making all of us potential targets.

Then, using that small amount of training audio, a user can generate new audio, spoken in the target voice, just by typing the words they want it to say. In the demonstration, Adobe’s Zeyu Jin takes a piece of audio of Keegan-Michael Key speaking about an award. Jin then reorders or replaces certain words. Finally, Jin cuts out a section of the sentence and replaces it with entirely new words.

The result is an audio file of a public figure saying words they never said. Fortunately, Adobe thought ahead about watermarking the files or making them otherwise identifiable.

Cassidy Ward
