Online, we can be whoever we want—a bulked-up soldier, a freakish banana, Sonic the Hedgehog—only until we start speaking. Over any online game’s built-in chat or Discord, our voices have the power to immediately reveal information about our geographic region, our gender, even our ethnicity. Now, a company that makes “voice skins” is trying to change that, too.
“You go online and have the freedom to design your avatar, choose your username, pick what communities you jump into. You can design your online persona completely separately from who you are in the real world,” said Modulate founder Mike Pappas. “Modulate gives you the complete freedom to design your online persona from scratch instead of bringing the real world with you.”
Modulate was founded by two MIT graduates as a “What if?” exploration into the “spy movie” idea of using technology to change your voice. On its face, it’s not a new idea; Photoshop creator Adobe unveiled its own voice conversion program, “VoCo,” in 2016, for example. On top of generic audio applications, like podcasts or voiceovers, Modulate is now gearing its product toward gaming. Its voice skins, as the company calls them, transform users’ voices live by filtering them through its artificial intelligence software, which can be set to generate outputs resembling a certain gender, speaking style or celebrity (there’s a Barack Obama skin).
Pappas is hoping to integrate the technology into online games and, eventually, transform users’ voices into their favourite gaming characters’, too. Instead of sounding like a stilted text-to-speech robot, the audio outcome is more believably human, echoing a user’s excitement or sadness with faithful changes in tone or pacing. “If Overwatch was interested in using it and we were built into their voice chat provider, maybe they’d be able to design some Overwatch-specific voice skins that would become available as a microtransaction to a player,” Pappas said. Although Pappas explained that Modulate is working on partnering with gaming companies to offer their voice skins in-game, he said the company is also exploring a standalone app option.
Technology that transforms media and masks its original form always runs the risk of doing more harm than good. The more viral, explosive images are revealed to be Photoshopped, the more sceptical savvy internet dwellers might be about how real their content is. Voice skins will be no different should they take off. When Kotaku asked Pappas how he’ll handle potential misuses of the technology, like catfishing, he said Modulate’s audio has a “watermark.” “It’s not something humans hear,” he said, “but there’s a detection algorithm that can be used in real time to detect the watermark.”
Unfortunately, he added, only Modulate’s detection technology can currently determine whether someone’s voice has been altered with its voice skins. And as the company’s AI software gets better, and the skins become more lifelike, Pappas added, “there will be fewer clues that this is synthetic except our watermark.”
Pappas says Modulate should be finding its way into games in summer 2019. He and his co-founder have already tried it in Dota 2 and League of Legends, they said, and it went smoothly until his co-founder began laughing. “The voice skins had never heard laughter because they were trained on data sets of speech. It came out as a stuttery ‘Ha. Ha. Ha. Ha,’” Pappas recalled. “We’ve now added laughter to the dataset to support that kind of thing.”
Featured image: Gorodenkoff (Shutterstock)