Artificial intelligence has long been the domain of science fiction fans and film buffs, but the ever-improving technologies of our age seem to be bringing us closer to the point where machines solve problems better than humans, or even take over the world. According to Stanford's John McCarthy, AI is “the science and engineering of making intelligent machines, especially intelligent computer programs. The ultimate effort is to make computer programs that can solve problems and achieve goals in the world as well as humans”. The major goal here is to build a machine or program that can learn. According to Elon Musk, AI is humanity’s greatest existential threat. And if that’s not bad enough, Stephen Hawking says we “couldn’t compete and would be superseded by A.I.” But for now, at least, AI remains mostly bound to the realm of helpful smart home gadgets and personal assistants.
Why voice is so important
So given the terrifying and amazing potential of AI, what’s new in 2016? Interestingly, voice seems to be one of the new and important developments in this domain. OK, so AI has had a voice for a long time (whether it is Siri, Alexa or HAL from Stanley Kubrick’s epic 2001: A Space Odyssey), but voice has made some interesting leaps and bounds lately. Google’s DeepMind just produced some of the most realistic machine speech to date, and it is based on real human speech. DeepMind is Google’s AI branch, initially a small British company that was later acquired by the tech giant and, interestingly, was also responsible for the creation of Google’s ethics board. Read up on some interesting facts on DeepMind here.
The recipe for voice
So how do these incredibly intelligent AIs put together realistic-sounding human voices? Well, there are actually a few alternative ways of doing this. Google’s system is known as WaveNet and works on a base of real human voices; it is trained on recordings of real human speech and directly models the raw audio waveform, generating new speech one sample at a time. Other well-known technologies like Siri and Alexa literally piece together a patchwork of recorded speech fragments to create each sentence. Siri’s voice is actually built from recordings of a voice actress called Susan. There is also an alternative to using real human speech, which involves having a computer generate a voice from scratch based on sounds and grammar rules. Though it might sound promising, for now at least, computer-generated voices still sound pretty robotic and are not nearly as pleasant to listen to as Siri/Susan.
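To make the "patchwork" idea a little more concrete, here is a minimal sketch of stitching stored clips into a sentence. It is only a toy under stated assumptions: the names `VOICE_BANK`, `fake_recording` and `concatenate` are hypothetical, plain sine tones stand in for real recordings, and it is not how Siri, Alexa or WaveNet are actually implemented.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value for this toy example)


def fake_recording(freq_hz, duration_s=0.2):
    """Stand-in for a recorded word: a plain sine tone, not real speech."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical "voice bank": one pre-recorded clip per word.
VOICE_BANK = {
    "hello": fake_recording(220),
    "world": fake_recording(330),
}


def concatenate(sentence, gap_s=0.05):
    """Build a sentence by joining stored clips, with a short silence between words."""
    gap = [0.0] * int(SAMPLE_RATE * gap_s)
    audio = []
    for word in sentence.lower().split():
        if word not in VOICE_BANK:
            raise KeyError(f"no recording for {word!r}")
        if audio:  # only add a pause between words, not before the first one
            audio.extend(gap)
        audio.extend(VOICE_BANK[word])
    return audio


samples = concatenate("Hello world")
print(len(samples) / SAMPLE_RATE, "seconds of audio")
```

Even this toy hints at why the approach is laborious: any word missing from the bank simply cannot be spoken, which is why real systems need enormous libraries of recorded fragments.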
Finding the right voice
It’s pretty complicated to find a voice actor or actress, record every possible sound they can make and then somehow patch those sounds together into a relatively natural-sounding AI or personal assistant voice. And as if this weren’t enough to worry about, finding the right kind of voice presents yet another challenge. Voices are powerful and personal instruments that can have a huge influence on our perception of the world. Ultimately, any choice of voice has implications for design, branding and how we interact with machines. A voice can change or harden how we see each other. There seems to be a prevalence of Caucasian female voices when it comes to smart homes and personal assistants, but ultimately there is no one voice setting for all, and most savvy companies are realizing the importance of local voices and nuance (Siri, for example, has “Karen” from Australia, “Moira” from Ireland and more). As AI continues to grow and learn, voice grows with it, and new voice technologies are continuing to leapfrog us into the future.
Heyoya is a game-changing comment and reviews platform that brings voice to e-publishers and e-stores, improving sales and user engagement by allowing readers to express themselves beyond the medium of text. Heyoya’s new Receiver plan allows for quick and easy below-the-fold monetization. Heyoya is a game changer for websites and is proven to increase brand affinity and the quality of user-generated content.