“Keyboard, how quaint,” says Scotty in Star Trek IV as he resorts to typing the formula for transparent aluminum into a Mac after failing to talk to it. That’s how ingrained speech recognition was in that vision of the future: interfacing with computers wasn’t done through a GUI but by talking to them. We’ve been fascinated with talking to computers since Star Trek in the 1960s, and plenty of other futurist movies and TV shows have people talking to computers all the time with a mouse nowhere to be seen. That’s because no ordinary person would want to be subjected to the keyboard and the command line, especially back then; talking to computers seemed like the next best thing. But the mouse overtook us at some point, and everyone got obsessed with grids of icons and overlapping windows. Now icons and screens seem dated, they’ve taken a toll on our eyes, and it also happens that not everyone can actually use the GUI, or graphical user interface, at all. The real thing, the truly user-friendly way for everyone to interact with computers and other machines, is voice and language recognition. We have one foot in the door. Let’s see what else is behind it.
At this point, we seem to have reached an impasse with the GUI, to the point that we’re just shuffling things around and redesigning icons, menus, windows and borders. Windows went from flat to a 3D look, then to glossy, and then back to flat again. iOS followed suit. With Linux and Android, there’s a plethora of styles to choose from. When it comes to Windows, we may have seen it all, and when Microsoft said that 10 would be the last version, it may have had the same thought. What the GUI did do was improve our relationship with computers: it eased ordinary people, and even technophobes, into the realm of computing. But the GUI was never for everyone, as the blind can’t see it, nor can paraplegics manipulate it.
Again, the next logical step, one where everyone, including the blind and paraplegics, can interact with computers and devices, is voice. Voice recognition technology has been around for quite some time, so when Siri came out on the iPhone 4S it seemed like a novelty: consumer voice recognition until then was clunky at best, with a very limited vocabulary, while the best systems were very expensive and required training. But the Siri demos were quite convincing, and most of the time Siri could get things right as long as the sentence was simple enough. Ask Siri for the time, the weather, what’s next on the schedule or the nearest restaurant, and Siri has the answer. Ask Siri something more complex, and she’ll come back with a web search. Ask Siri something unconventional, and she’ll respond with a witty joke or sarcasm. Siri is as close as we’ve gotten to conversational interaction with a computer; thanks to her, we came one step closer to actually talking to our devices. As Siri took off, it started a race for the best intelligent personal assistant. Microsoft now has Cortana on its Windows Phone and Windows 10 operating systems, and Cortana is as intelligent and witty as Siri. Google has Google Now, Amazon’s Echo has Alexa (a sexier name than Siri or Cortana) and Facebook will soon have M.
But what would make the best intelligent personal assistant? One that can properly recognize its owner’s speech patterns and understand more complex sentences. Most people have some quirk in their speech that elicits the question, “Come again?” These quirks, like accents and speech impediments, along with factors like background noise, can mess up even the simplest queries or sentences, often because the computer misspells what it ‘heard’. Conditions like rhotacism can be a problem, so as long as “Siwi, pwease show me to the neawest westawant” doesn’t translate to launching a nuclear warhead, we’re okay. Handling speech quirks is one next step toward really letting users talk to computers by voice. Another is improved understanding of what was said, or context. And another is the ability to cascade, that is, to understand successive queries or sentences, which is closer to actual conversation. Hound, by SoundHound Inc., has been relatively successful in these areas. Hound can determine whether the user’s next statement is related to the previous query, so a complex question such as which car dealer in New York offers the newest Chevy Camaro at the lowest interest rate can be answered through successive mini-questions. Hound is currently available on Android as an alternative to Google Now. It wouldn’t be far-fetched for either Apple or Microsoft to purchase SoundHound and apply the technology to Siri or Cortana; the first to do so would be a significant step ahead of the other.
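The idea behind those cascaded queries can be sketched as a session that carries slots forward from one utterance to the next. This is a deliberately crude toy, keyword matching standing in for real language understanding, and not SoundHound’s actual design; all names here are made up for illustration.

```python
# Toy sketch of cascaded ("follow-up") queries: the assistant keeps the
# slots extracted from earlier utterances and merges each new utterance
# into that running context, so successive mini-questions add up to one
# complex query. Hypothetical keyword rules, not a real intent model.

def parse(utterance):
    """Extract crude slots from an utterance via substring matching."""
    u = utterance.lower()
    slots = {}
    if "new york" in u:
        slots["city"] = "New York"
    if "camaro" in u:
        slots["car"] = "Chevy Camaro"
    if "interest" in u:
        slots["sort"] = "lowest interest rate"
    return slots

class Session:
    """Carries context across successive queries, like a conversation."""
    def __init__(self):
        self.context = {}

    def ask(self, utterance):
        # Merge the new utterance's slots into the accumulated context.
        self.context.update(parse(utterance))
        return dict(self.context)

s = Session()
s.ask("Which car dealers are in New York?")
s.ask("Which of them have the newest Chevy Camaro?")
result = s.ask("And which offers the lowest interest rate?")
# result now combines the slots from all three queries.
```

The point of the sketch is only the shape of the mechanism: each follow-up is interpreted against what was already said, rather than in isolation.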
As we move toward the Internet of Things (currently the turf of Alexa on Amazon’s Echo), speech recognition technology becomes even more important, since attaching a keypad to a toaster or coffee maker would be rather awkward. With Echo, users can tell Alexa to turn the lights on or off and set the thermostat, among other things: like the Jetsons’ Rosie, only less mobile, with Siri already in charge of the TV. Cortana controlling the shower’s water temperature would probably complete the home of the wealthy, nerdy bachelor.
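Once speech has been transcribed, the smart-home side reduces to mapping a transcript onto a device action. Here is a minimal sketch of such a dispatcher; the command names and keyword rules are invented for illustration, and real assistants like Alexa use trained intent models rather than string matching.

```python
# Toy dispatcher mapping a recognized transcript ("turn the lights on",
# "set the thermostat to 72") to a home-automation action string.
# All action names are hypothetical.

def handle(transcript):
    t = transcript.lower()
    if "lights" in t:
        # "on" as a whole word decides the direction of the switch.
        return "lights_on" if "on" in t.split() else "lights_off"
    if "thermostat" in t:
        # Pull the first bare number out of the sentence as the setpoint.
        digits = [w for w in t.split() if w.isdigit()]
        return f"set_thermostat:{digits[0]}" if digits else "thermostat_unknown"
    return "unrecognized"

print(handle("Turn the lights on"))        # lights_on
print(handle("Set the thermostat to 72"))  # set_thermostat:72
```

The interesting design problem is everything this toy skips: resolving which lights, confirming destructive actions, and rejecting misheard commands before they reach a device.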
But seriously, with or without SoundHound, Apple is already working on improving Siri and Microsoft is doing the same with Cortana. Nest and Amazon’s Echo prove that the IoT is upon us. Google and Apple are also close to releasing their respective driverless cars, and those cars had better understand their drivers correctly, or, with drivers like Sylvester Stallone, they might end up in the town fountain. The next decade should be interesting.