Last week, Google CEO Sundar Pichai used the company’s annual I/O event to demo an experimental new feature of Google Assistant.
It consisted of two ordinary-sounding one-minute voice conversations, one to book a hair appointment, the other to make a restaurant reservation.
The unusual aspect of those conversations, which Google said were not staged, is that in both cases the caller was a computer. It was powered by Google’s Duplex AI technology, which can talk and respond to the human being on the other end using natural language.
The clever (or creepy) bit is that, had Pichai not told audience members about the AI, they would probably not have detected it.
Computer-generated voice systems are supposed to be stilted, synthesised, and limited in their responses, but this one sounded convincingly human in every way, right down to its reassuringly disfluent use of “mhmm” and “um”.
Duplex is robust enough that Google will start offering it this summer to a small number of Google Assistant users on Android, who will be able to use it to make simple reservations like the ones in the demo.
As I/O attendees applauded, and online watchers wondered aloud whether Duplex might be good enough to pass the famous Turing test, the doubters offered a less optimistic assessment of Google’s cleverness.
Might criminals use voice AI to deceive people? What are the implications of people delegating social interaction to machines? Will it put millions of service industry workers out of a job?
Then there are the nuanced ethical issues Google faces from day one, such as whether people have a right to know they are talking to a machine.
This struck many as a big tech firm doing something simply because it could, according to one New York Times writer, who described the demo as “horrifying”:
Silicon Valley is ethically lost, rudderless and has not learned a thing.
Stung, Google clarified:
We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified. What we showed at I/O was an early technology demo, and we look forward to incorporating feedback as we develop this into a product.
But if people talking to Duplex will be told they are talking to a machine, why make it sound so convincingly human?
Technically, Duplex combines several systems, including automatic speech recognition (ASR) and neural network ‘deep learning’, a field whose capabilities have surged in the last five years on the back of processing improvements and the high salaries offered to PhDs.
For now, the technology can only be used for a narrow set of tasks, but that set will inevitably expand quickly, which in turn will lead to calls for more rules and regulation.
Google can probably cope with this. What will be more difficult is changing how people see AI, especially where it is used to automate social interactions whose deeper meanings are slow to evolve.
As with the ‘Turk’, a famous 18th-century chess-playing automaton that turned out to have a human chess player hidden inside its cabinet, AI can make us feel as if we are being deceived.
Google’s Duplex is no clever trick, but on some unconscious level, the pattern has been set – people feel compelled to look under the table or behind the curtain to find something that reminds them of themselves.