Skype Translator is marketed by Microsoft in glowing terms as the machine learning (ML) “language translator that keeps getting smarter.”
Its strong reputation for accurately translating in near real time between languages including English, Spanish, French, German, Mandarin Chinese, Italian, Portuguese, Arabic, and Russian is, it might reasonably be assumed, down to all that machine learning going on in the background.
Except that a Motherboard story quoting an unnamed Skype insider has claimed that the reason it’s so good is really because it uses human beings to help the system’s translations along by listening to snippets of real calls.
The problem? While Microsoft makes clear that audio captured using this system might be analysed, it doesn’t make clear that this is being done by people as well as machines.
Machines are just software and, in their current state of development, are (we hope) unlikely to hold any personal opinions about what they listen to. Humans, meanwhile, introduce a very different set of possibilities.
Last week, Google and Apple had to suspend contractor access to voice commands captured by Siri and Google Assistant after an outcry at the privacy implications of allowing strangers to listen to recordings of personal audio.
Earlier this week, Amazon found itself “in discussion” with the EU’s Luxembourg privacy regulator over possible privacy implications from access to Alexa voice recordings.
Suddenly, big tech companies are struggling to explain what’s really going on without sounding as if they’re trying to explain away privacy concerns – which simply makes people even more suspicious.
Skype to HAL
While Skype Translator depends on ML for the bulk of its heavy lifting, the algorithms used still need a lot of adjustment to improve their accuracy. Microsoft says as much in its Translator privacy FAQ, describing how:
To help the technology learn and grow, we verify the automatic translations and feed any corrections back into the system, to build more performant services.
Of course, this fails to explain who or what is doing the correcting.
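For a sense of what that feedback loop involves, here is a deliberately simplified sketch, purely illustrative and not Microsoft's actual pipeline: a reviewer (human or machine) checks a translation, and any correction is stored so the system returns the better answer next time.

```python
# Illustrative sketch only (not Microsoft's real system): a human-in-the-loop
# correction cycle where reviewed translations are fed back to improve output.

def machine_translate(text, corrections):
    # Stand-in for the ML model: use a learned correction if one exists,
    # otherwise fall back to a naive placeholder output.
    return corrections.get(text, "<raw machine output for: %s>" % text)

def review_and_learn(text, machine_output, human_output, corrections):
    # A reviewer supplies the right translation; if it differs from the
    # machine's attempt, the fix is stored for future use.
    if machine_output != human_output:
        corrections[text] = human_output
    return corrections

corrections = {}
first = machine_translate("Hola", corrections)
corrections = review_and_learn("Hola", first, "Hello", corrections)
second = machine_translate("Hola", corrections)
print(second)  # the corrected translation now comes back: Hello
```

The point the sketch makes is the one the article turns on: somebody, or something, has to play the reviewer's role, and the FAQ doesn't say which.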
Motherboard says it was sent audio gathered by Translator featuring all sorts of personal content, including people discussing relationships and weight loss.
Other files appeared to suggest that, as with Google, Apple and Amazon, audio from Microsoft’s voice assistant Cortana is also being listened to by contractors.
Not a good few weeks for voice-driven AI then. Machines are being used to do lots of useful and clever things, but machine learning needs to be taught, and that requires teachers. That’s not Microsoft’s fault – but it does show that not everything can be solved by turning on lots of budding HALs and leaving them to it.
Many fret that humans will be replaced by machines. For voice AI, arguably, it’s human indispensability that might be the more immediate challenge.