Remember the Mannequin Challenge? It was a short-lived 2016 phenomenon where groups of people would stand still in elaborate poses while someone moved around and filmed them. It looked a lot like the ‘bullet time’ visual effect, famously used in the Matrix movies, where people would seem to stop in mid-air as bullets whizzed around them. Some of the submissions were elaborate and took a lot of effort. Just look at James Corden’s.
Like lots of other online content, the thousands of Mannequin Challenge videos that made their way onto YouTube have been repurposed.
A team of Google researchers has collected these videos to train an AI system that helps computers see 3D scenes the way people do.
In their paper, the scientists explain that our understanding of object persistence lets us keep track of how far apart objects are in 3D space, even when they move around and pass behind one another, and even when we have one eye shut (which switches off binocular depth perception). That’s harder for computers to do.
Computers use AI to learn this kind of thing, but they need lots of data to learn from. In this case, what they needed were videos of static objects with a camera that moves around them.
Thanks to the crazy place that is the internet, the researchers surfaced thousands of Mannequin Challenge videos, which were just what they needed to teach computers about the depth and ordering of objects. As they put it:
We found around 2,000 candidate videos for which this processing is possible. These videos comprise our new MannequinChallenge (MC) Dataset, which spans a wide range of scenes with people of different ages, naturally posing in different group configurations.
Because the people in the videos are static, the researchers can match their key features across multiple frames and use them to compare depth. The data wasn’t all clean, though: they had to do some cleanup for things like camera blur, and remove parts of videos with synthetic backgrounds (posters, say) or people who just had to scratch an ear as the camera moved past.
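To make the core idea concrete, here’s a minimal sketch of how matching keypoints between two frames of a static scene, then triangulating them from the estimated camera motion, yields relative depths. This is an illustration of the general technique only, not the Google team’s actual pipeline; the OpenCV calls, the `relative_depths` function name, the guessed focal length and the frame inputs are all assumptions for the example.

```python
# Illustrative sketch (not the paper's method): match keypoints across two
# frames of a static scene filmed by a moving camera, then triangulate them
# to compare relative depths. Requires opencv-python and numpy.
import cv2
import numpy as np

def relative_depths(frame1, frame2, focal_px=1000.0):
    """Return triangulated 3D points (in camera-1 coordinates) for matched keypoints."""
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Detect and describe keypoints in both frames
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)

    # Match descriptors; because the scene is frozen, good matches should be
    # the same physical point seen from two different camera positions
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Assumed pinhole intrinsics: focal_px is a guess, principal point at centre
    h, w = gray1.shape
    K = np.array([[focal_px, 0, w / 2],
                  [0, focal_px, h / 2],
                  [0, 0, 1]])

    # Estimate the camera motion between frames, then triangulate the matches
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T  # depths are only up to an unknown scale

    return pts3d  # compare pts3d[:, 2] to get the depth ordering of matched points
```

The recovered depths are only relative (a single moving camera can’t tell a small nearby scene from a large distant one), but the ordering of points is enough to say which person stands in front of which, which is the kind of signal the videos provide.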
The results were positive, although there were some limitations: the technique is good at judging the depth and ordering of humans, but less so with non-human subjects, such as cars.
Like all technology research, an AI that lets computers judge the distance between people using a single lens could have many applications. You could envisage its use in smartphone cameras, making them better at shooting people, or in monocular hunter-killer drones, making them better at, um, shooting people.
That raises a question: should people have a say in whether their image or other personal data is used in AI training? The participants in those YouTube videos couldn’t have known what an obscure Google research team would use their footage for, and now have no say in where that research goes or how it’s used. Surely this is the kind of thing GDPR is meant to guard against, with its demand that companies explain exactly what personal data will be used for?
This isn’t the first time people’s data has been co-opted for AI datasets. IBM compiled a dataset of one million faces, harvested from the Flickr photo-sharing site, to improve the diversity of its facial recognition system. In March, NBC discovered that those people had not given permission.
Vogon
“That raises a question: should people have a say in whether their image or other personal data is used”
Clearly not; people are for processing, not pondering. If you wish to think, please fill out the proper form.
epic_null
It’s not like Google has a specific piece of contact information for every individual who posts a video it would like to use for its research or anything. That information must be impossible to get. /s
Niall
And…? In an academic setting, this sort of research would not be allowed, as informed consent is vital.
Roy
Thank you for explaining how, as a monocular person, I perceive depth.
However, I think there may be some other factors at play. I have a condition known as nystagmus, where my single eye jumps around more the harder I try to focus on something, and these little motions seem to let me see around edges.
Long ago (20+ years), I attended a lecture by a wise visiting computer professor who compared terrain-hugging missiles with stationary robots on a production line: they use the same technology in reverse. I don’t think either of those was using two cameras. They were both using perspective across successive images to keep track.