Have you heard tales of innocent photos of babies in the bath disappearing from Facebook?
If so, it could well be because somebody brushed up against Facebook’s new machine learning, which proactively spots and removes child nudity even when it’s nonsexual. It’s part of the platform’s ongoing work to battle child abuse, which includes automatically removing obscene images and fending off child predators who try to groom children for abuse.
As Facebook’s Global Head of Safety, Antigone Davis, said in a blog post on Wednesday:
The platform’s Community Standards ban child exploitation and, to avoid even the potential for abuse, we take action on nonsexual content as well, like seemingly benign photos of children in the bath.
In the last quarter alone, this proactive approach has led to Facebook removing 8.7 million pieces of content due to violation of policies against child nudity or sexual exploitation. Nearly all of that content – 99% – was removed automatically, without being reported.
After a team of trained staffers with backgrounds in law enforcement, online safety, analytics and forensic investigations reviews the content, findings are reported to the National Center for Missing and Exploited Children (NCMEC), and if exploitative content is identified, Facebook also removes the accounts it came from.
Facebook says it’s also helping the NCMEC to develop new software to “help prioritize the reports it shares with law enforcement in order to address the most serious cases first.”
Like many platforms, including Twitter and Google, Facebook has for years been battling the use of its services by those who exploit children. That’s included hashing known child abuse imagery. After its technology catches violative content – you can see the primer on how image hashing works, below – Facebook reports violations to the NCMEC.
Facebook also requires children to be at least 13 to use its services, though its history of enforcing that rule is far less than stellar. When Facebook introduced Messenger Kids for children as young as six in December 2017, the company said it did so to address the growing number of children who lie about their age to use the adult version of the messaging app, and to give parents more control over the social media their children use. Facebook’s research had found that 81% of parents say children start using social media and messaging apps between the ages of 8 and 13.
But even that was met with disapproval from child advocates: in January, the Campaign for a Commercial-Free Childhood accused Messenger Kids of violating the Children’s Online Privacy Protection Act (COPPA) by not clearly obtaining parental consent or allowing parents to request that it delete children’s personal information… among a slew of other “smartphone usage is rotting kids’ brains” reasons for saying no to the app.
This is not to cast aspersions on Facebook’s continuing work on AI that can defeat child exploitation, mind you – just to point out that it’s to Facebook’s bottom-line benefit to pull kids in as users, and its efforts to make that OK for a younger age group haven’t been met with universal approval.
At any rate, Facebook says it’s also collaborating with other safety experts, NGOs and companies to “disrupt and prevent the sexual exploitation of children across online technologies,” including its work with the Tech Coalition, the Internet Watch Foundation, and the multi-stakeholder WePROTECT Global Alliance to End Child Exploitation Online.
Facebook says that next month, it’s also going to join Microsoft and other industry partners to begin “building tools for smaller companies to prevent the grooming of children online.”
You can find out more about that here, on Facebook’s site.
Bringing these technologies to smaller players is good news. As machine learning gets ever more sophisticated at fighting child abuse, it would be a moral outrage for it to be accessible only to those corporations with deep pockets.
A primer on image hashing
This is how it works: A hash is created by feeding a photo into a hashing function. What comes out the other end is a digital fingerprint that looks like a short jumble of letters and numbers. You can’t turn the hash back into the photo, but the same photo, or identical copies of it, will always create the same hash.
So, a hash of a picture turns out to be no more revealing than this:
48008908c31b9c8f8ba6bf2a4a283f29c15309b1
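To make that concrete, here’s a minimal Python sketch of the idea. SHA-1 is used purely for illustration (it happens to produce a 40-character digest like the one above), and the file name is made up; real systems may use other hash functions or hash a normalized form of the image data.

```python
import hashlib

def hash_photo(path: str) -> str:
    """Return a hex digest that serves as the photo's 'fingerprint'."""
    with open(path, "rb") as f:
        # Hashing the raw file bytes: identical copies always yield the
        # same digest, but the digest can't be turned back into the photo.
        return hashlib.sha1(f.read()).hexdigest()

print(hash_photo("holiday_snap.jpg"))  # hypothetical file name
```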
Since 2008, the NCMEC has made available a list of hash values for known child sexual abuse images, provided by ISPs, that enables companies to check large volumes of files for matches without those companies themselves having to keep copies of offending images.
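In spirit, that checking is just a set lookup: compute the fingerprint of each incoming file and see whether it appears on the shared list. Here’s a hedged sketch reusing the hash_photo helper from above; the file names are invented, and the known-hash set stands in for the real list.

```python
# The example digest shown earlier stands in for a real list entry.
KNOWN_HASHES = {
    "48008908c31b9c8f8ba6bf2a4a283f29c15309b1",
}

def scan(paths):
    """Yield every file whose fingerprint appears on the known-hash list."""
    for path in paths:
        if hash_photo(path) in KNOWN_HASHES:
            yield path

for match in scan(["upload_001.jpg", "upload_002.jpg"]):  # hypothetical uploads
    print("flagged:", match)
```

Because only fingerprints are shared, the companies doing the checking never have to hold copies of the offending images themselves.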
Hashing is efficient, though it only identifies exact matches. If an image is changed in any way at all, it will generate a different hash, which is why Microsoft donated its PhotoDNA technology to the effort. Facebook’s likely using its own sophisticated image recognition technology, but it’s instructive to look at how PhotoDNA identifies images that are similar rather than identical.
PhotoDNA creates a unique signature for an image by converting it to black and white, resizing it, and breaking it into a grid. In each grid cell, the technology finds a histogram of intensity gradients or edges from which it derives its so-called DNA. Images with similar DNA can then be matched.
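PhotoDNA itself is proprietary, but the family of ideas it belongs to can be sketched with a toy “gradient” signature: greyscale the image, shrink it to a small grid, and record whether brightness rises or falls between neighbouring cells. The following is a simplified difference-hash illustration, not PhotoDNA’s actual algorithm, and it assumes the Pillow imaging library is available.

```python
from PIL import Image  # Pillow, assumed available

def toy_signature(path: str) -> int:
    """A much-simplified stand-in for a perceptual image signature.

    Greyscale the image, shrink it to a 9x8 grid, and record whether
    intensity increases between horizontally adjacent cells, giving a
    64-bit fingerprint that survives resizing and minor edits.
    """
    img = Image.open(path).convert("L").resize((9, 8))
    pixels = list(img.getdata())
    bits = 0
    for row in range(8):
        for col in range(8):
            left = pixels[row * 9 + col]
            right = pixels[row * 9 + col + 1]
            bits = (bits << 1) | (1 if right > left else 0)
    return bits

def distance(a: int, b: int) -> int:
    """Hamming distance: how many signature bits differ between two images."""
    return bin(a ^ b).count("1")
```

Unlike the exact-match hashing above, two visually similar images yield signatures that differ in only a few bits, so a match can be declared whenever the distance falls below a chosen threshold.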
Given that the amount of data in the DNA is small, large data sets can be scanned quickly, enabling companies including Microsoft, Google, Verizon, Twitter, Facebook and Yahoo to find needles in haystacks and sniff out illegal child abuse imagery.