A “preferred Facebook Marketing Partner” has secretly tracked millions of Instagram users’ locations and stories, Business Insider reported on Wednesday.
Facebook has confirmed that San Francisco-based marketing firm HYP3R scraped huge quantities of data from Instagram in order to build detailed user profiles. Those profiles included users’ physical whereabouts, their bios, their interests, and the photos that were supposed to vanish after 24 hours.
It was all done in “clear violation of Instagram’s rules,” BI reports, and Facebook has subsequently kicked HYP3R to the curb. BI reports that Instagram issued HYP3R a cease and desist letter on Wednesday after the publication presented its findings, booted it off the platform, and tweaked its platform to protect user data.
Here’s the statement that Facebook is sending to media outlets:
HYP3R’s actions were not sanctioned and violate our policies. As a result, we’ve removed them from our platform. We’ve also made a product change that should help prevent other companies from scraping public location pages in this way.
Instagram’s failure to protect location data is a “mystery”
We don’t know exactly how much data HYP3R got at. But as BI notes, the company has publicly bragged about having “a unique dataset of hundreds of millions of the highest value consumers in the world that gives an edge to the leaders in travel and retail.”
According to the publication’s sources, HYP3R sucks in more than 1 million Instagram posts per month, and more than 90% of the data it brags about comes from the platform.
Data scraping is a pervasive problem online, as BI points out. We’ve seen multiple lawsuits, naming big players, brought over the practice. In 2017, for example, a lawsuit was brought against Uber over one of its units – Marketplace Analytics – that allegedly spied on competitors worldwide for years, scraping millions of their records using automated collection systems.
Researchers have done it multiple times to Venmo, to point out how much financial activity users publicly share. A 19-year-old from Nova Scotia got arrested for scraping freedom-of-information releases from a public website.
And Instagram? It’s a data-scraper’s darling.
Data from 49 million accounts was found lying around a few months ago, in May 2019. In September 2017, we saw Redditors trying to archive every single Instagram image, be it posted publicly or stored in supposedly locked accounts.
Why? Because they could. Which brings us to HYP3R and how 3asy it was for it to st3al all that data from Fac3book’s Instagram.
BI’s sources include HYP3R insiders who question how much due diligence Instagram and Facebook do on the partners who use their platforms, as well as how well the parent company and its somewhat independent subsidiary safeguard user data.
BI quoted one such source, a former HYP3R employee:
For [Instagram] to leave these endpoints open and let people get to this in a back channel sort of way, I thought was kind of hypocritical. Why they haven’t [protected user location data, for example] remains a mystery.
Granted, the company only hoovered up public data. But how many users expect their public data to be stitched together with their location data and tied up in a database to be sold off to a marketing company’s clients? These are the unauthorized ways that HYP3R got that data:
- An Instagram security lapse allowed it to zero in on specific user locations, like hotels and gyms, and vacuum up all the public posts made from the locations.
- It systematically saved users’ public Instagram stories made at those locations. That content, which includes photos shared in the stories, is supposed to disappear after 24 hours. BI calls this a clear violation of Instagram’s terms of service.
- It scraped public user profiles to collect information such as user bios and followers, which it then combined with the other location information and data from other sources.
Two tools to find them all, and in the darkness bind them
To get all that, HYP3R created two tools. One was created in the aftermath of Cambridge Analytica, when Instagram began to turn off some of its application programming interface’s (API’s) functionality, including letting developers search for public posts for a given location. HYP3R put a hearty face on the deprecation, at least publicly – behind the scenes, it worked to create a way to get at the location data it had been relying on, in spite of Instagram’s having turned off the location data spigot.
The result: a tool that could geofence specific locations and then harvest all public posts tagged with that location on Instagram. Which, in turn, allowed the company to build a database that, in HYP3R’s words, is stuffed with thousands of locations, including …
hotels, casinos, cruise ships, airports, fitness clubs, stadiums and shopping destinations across the globe …
… as well as hospitals, bars, and restaurants, BI reports.
The second tool is one that collects ordinary users’ Instagram stories – as in, the posts that are supposed to disappear after 24 hours. They’ve never been available through Instagram’s API, but hey, details, details – HYP3R built a tool to collect them, to save the images for all time, and to scoop up their metadata.
For what?
The purpose of collecting all this data is, of course, to target-market users. And as we’ve seen in other cases of tracking via location data, the targeting can be unnerving and invasive. It brings to mind the New York Times article from December 2018, in which the newspaper found that supposedly “trusted” apps such as GasBuddy and The Weather Channel were among at least 75 companies getting purportedly “anonymous” but pinpoint-precise location data from about 200 million smartphones across the US.
They were sharing or selling it to advertisers, retailers or even hedge funds seeking valuable insights into consumer behavior. One example: Tell All Digital, a Long Island advertising firm, buys location data, then uses it to run ad campaigns for personal injury lawyers that it markets to people who wind up in emergency rooms.
Similarly, BI asks us to imagine that an Instagram user goes on vacation, then visits a selection of locations and businesses. The Instagram story that the user posts references all those locations. Sure, it was intended to vanish after 24 hours, but instead, in the hands of a data harvester like HYP3R, it gets made into this kind of Big Data nightmare of a voyeuristic, overly intimate story – one that it keeps forever:
Imagine visiting a new city and sharing a geotagged story with friends of the hotel you visited. By itself, it doesn’t tell viewers much about you.
But combine it with the story you posted from the hospital you visited for a checkup, and the selfie you made the next day at a sports stadium, and the story from the vegetarian restaurant you ate at, and so on, and an intimate picture of your life and interests begins to emerge over weeks and months.
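To make that aggregation concrete, here is a minimal sketch in Python of how a handful of individually bland, geotagged posts turn into a per-user profile once they are grouped together. Everything in it is hypothetical: the account names, places and categories are made up for illustration, and it does not describe HYP3R's actual tooling or touch any real service.

```python
from collections import Counter, defaultdict

# Entirely made-up records standing in for geotagged public posts.
# Field names ("user", "place", "category") are assumptions for this sketch.
POSTS = [
    {"user": "traveler_a", "place": "Hotel Example", "category": "hotel"},
    {"user": "traveler_a", "place": "City Hospital", "category": "hospital"},
    {"user": "traveler_a", "place": "Example Stadium", "category": "stadium"},
    {"user": "traveler_a", "place": "Green Leaf Cafe", "category": "vegetarian restaurant"},
    {"user": "traveler_b", "place": "Hotel Example", "category": "hotel"},
]


def build_profiles(posts):
    """Group posts by user and count how often each place category appears."""
    profiles = defaultdict(Counter)
    for post in posts:
        profiles[post["user"]][post["category"]] += 1
    return profiles


profiles = build_profiles(POSTS)
for user, categories in sorted(profiles.items()):
    # One hotel post says little; four categories side by side say a lot more.
    print(user, dict(categories))
```

No single record here is sensitive on its own. The profile only becomes revealing once the records are joined on the user, which is why retention past the 24-hour window matters so much.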
Make it stop
HYP3R disputes the notion that it violated Instagram’s terms of service and data policies, citing the fact that it’s only been collecting publicly shared data. Instagram said that HYP3R has, in fact, violated its rules on automated data collection.
These are the changes that Instagram is making due to the unauthorized data abuse:
- It’s working on preventing logged-out users from getting at public location pages – something that’s been possible because of a publicly available JSON package that bundled up data into an easy-to-access format and which was available by simply appending a short string of characters to any Instagram URL.
- Instagram revoked HYP3R’s access to its APIs and removed it from the list of Facebook Marketing Partners. Until Wednesday, you could find HYP3R in that directory, which is a curated list of companies that Facebook recommends for various tasks and services – such as planning, execution and measurement – for advertisers.
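The first of those changes, refusing to serve location-page data to logged-out visitors, can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Instagram's implementation: the function name `location_page_json`, the in-memory `LOCATION_PAGES` store, the location ID and the choice of status codes are all invented for the example.

```python
import json

# Hypothetical in-memory stand-in for public location pages.
LOCATION_PAGES = {
    "12345": {"name": "Hotel Example", "recent_posts": ["post_1", "post_2"]},
}


def location_page_json(location_id, session_user=None):
    """Return (status, body) for a location page's JSON view.

    Anonymous requests are refused before any page data is looked up,
    so a logged-out scraper learns nothing about the location.
    """
    if session_user is None:
        # 403 is an assumption; a real service might redirect to a login page.
        return 403, json.dumps({"error": "login required"})
    page = LOCATION_PAGES.get(location_id)
    if page is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(page)


print(location_page_json("12345"))           # logged out: refused
print(location_page_json("12345", "alice"))  # logged in: page data
```

Requiring a login does not stop scraping outright, but it ties each request to an account that can be rate-limited or banned.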
Anthony Maw
If you use any online services or post anything online you can expect your user meta-data in addition to your explicitly posted data to be available to anyone throughout the universe in perpetuity…. On the Internet you can pretty much assume everything is public, or eventually will be publicly available, even if you think it’s private because of the risk of hacker data breaches.
Jon DeGeorge
They should just put a CAPTCHA in instead of blocking logged-out access to location pages…
Facebook does this for its Directory feature.