Poor Instagram users. If it’s not one thing, it’s another.
Recently, it was a leaky API that led to 6m high-profile accounts getting hacked (and their details subsequently put up for sale at $10 a pop) – including the likes of Emma Watson, Taylor Swift, Selena Gomez and Harry Styles.
Before that, Instagram supplied us with yet another example of why you should be careful with adding friends on the platform (or any social media platform, for that matter)… And why you should be careful of those who you consider your “friends”…
… Namely, the creeps posing as friends who can be found on the creepshot-sharing site Anon-IB, where users have posted images they say they took from Instagram feeds of “a friend”.
And now, we have a new breed of data mosquito sucking off Instagram’s neck: redditors who are out to archive – in other words, to steal – every single Instagram image, be it posted publicly or stored in supposedly locked accounts.
Why? Well, in a nutshell, because they can:
You can see the appeal to those who lack qualms about taking people’s content but who love to hoard data. Consider these Instagram statistics:
- As of January 23 2017, there were 95m images being uploaded per day.
- More than 40bn photos had been uploaded to Instagram as of that date.
- The people uploading those photos are the preferred prey of image stealers: they’re young and quite often female. 31% of American women and 24% of men use Instagram.
- 59% of internet users between the ages of 18 and 29 use Instagram, as do 33% of internet users between the ages of 30 and 49.
The person who kicked off the project to rip every Instagram photo is -Archivist – one of the moderators of the r/DataHoarder subreddit. He told Motherboard that his real name is John, that he’s in his late 20s, and that when he’s not archiving Instagram, he’s “archiving something else”.
As in, for example, porn videos. Turns out he was one of the redditors who came up with a plan to test the ceiling of Amazon’s cloud storage plan, which was killed off in June. (The redditor beaston02 hit nearly 2 petabytes of porn, or about 293 viewing years’ worth of smut, by the time Amazon pulled the plug.)
John first posted his idea to create a distributed Instagram archive on January 5. At that point, by himself, he had already ripped the posts from some 3,400 accounts, or about 2.2m files, which represented about 633 GB of information.
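To put those solo-archive numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures quoted above. The function name and the decision to treat a gigabyte as 10**9 bytes are assumptions made for illustration, not anything John or Motherboard published.

```python
# Figures quoted in the article for John's solo archive (January 5 post).
ACCOUNTS = 3_400
FILES = 2_200_000
TOTAL_GB = 633  # assumption: decimal gigabytes (1 GB = 10**9 bytes)


def solo_archive_averages(accounts=ACCOUNTS, files=FILES, total_gb=TOTAL_GB):
    """Return rough per-account and per-file averages (hypothetical helper)."""
    total_bytes = total_gb * 10**9
    return {
        "files_per_account": files / accounts,
        "kb_per_file": total_bytes / files / 1_000,
        "mb_per_account": total_bytes / accounts / 10**6,
    }


averages = solo_archive_averages()
for name, value in averages.items():
    print(f"{name}: {value:.0f}")
```

On those figures, each ripped account held roughly 647 files averaging about 288 KB apiece, or around 186 MB per account.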
By now, after other redditors joined in, the archive has swelled to around 580TB of Instagram posts.
He did it with an open source program called RipMe that downloads albums in bulk. It pulls in images and videos from public Instagram accounts. It was a sluggish way to do it, though, John told Motherboard:
“You can go to anybody’s profile and list their followers, but this list is loaded around 20 accounts at a time. So manual collection of usernames required me to scroll for hours. I initially overcame this by literally stuffing a bit of cardboard into my ‘page down’ key and walking away from my laptop.”
We’ve seen others, including Danish researchers who amassed personal data on 70,000 OKCupid users, use scrapers – automated tools – to download user data from websites. We’ve also seen sketchy third-party apps going after Snapchat user data via its public API, and we’ve seen Tinder’s API used by researchers to grab 40,000 profile pictures.
But here’s the thing with relying on APIs to pull in people’s data without their permission: that spigot can be turned off, leaving you high and dry.
But not the Instagram archival project. As John emphasized in an update to his initial post, the project doesn’t rely on Instagram’s API. Instead, it relies on John and his initial dataset, plus the current 30 to 40 people now involved (along with their valuable storage space), plus – and here’s the cherry on top – the addition of a few dozen lines of code that enable collection of photos from around 2m accounts every 24 hours.
The “vast majority” of images are from public accounts, Motherboard reports. But there are photos from private accounts, as well: John chiseled them out of their accounts by creating an Instagram bot programmed to seek out and follow private accounts in the hope that they’d follow the bot back, after which the private contents could be slurped up and added to the archive.
John said the bot has had a 70% success rate at getting followed.
Which leads us back to the injunction cited above: to protect your Instagram account from getting ransacked, be careful about who you friend. It’s all too easy to friend a bot that wants to raid your contents and suck up to your friends so it can expand its reach.
There’s more you can do, too: after the Instagram API sprang a leak and hackers stole all those high-profile user details, we passed along five additional ways to keep your Instagram profile safe.
silvether (@silvetherus)
We are digital librarians. Among us are represented the various reasons to keep data: legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they’re sure it’s done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time™). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.
We are one. We are legion. And we’re trying really hard not to forget.
Paul Ducklin
Sooooo. By way of sticking up, inter alia, for people who have a “distaste for transmitting their data externally” you approve of the idea of wedging down your keyboard to scrape their data even if they didn’t actually ask you to “help” them to archive it?
What next? Perhaps you might want to appoint yourself as a “digital helpdesk” so you can phone them up and offer them help they didn’t ask for to remove malware they don’t yet know they haven’t got?
-
Just as this writer apparently appoints herself the internet’s dictionary, and declares “archiving = stealing” as some foregone conclusion for readers to blindly accept? What a crock.
Paul Ducklin
The thing that astonishes me – that almost beggars belief – is how you self-styled “archivists” have all this time, all this willingness to code, all this bandwidth, and all this storage (OK, you don’t have the storage – technically, someone else does), and of all the things you can think to do with it…
…this is what you come up with?
I don’t want to sound as though I am telling you what to do, but, hey, you guys really need to get out more.
DesktopVMs
You’re doing a great job of turning a hobby into something that only evil people do. How dare people copy something that is available freely to the public? Do they not see that having this hobby is a waste of time and is actively hurting people? Can’t they think of the lives they are ruining alongside their own?
A SysAdmin
Lisa, you do realize that the people you are talking negatively about here are the same people who tell the CTO, CIO and CEO that they want to use Sophos in their office? What do you think you are achieving by talking down to a tech community? You may also find that you don’t understand everything in this world, and maybe asking for a response like a normal journalist is a good idea. AV was actually up for renewal and it’s between you and Trend Micro. Who do you think I am going to pick?
Paul Ducklin
Seems very spiteful (and woefully unobjective) to advise your CEO against a vendor’s product because the vendor offered advice to Instagram users to review their privacy settings in order to protect against data leeches.
Presumably, when you advise your board members against Sophos’s products you won’t actually give the real reason you don’t like us – your CEO might be a bit perplexed to find that you like the idea of keeping giant collections of other people’s stuff “because you can”. After all, most companies are fairly keen on keeping control over their data and on avoiding having it sucked off their servers “for the lulz”.
Mahhn
(15+ years ago) I used to archive movies, TV series, training videos, then the files/media became more and more widely available online, so I stopped being a data hoarder and deleted my burdens, figuring in the end it’s all dust in the wind and I have better things to do with what little time I have.
Timothy Johnson
If you want your images private, don’t post them on the internet for the world to see. It’s really that simple. Thinking these sites are secure is like locking up a bar of gold behind a wooden door with a skeleton key.