Is scraping files from a Freedom of Information website ‘hacking’?

Lisa Vaas

7 years ago

Let’s say a site devoted to letting people download files has a URL that contains a number. What happens if you go to your browser’s address bar and bump that number up by 1?
Well, in this case, you get yet another downloadable file, and so maybe you bump the number again to see if you get another file. Say you do, and you keep increasing that number by one to get even more files. Make that a lot more files, as in, 7,000, achieved with automatic scraping of the site.
And then, surprise surprise, your younger brother is arrested as he walks to school; your home is raided; your family is corralled in the living room; your sister starts to cry; and law enforcement agents dump out drawers, turn over mattresses, and seize everybody’s laptops and mobile phones (meaning that your dad can’t work).
Oh, and of course, you’re now facing a criminal charge for being a “hacker.” For downloading files from Nova Scotia’s freedom-of-information (FOI) portal.
CBC News reports that this is what happened to a 19-year-old in Halifax on 11 April.
His name hasn’t been released because he hasn’t yet been arraigned. Also, his family requested anonymity. The young man says he’s worried that a conviction could ruin his chances of getting hired. He hopes the charges will be dropped. CBC News quoted him:

I don’t know if I’ll be able to get a job if this gets on my record… I don’t know what my future will be like.

The government says he’s a hacker. There isn’t supposed to be that much freedom in the freedom-of-information portal, so it’s charging him with unauthorized computer access.
The “hacker” – or non-maliciously curious archivist, depending on how credible you find the teen vs. government prosecutors – downloaded about 7,000 freedom-of-information releases, the majority of which were already scrubbed of personal information and had been made publicly available.
About 250 of the records – around 4% – were prepared for Nova Scotians requesting their own government files. The files were un-redacted, contained highly sensitive personal information such as birth dates, addresses and social insurance numbers, and hence weren’t intended for public release.
Nor were they password-protected. They were just there for the taking for anybody who likes to save stuff. And this young man is definitely one of those online archivist types, of which there are many.
Archivists don’t always care if they’re downloading material that’s been posted publicly or that’s been stolen from locked accounts. For example, in September, we heard about redditors trying to rip every single image from Instagram. Why? Because they could.
But the Halifax man says he wasn’t that type of archivist. He thought the records were all public, he told news outlets, and he didn’t download them out of malice.

I didn’t do anything to try to hide myself. I didn’t think any of this would be wrong if it’s all public information. Since it was public, I thought it was free to just download, to save.

Does that make it OK? Twitter users so far have been pretty vocal in the teen’s defense. Likewise for privacy and security advocates who’ve talked to news outlets.
Evan D’Entremont, a software engineer, told CBC News that as more details emerge, it’s looking more and more like “this kid’s being railroaded.”

He didn’t actually do anything wrong, and the government’s looking for somebody to blame in this.

(For technical details about the portal and what the teen did, check out this post from D’Entremont.)
Others, calling the case a “travesty,” have started crowdfunding the teen’s legal defense. He’s facing up to 10 years in prison if convicted.
At Naked Security, there’s a bit of skepticism about the archivist’s claim that he didn’t know he was scraping private information. The thinking: he’s done this before. In the past, his archivist inclinations have led him to amass data that includes the typically short-lived pages of sites such as 4chan and Reddit. He knows he was using the same loophole to get the Freedom of Information files.
In this case, he says he was curious to get to the bottom of a labor dispute about teachers. He didn’t find what he was after, so he wrote a simple one-line piece of code to automatically increment the number in the URL and download each file in sequence. A few hours later, he had his 7,000 records.
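To make the technique concrete, here is a minimal sketch of what "incrementing the URL" amounts to. It is not the teen's actual code, which hasn't been published in the reporting quoted here. The host `foi.example.org`, the `id` parameter and the function name `sequential_urls` are all hypothetical, and the sketch only builds the list of addresses; it doesn't fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def sequential_urls(start_url, param, count):
    """Return `count` URLs, starting at `start_url`, with the numeric
    query parameter `param` increased by one each time."""
    parts = urlsplit(start_url)
    query = parse_qs(parts.query)
    first = int(query[param][0])

    urls = []
    for number in range(first, first + count):
        query[param] = [str(number)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


if __name__ == "__main__":
    # Hypothetical address; the real portal's URL scheme is not given here.
    for url in sequential_urls("https://foi.example.org/document?id=1000", "id", 3):
        print(url)
```

Nothing in a loop like this tells the person running it which documents were meant to be public and which weren't. Every address looks the same from the outside.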
If he’d quickly examined those files, he might have realized he was treading on other people’s privacy. Or then again, maybe not. According to what’s been reported, any single file he opened would have had only about a 4% chance of being one of the 250 records, out of 7,000, that held private information.
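The arithmetic behind that figure is worth a quick look. 250 out of 7,000 is about 3.6%, which the reporting rounds to 4%. The sketch below goes one step further and estimates the chance of hitting at least one private record after opening several files. It assumes the files are picked uniformly at random with no repeats, which is our simplification and not something the reporting states; the function name is likewise just for illustration.

```python
from math import comb

TOTAL_FILES = 7000    # records downloaded, per the reporting
PRIVATE_FILES = 250   # records that held private information


def chance_of_private_file(opened):
    """Probability that at least one of `opened` distinct files, picked
    uniformly at random (an assumption), is one of the private ones."""
    public_files = TOTAL_FILES - PRIVATE_FILES
    # Chance that every file opened happens to be a public one.
    all_public = comb(public_files, opened) / comb(TOTAL_FILES, opened)
    return 1 - all_public


if __name__ == "__main__":
    for opened in (1, 5, 20, 100):
        print(f"{opened:>3} files opened: {chance_of_private_file(opened):.1%}")
```

Under that assumption, a single glance was unlikely to turn up anything private, but the odds climb quickly with each additional file checked.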
The Electronic Frontier Foundation (EFF) has called the prosecution “ginned up.” The FOI portal apparently didn’t put up even “minimal technical safeguards” to keep widely known indexing tools such as Google search and the Internet Archive from archiving all the records published on the site. The portal has since been taken down, but D’Entremont has found several requests that Google indexed and cached. From his post:

This system is literally designed for facilitating “access to information.” …There are no authentication mechanisms, no password protection, no access restrictions. It’s very clear that the software is intended to serve as a public repository of documents.
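The reporting doesn’t spell out which “minimal technical safeguards” the EFF had in mind. One common way to ask well-behaved crawlers to stay away is a robots.txt file, and the sketch below, which uses Python’s standard `urllib.robotparser`, shows how a crawler would interpret one. The rules and the `foi.example.org` addresses are hypothetical, not taken from the real portal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a portal could publish at /robots.txt.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /documents/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)  # parse the rules directly; no network access


def crawler_may_fetch(url, user_agent="*"):
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://foi.example.org/about",
                "https://foi.example.org/documents/1234"):
        print(url, "allowed" if crawler_may_fetch(url) else "disallowed")
```

A robots.txt file is only a polite request. It stops crawlers that choose to obey it, and it does nothing to restrict access the way a login or password would.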

The case is being compared to that of Aaron Swartz, an American who downloaded millions of academic journal articles from JSTOR over MIT’s network and whose prosecution was widely seen as overreach.

Readers, what’s your take on who’s to blame: the teen or the government?
Should the young man have put a bit more effort into ensuring he wasn’t asking for things he shouldn’t have asked for? Should the government be blamed for not redacting, or password-protecting, records published on a portal designed to let the public get at them? Is this the same as arguing that leaving your window open doesn’t make it OK for somebody to reach in and snatch your TV? Or is it different? Everybody knows you’re not supposed to walk into somebody’s private residence, even if the door’s unlocked. Is it criminal to download files that are supposed to be public?
The calendar pages are quickly flipping toward 25 May: the date when the European Union’s General Data Protection Regulation (GDPR) privacy law goes into effect. It’s leading companies to put quite a bit of effort into being careful about what kind of data they ask for, what they take and what they keep.
Should we all be held to that standard? Or should we expect that a portal made to provide access to public files is only going to provide files meant to be public?