HealthCare.gov, the US federal health insurance exchange website, is inadvertently sending users’ personal health information to fourteen separate third-party websites.
The site, a central component of the Affordable Care Act (often referred to as Obamacare), leaks data via referer headers.
According to reports from the Associated Press and the Electronic Frontier Foundation (EFF), the data being sent to third-party websites includes zip code, age, income, and whether or not you’re pregnant or a smoker.
The health data is being leaked in referer headers because, rather unwisely, HealthCare.gov includes that health data in its own URLs.
It works like this:
When browsers request a web page, the request includes the URL of the page that the request was referred from in its referer [sic] header (the misspelling is enshrined in the official HTTP specification).
If you went from this page to our Cookies and Scripts page, the request for that page would carry this page’s URL in its referer header.
No problem so far; that’s how HTTP is supposed to work, and the referer header is even occasionally useful.
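As a rough illustration, the Python sketch below assembles the raw text of such a request. The host and paths (`example.com`, `/cookies-and-scripts/`, `/this-article/`) are hypothetical placeholders, not the real addresses of either page.

```python
def build_get_request(host: str, path: str, referer: str) -> str:
    """Assemble the raw text of an HTTP/1.1 GET request carrying a Referer header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        # The header name keeps the historical misspelling "Referer".
        f"Referer: {referer}",
        "",  # an empty line ends the header block...
        "",  # ...so the request text finishes with CRLF CRLF
    ]
    return "\r\n".join(lines)


# Hypothetical URLs standing in for "this page" and the Cookies and Scripts page.
request = build_get_request(
    host="example.com",
    path="/cookies-and-scripts/",
    referer="https://example.com/this-article/",
)
print(request)
```

The server hosting the destination page learns, from that one header, exactly which page you were reading beforehand.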
However, the page you think you’re going to isn’t the only page that gets referral data.
If the web page you’re on contains third-party code, such as a Twitter widget, then your browser has to fetch that code from the third-party website, and that request has a referer header too.
So, via the referer header, your browser is telling Twitter what web page you’re on when it asks for an embedded Twitter widget (and the same is true of any other third-party code).
Whether you know it or not, and whether they’re listening or not, you’re sharing which bit of the web you’re on at that moment with all the third-party code used to build up a web page.
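The Python sketch below models that behaviour: it scans a page’s HTML for a few common embed tags (`script`, `img`, `iframe`) and lists which third-party hosts would be sent the page’s URL as a referer. The page URL and widget host are hypothetical, and a real browser fetches many more kinds of resource than the three checked here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbedCollector(HTMLParser):
    """Collect src URLs of elements a browser fetches automatically."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only a few common embed tags are checked; real pages use many more.
        if tag in ("script", "img", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def third_party_referers(page_url: str, html: str) -> dict[str, str]:
    """Map each third-party host on a page to the referer it would be sent.

    Assumes, as described above, that the browser sends the full page URL.
    """
    collector = EmbedCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    leaks: dict[str, str] = {}
    for src in collector.sources:
        # Resolve relative URLs such as "/logo.png" against the page address.
        host = urlsplit(urljoin(page_url, src)).hostname
        if host and host != page_host:
            leaks[host] = page_url
    return leaks


# Hypothetical page and widget host, for illustration only.
page = "https://example.com/article?topic=health"
html = ('<script src="https://widgets.example.net/embed.js"></script>'
        '<img src="/logo.png">')
print(third_party_referers(page, html))
```

The same-site image is ignored, but the widget host receives the full page URL, query string included.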
HealthCare.gov
On the Obamacare site, though, you share a whole lot more than just where you are.
HealthCare.gov includes sensitive information about the person using the site in its URLs as a way of passing information from one page to another.
As mentioned before, the URLs contain information such as your age, zip code and income, and whether or not you’re a smoker or pregnant, for example:
https://www.healthcare.gov/see-plans/85601/results/?age=40&smoker=1&parent=&pregnant=1&mec=&zip=85601&state=AZ&income=35000&
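To show how little effort it takes to read this data back out, here is a short Python sketch that decodes the example URL above using only the standard library’s `urllib.parse`. Anything that receives the URL in a referer header could do the same.

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://www.healthcare.gov/see-plans/85601/results/"
       "?age=40&smoker=1&parent=&pregnant=1&mec=&zip=85601&state=AZ&income=35000&")

parts = urlsplit(url)
# parse_qs drops empty values such as "parent=" unless keep_blank_values=True.
fields = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.path)  # the zip code appears in the path as well as the query
print(fields)
```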
If you logged in to HealthCare.gov and visited the URL above, you’d dispatch that whole URL, including all the personal data within it, to every site providing third-party code for that page.
In the case of HealthCare.gov that’s fourteen different websites, some of which belong to advertising companies that specialise in user profiling.
The URL doesn’t contain anything that names or directly identifies a specific individual, but that doesn’t make it safe: it’s alarmingly easy to identify individuals from scant, anonymized data.
The government explicitly prohibits the companies from using the data in the referer, and there is no suggestion that any of them are actually using the leaked data for their profiling (and, given the stakes, I suspect it’s unlikely they’d even consider it), but that doesn’t make it OK.
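One common way to reason about this is k-anonymity: a record is only as anonymous as the number of other records sharing the same combination of seemingly harmless attributes. The Python sketch below counts those groups over a handful of made-up records using fields like the ones in the URL. Any record in a group of one is unique, and so potentially re-identifiable by anyone able to link those attributes to a person.

```python
from collections import Counter

# Hypothetical, made-up records: no names, only URL-style attributes.
records = [
    {"zip": "85601", "age": 40, "smoker": 1, "pregnant": 1},
    {"zip": "85601", "age": 40, "smoker": 0, "pregnant": 0},
    {"zip": "85601", "age": 40, "smoker": 0, "pregnant": 0},
    {"zip": "85602", "age": 33, "smoker": 0, "pregnant": 1},
]

QUASI_IDENTIFIERS = ("zip", "age", "smoker", "pregnant")


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return rows whose combination of fields appears exactly once (k = 1)."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


print(len(unique_rows(records, QUASI_IDENTIFIERS)))
```

Even in this tiny sample, half the records are unique on just four attributes; adding income or state would only make more of them stand out.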
Most websites store logs of which pages have been visited, and those logs often include the referring URL. Some companies back up that data for years, and since nobody will expect it to contain health information, it’s unlikely to be treated with the level of sensitivity required.
That means that even if all 14 sites are operating with impeccable ethics, there could still be 14 separate, accidental copies of that leaked data hanging around for a long time with less-than-ideal protection applied.
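To see how a referer ends up sitting in those logs, consider the widely used "combined" access log format, which records the Referer header of every request. The Python sketch below parses one made-up log line, as a third-party widget server might record it, and pulls out query parameters whose names suggest health data. The log line and the list of sensitive parameter names are hypothetical illustrations, not real data.

```python
import re
from urllib.parse import parse_qs, urlsplit

# "Combined" log format: the referer is the second-to-last quoted field.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical parameter names to flag; chosen for illustration only.
SENSITIVE = {"age", "income", "smoker", "pregnant", "zip"}


def sensitive_fields_in_referer(log_line: str) -> dict[str, str]:
    """Return any sensitive query parameters found in a log line's referer."""
    match = COMBINED.match(log_line)
    if not match:
        return {}
    params = parse_qs(urlsplit(match.group("referer")).query)
    return {name: values[0] for name, values in params.items() if name in SENSITIVE}


# A made-up entry as a third-party widget server might store it.
line = ('203.0.113.7 - - [20/Jan/2015:10:00:00 +0000] "GET /widget.js HTTP/1.1" 200 512 '
        '"https://www.healthcare.gov/see-plans/85601/results/?age=40&smoker=1&zip=85601&income=35000" '
        '"Mozilla/5.0"')
print(sensitive_fields_in_referer(line))
```

Nothing about the widget request itself is sensitive; the health data arrives purely as a side effect of the referer being logged.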
As the EFF’s Cooper Quintin noted in a blog post, private health data should not be shared in this way, and it’s a “massive violation of privacy”:
People's private medical data should not be available to third party companies without consent from the user. This practice is negligent at best.
The sharing is accidental, but it is the result of poor choices in the design of HealthCare.gov rather than a flaw in HTTP itself.
Our privacy shouldn’t depend on the ethics of companies with a conflict of interest and we shouldn’t be in the business of trying to predict how somebody might be able to access our leaked data in future (or what they might cross reference that data with).
The principle of least privilege demands that the data shouldn’t have been there in the first place.
URLs are not secret and shouldn’t contain sensitive information. They get found, indexed, spread around and stored through all sorts of different mechanisms, including server logs, bookmarks, browser histories, search engine indexes and server status pages.
To avoid leakage, sensitive data should only be sent in the request body of an HTTP POST request and received in the message body of the response (via HTTPS of course).
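A minimal Python sketch of that approach, using the standard library’s `urllib.request.Request`, is below. The endpoint URL is a hypothetical placeholder, and the request is only constructed, not sent; the point is simply where the data ends up, in the POST body rather than in the URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical HTTPS endpoint; a real site would choose its own path.
ENDPOINT = "https://www.example.gov/see-plans/results"

form = {"age": "40", "smoker": "1", "pregnant": "1", "zip": "85601", "income": "35000"}

# The sensitive fields travel in the request body, so the URL stays free of
# personal data and a referer derived from it reveals nothing about the user.
req = Request(
    ENDPOINT,
    data=urlencode(form).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Passing req to urllib.request.urlopen() would send it; that step is omitted.
print(req.get_method(), req.full_url)
print(req.data)
```

The body is still readable by anyone who can see the raw traffic, which is why it needs to travel over HTTPS.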
End users can control referer headers, and protect themselves from poorly designed, leaky sites, with a range of plugins and configuration controls (too many to list here, I’m afraid).
That’s only half the story, though. Third-party code can greatly enhance the functionality of a website, but it enjoys very privileged access to any page it’s included on, and gobbling up referer headers is just the tip of the iceberg.
Rather than focusing on what to do about referers specifically, it’s probably better to use plugins like NoScript, Ghostery or the EFF’s own Privacy Badger to control which third-party sites you want to share anything with.
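Referer controls of this kind generally either suppress the header entirely or trim it down. The Python sketch below models three such options on a URL; the function and mode names are hypothetical and do not correspond to any particular browser setting or plugin.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_referer(url: str, mode: str = "origin") -> str:
    """Reduce a URL to what a privacy-conscious client might send as a referer.

    mode="none":   send no referer at all (empty string here).
    mode="path":   keep scheme, host, port and path; drop query and fragment.
    mode="origin": keep only scheme, host and port.
    """
    parts = urlsplit(url)
    # Drop any "user:password@" prefix so credentials are never echoed back.
    host = parts.netloc.rpartition("@")[2]
    if mode == "none":
        return ""
    if mode == "path":
        return urlunsplit((parts.scheme, host, parts.path, "", ""))
    if mode == "origin":
        return urlunsplit((parts.scheme, host, "/", "", ""))
    raise ValueError(f"unknown mode: {mode!r}")


url = ("https://www.healthcare.gov/see-plans/85601/results/"
       "?age=40&smoker=1&pregnant=1&zip=85601&state=AZ&income=35000")
print(trim_referer(url, "path"))    # query string gone, but the zip is still in the path
print(trim_referer(url, "origin"))  # only the site itself is revealed
```

Trimming to the path alone would not have been enough for HealthCare.gov, because the site also puts the zip code in the path itself.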