As concerns mount over DNA privacy, a group of DNA collection and genealogy websites has released a set of best practice guidelines for handling sensitive genetic and family data. Will it give consumers much more protection though? Probably not.
23andMe, Ancestry, Helix, MyHeritage, and Habit worked with the Future of Privacy Forum to release the guidelines, which explain how to handle information about a family’s genetic makeup. Sites like 23andMe offer genetic tests to consumers who send in a simple saliva swab. The companies then use the sample to tell customers about their ancestry and to let them know about genetic health risks.
The guidelines apply to any data about an individual’s inherited genetic characteristics. This covers three types: data that comes directly from sequencing a person’s DNA; data that a company creates by analyzing that raw data (such as particular gene information or data about physical characteristics); and data that a person reports about their own health conditions.
The document broadly replicates many of the rules laid down by the EU’s General Data Protection Regulation (GDPR), which already binds any company holding data on EU residents. It also draws on other guidance, including the Health Insurance Portability and Accountability Act (HIPAA), the Genetic Information Nondiscrimination Act and the Americans with Disabilities Act.
It includes statements on accountability (companies should release reports on what they’re doing with people’s data) and privacy by design (implementing technical controls to support the other rules), among others. It also says:
Genetic Data, by definition linked to an identifiable person, should not be disclosed or made accessible to third parties, in particular, employers, insurance companies, educational institutions, or government agencies, except as required by law or with the separate express consent of the person concerned.
This document still leaves some privacy concerns. Let’s start with the timing of its release.
The companies have released the guidelines because genetic data is so sensitive, they say. It can be used to predict future medical conditions, reveal information about someone’s family members, or have cultural significance for groups of individuals.
A couple of days before the guidelines dropped, 23andMe announced a deal with GlaxoSmithKline, effectively selling a heap of client data for a $300m investment.
Under the deal, the pharma giant gets access to de-identified data for research purposes. That is, data that cannot “reasonably be associated with an individual”.
The guidelines released this week explain that none of the best practices apply to this de-identified data.
Deidentified information is not subject to the restrictions in this policy, provided that the deidentification measures taken establish strong assurance that the data is not identifiable.
Is a “strong assurance” enough to protect you?
In some cases, researchers can re-identify data. Consider a 2013 project in which a Harvard professor re-identified over 40% of the people in a high-profile DNA study. The guidelines recommend aggregating data before de-identifying it to make the protections strong enough.
The fact is that anonymising or de-identifying data in the era of Big Data is hard.
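To see why this is hard, and why aggregation helps, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, and the `unique_fraction` and `generalize` helpers are hypothetical names, not anything taken from the guidelines or from any company’s actual process. It counts how many records are singled out by a combination of ordinary attributes (date of birth, sex, zip code), then repeats the count after coarsening those attributes.

```python
from collections import Counter

# Hypothetical toy records: (date_of_birth, sex, zip_code).
# Invented for illustration only; no names or DNA are stored at all.
RECORDS = [
    ("1980-03-14", "F", "02139"),
    ("1980-03-14", "M", "02139"),
    ("1980-07-02", "F", "02139"),
    ("1975-11-30", "M", "02139"),
    ("1975-01-09", "M", "02139"),
    ("1975-01-09", "M", "02139"),
]


def unique_fraction(records):
    """Return the share of records whose attribute combination occurs exactly once."""
    counts = Counter(records)
    singled_out = sum(1 for record in records if counts[record] == 1)
    return singled_out / len(records)


def generalize(record):
    """Coarsen a record: keep only the birth year and the first three zip digits."""
    dob, sex, zip_code = record
    return (dob[:4], sex, zip_code[:3])


raw_share = unique_fraction(RECORDS)
coarse_share = unique_fraction([generalize(r) for r in RECORDS])

print(f"singled out before generalizing: {raw_share:.0%}")
print(f"singled out after generalizing:  {coarse_share:.0%}")
```

In this toy data, most records are unique on the raw attributes even though no name appears anywhere, and coarsening them leaves far fewer people standing alone. Real data sets are much larger and carry many more attributes, so a result like this says nothing about how strong any particular company’s de-identification is.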
You have to complete a consent form for 23andMe to use your data as part of its research programme, which would include giving it to Glaxo. Even if you don’t sign that form, though, your data may still be given to other people. 23andMe already has a privacy policy in place, which explicitly says that if you don’t consent to research, it can still share your genetic information with third-party service providers.
The guidelines mirror this policy, requiring express consent for:
Onward transfer of individual level information (i.e., Genetic Data and/or personal information about a single individual) to third parties for any reason, excluding vendors and service providers.
That’s a pretty big exclusion. What are vendors and service providers, just for the record? From the guidelines:
Vendors and service providers are companies that act under the direct authority of the data controller or processor and are authorized to process personal data in support of providing the data controller’s commercial product or service.
What kinds of company might that include? The document doesn’t specify.
How governments use DNA data
Those providing their DNA and/or ancestry data to companies may also have other privacy concerns based on law enforcement’s use of that data. The best practice guide allows DNA and genealogy sites to give data to law enforcement when they ask for it.
Perhaps they’re worried about the spate of recent stories highlighting just how sensitive this data is. The most famous case is the Golden State Killer, a serial killer, rapist and burglar who was active from 1974 until 1986. Law enforcement used DNA evidence to help unmask Joseph James DeAngelo Jr as the main suspect.
Detectives submitted DNA from a 1978 crime scene to GedMatch, a website that lets people upload their genetic profile from commercial DNA companies, and also their GEDCOM file, which is a standard file format used to hold genealogical data. They used this information to match the crime scene DNA with information provided by a relative of DeAngelo’s.
This technique has also been used to find identity thieves, and other murderers and sex criminals.
The use of DNA is raising concerns about privacy. On one hand, everyone wants to see killers and rapists jailed. On the other hand, people worry about misuse of the technology. Even GedMatch warned after the DeAngelo incident that people should understand the risks involved with submitting their personal genetic and genealogical data.
Police officers have in the past forced companies to hand over genetic data as part of investigations, and California has a law that allows the state to collect DNA from any child or adult convicted of a felony or any adult arrested for a felony.
Governments are using the data for other purposes, too. Canada’s border agency has been found using DNA testing and ancestry websites to investigate immigrants.
What does all this mean for people considering using these sites? The choice to participate in these services is always in the hands of the individual, but it should be an informed choice.
As always, consider how much of your own data you want to expose and weigh the potential privacy risks against the benefits (in this case, finding out more about your health and history).
This means not just checking out the best practice document but wading through the language in the service providers’ own privacy policy to be sure that you understand what they mean. These are commercial sites, and in many cases could be making their use of the data far clearer.
If you do decide to avail yourself of these services, make sure that you adjust the privacy settings in your account to reflect your wishes, rather than simply trusting that the vendor has your best interests at heart.
Who Am I?
Identifying people from identifying data (DNA, names, addresses, etc.) is trivial, and identifying people from “anonymized” data is not always much harder. Anonymized data made public by the state of Massachusetts’ GIC, AOL, Netflix, and NYC taxis has in each case been used to identify people in those data sets. You do not need much data to identify someone. In the US, you can identify about 87% of the general public with just date of birth, sex, and zip code (search for “de-identifying 87% data with dob sex zip code” for more info). This should be similar in other countries using the equivalent of the US zip code.
Buyer beware: The data you give someone can (and sometimes will) be used against you.
Bentham
“As always, consider how much of your own data you want to expose”
But of course it is not just “your own data”, it is also rather close to being your relatives’ data.
At what stage do you have to consider obtaining consent from others? If you have an identical twin, should you ask? What about non-identical twins (where your DNA will nonetheless be very similar)? What about a sibling (with whom you share mtDNA, or for that matter anyone descended through a female line from your direct maternal line going back until there is some mutation)? Or others with whom you may share a Y chromosome (basically, if you are male, anyone descended through a male line from your paternal line going back until there is some mutation)?