
Facebook posts reveal your hidden illnesses, say researchers

The language we use could be an indicator of disease and, with patient consent, could be monitored just like physical symptoms.

Does your stomach hurt? Do you tell your friends on Facebook?

If so, researchers suggest there’s a possibility you might be suffering from depression, and a good chance that analyzing your social media posts could flag it months earlier than a clinical diagnosis alone would.

In a study from Penn Medicine and Stony Brook University published in PLOS ONE, researchers claim that the language people use in their social media posts can predict conditions such as diabetes, anxiety, depression and psychosis.

In their paper, the researchers described using natural language processing to analyze 949,530 Facebook posts made by 999 study participants, for a total of 20,248,122 words.

They looked for markers of 21 medical conditions, and they found that all of them were predictable from Facebook language better than chance. Some of those conditions were particularly well predicted by a combination of demographics and Facebook language vs. demographics alone: namely, diabetes, pregnancy, anxiety, psychoses, and depression.

One example of how language can strongly predict a diagnosis is alcohol abuse. Alcohol abuse was marked by use of the words “drink,” “drunk,” and “bottle,” they said. That’s a pretty intuitive diagnosis, but other predictions weren’t so obvious: for example, people who use the words “god,” “family” and “pray” are 15 times more likely to have been diagnosed with diabetes.

Other correlations:

  • Use of hostile language – e.g. “people,” “dumb,” “bulls**t,” “b**ches” – was a predominant marker associated with drug abuse as well as psychoses.
  • Those suffering from depression tend to use words associated with the physical symptoms of anxiety – “stomach,” “head,” “hurt” – and with emotional distress – “pain,” “crying,” “tears.”

Should you offer insulin to somebody who mentions praying and God? No, the researchers say: clearly, not everyone mentioning the words they tracked has a particular medical condition. Rather, those mentioning key words are more likely to have a given, correlated condition, they said.
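For the curious, here is roughly the shape of that kind of word-based prediction. To be clear, this is a toy sketch, not the researchers’ actual pipeline: the posts, the single demographic feature and the diagnosis labels below are all invented, and the real study used far richer language features across nearly a million posts.

```python
# A toy sketch of word-based prediction: TF-IDF word features plus one
# demographic column, fed to a logistic regression. Everything here -- the
# posts, the ages, the "has_condition" labels -- is invented for illustration;
# it is not the researchers' actual data or pipeline.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Each "document" is all of one (hypothetical) participant's posts concatenated.
posts = [
    "so drunk last night, another bottle gone, need a drink",
    "pray for my family, god is good to us",
    "my stomach hurts and my head aches, crying again from the pain",
    "great run this morning, feeling strong and happy",
]
age = np.array([[31.0], [58.0], [24.0], [45.0]])   # hypothetical demographic feature
has_condition = np.array([1, 1, 1, 0])             # hypothetical diagnosis labels

# Word features from the text, plus the (scaled) demographic column.
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(posts)
X = hstack([X_text, csr_matrix(age / age.max())])

model = LogisticRegression(max_iter=1000)
model.fit(X, has_condition)

# Which words does the model lean on? The analogue of "drink", "drunk" and
# "bottle" being markers for alcohol abuse in the study.
word_weights = model.coef_[0][: X_text.shape[1]]
top = np.argsort(word_weights)[-5:][::-1]
print([vectorizer.get_feature_names_out()[i] for i in top])
```

On toy data like this the output is meaningless, of course; the point is only the general approach: turn words into features, add demographics, fit a model, and inspect which words carry the weight.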

No, your doctor won’t be e-stalking you

The researchers say that one helpful thing about social media is that it’s a two-way communication channel: it gives clinicians a built-in way to talk with patients. That doesn’t mean they’ll be eavesdropping on your posts all the time, but given their research, they think it could make for effective care models if patients opt in to a system that allows clinicians to analyze their social media writings.

At any rate, Facebook is already eavesdropping, at least when it comes to detecting suicidal thoughts. In September, the platform explained how, in the previous year, it had started to use machine learning to look for such thoughts in users’ posts.

Facebook’s post about the AI use, written by Catherine Card, Director of Product Management, is an interesting read, as it spells out the difficulties of teaching a machine linguistic nuance. For example, how do you give AI enough contextual understanding to glean that “I have so much homework I want to kill myself” isn’t a genuine cry of distress?

Facebook made a breakthrough when it realized that it could use false alarms as a training set. It had such a collection: in 2015, it introduced new ways for users to flag friends’ posts that looked suicidal. The flagged posts were reviewed by humans – trained Community Operations reviewers – to determine whether the writer was actually at risk of self-harm. Posts that the humans found had been incorrectly flagged gave Facebook more data with which to train its classifiers to recognize genuine suicidal expressions more accurately.
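In broad strokes, that feedback loop – human reviewers correcting the machine’s false alarms – looks something like the toy sketch below. It isn’t Facebook’s actual system; the flagged posts, reviewer labels and model choice are invented for illustration.

```python
# A toy sketch of folding human review back into the training data. Not
# Facebook's actual system -- the flagged posts, reviewer labels and model
# choice are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Posts flagged (by users or by an earlier model) as possibly suicidal,
# alongside what trained human reviewers decided on closer inspection.
flagged_posts = [
    "I can't go on like this, nothing matters anymore",
    "I have so much homework I want to kill myself",   # hyperbole, not distress
    "ugh this traffic is killing me lol",
    "I don't want to be here anymore, I'm saying goodbye",
]
reviewer_says_at_risk = [1, 0, 0, 1]   # 0 = false alarm, 1 = genuine concern

# The false alarms become negative examples, teaching the classifier which
# superficially alarming phrasings are actually benign.
vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(flagged_posts)
clf = MultinomialNB().fit(X, reviewer_says_at_risk)

new_post = ["so much revision this week, it's killing me"]
print(clf.predict(vec.transform(new_post)))   # hopefully [0]: another false alarm
```

The real system obviously works at far greater scale and with more sophisticated models, but the principle the article describes is the same: every post a human reviewer downgrades to “not at risk” gives the next round of training more evidence about which alarming-sounding posts are benign.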

But the Penn researchers aren’t advocating for an expansion of Facebook as an AI Big Brother that scans all our posts with or without our say-so. Rather, their work shows that an opt-in system for patients who agree to having their social media posts analyzed could provide extra information for their healthcare teams to use in refining their medical care.

Lead author Raina Merchant, the director of Penn Medicine’s Center for Digital Health and an associate professor of Emergency Medicine, told Science Daily that her team’s recent work builds on a previous study that showed that analysis of Facebook posts could predict a diagnosis of depression up to three months earlier than a clinical diagnosis. She said that it’s tough to predict how widespread an opt-in social media post analysis system would be, but that it could be useful for patients who are frequent social media users:

For instance, if someone is trying to lose weight and needs help understanding their food choices and exercise regimens, having a healthcare provider review their social media record might give them more insight into their usual patterns in order to help improve them.

Ever mention donuts in your posts? One imagines that information could come in handy.

Similar to how Facebook now allows users to flag posts within their network that they think may suggest suicidal ideation, the researchers suggest that clinicians could get early warnings about a broader set of conditions:

A patient-centered approach [similar to Facebook’s suicide filters] could be applied to a broader set of conditions allowing individuals and their networks (for those who opt-in) to have early insights about their health-related digital footprints.

Privacy, informed consent, and data ownership

If the researchers are correct in claiming that you can make a diagnosis from public social media posts, then this is a great illustration of how much information people are sharing without being aware of it. The researchers make that exact point, in fact, pointing to the questions about privacy, informed consent, and data ownership that their work raises.

The extra ease with which social media access can be obtained creates extra obligations to ensure that consent for this kind of use is understood and intended. Efforts are needed to ensure users are informed about how their data can be used, and how they can recall such data. At the same time, such privacy concerns should be understood in the context of existing health privacy risks. It is doubtful that social media users fully understand the extent to which their health is already revealed through activities captured digitally.

The issue is that people don’t always understand that the whole is greater than the parts. We all might think we’re sharing little snippets that don’t amount to anything particularly revealing, but when we think that way, we miss the fact that a million little snippets add up to a very Big Data picture.

But we should also bear in mind that the more data you have, the more spurious correlations it will contain. As the researchers said, just because you use a given set of words doesn’t mean that you’re alcoholic/diabetic/depressive/pregnant/a drug abuser.

Sometimes, a cigar is just a cigar.

7 Comments

It would be interesting to see what is made from readers comments in news articles like Naked Security and others. Just for the lolz of course :)
With the mahoosive dataset that FB is… Let’s hope it can save some lives.
What do typo’s say about a person? “the whole is greater than the parts” perhaps should be “sum of its parts” if you are quoting Aristotle :D
And you’ll notice my use of smileys to fool the AI readers and keep us all happy at no cost to you!

I’m kinda shocked that Naked Security would suggest that it would be a good idea to let physicians snoop on your social media. Physicians report, for various reasons, to employers, insurance companies, large hospitals, the Feds and the state (Medicare; Medicaid; restricted medications, etc.). Both private and federal insurance already try to snoop on your social media if you need to apply for disability insurance payments; they are trying to find a reason to deny payments (or, in kinder language, they are looking for fraud). The idea that physicians are purely interested in your health, given all the various economic and legal directions they are pulled in, is wishful thinking. Plus, your medical data goes everywhere. Even if your doctor tries to work in your best interest, she or he has little control over all the agencies and companies that have access to this data. The last thing I would do is voluntarily open that door wider!

Naked Security isn’t suggesting it’s a good idea, we’re just reporting that, according to some recent research, it’s possible.

“949,530 Facebook posts […] for a total of 20,248,122 words. Each post contained at least 500 words.”

Shouldn’t 1million posts times at least 500 words per post equal to more than 500 million words instead of just over 20 million words?

The wording in the paper is ambiguous, and is most easily interpreted as 500 words per post. However, you’re obviously right, it doesn’t add up, so it can’t mean that. I think it means the participants had to have written at least 500 words in total, across all their status updates. I’ve removed the offending sentence because it’s perfectly understandable without.

Yes, that makes more sense. Then you get an average Facebook post of 20 words, which makes more sense than Facebook posts of 500 words.

