Sophos News

AI wordsmith too dangerous to be released… has been released

A text-generating artificial intelligence (AI) algorithm that its creators initially deemed too dangerous to release – given its ability to churn out fake news, spam and misinformation after feasting on a mere headline – has been unleashed.

So far, so good, says the research lab, OpenAI. In a blog post last week, the lab said that the researchers have seen “no strong evidence of misuse” of the machine-learning language model, which is called GPT-2… at least, not yet. From the post:

While we’ve seen some discussion around GPT-2’s potential to augment high-volume/low-yield operations like spam and phishing, we haven’t seen evidence of writing code, documentation, or instances of misuse […] We acknowledge that we cannot be aware of all threats, and that motivated actors can replicate language models without model release.

Exactly how convincing is the output? Well, you can try it out yourself at a web version on TalkToTransformer.com that lets you enter your own prompts.

On the “Whew – this thing’s pretty stupid! Our jobs as writers are SAFE!” side, we have what The Verge got out of GPT-2. The publication fed it six prompts that produced a varyingly jumbled and ludicrous set of outputs, including one that said that Gimli, the dwarf from Lord of the Rings, was a tall man who was also a dwarf, and one that concluded that Gregor Samsa had a terrible struggle with a vermin that was more like an ape.

On the more worrying side is what GPT-2 did when I fed it part of a Naked Security article about an IT services pro who hacked his former client’s email.

Here’s the prompt I fed it, followed by what it came up with to complete the text:

An IT project manager has pleaded guilty to accessing the email account of a former client’s CEO, said reports this week.

The executive allegedly received an email containing inappropriate content from a business partner in 2011, with the executive’s wife receiving a text message from the executive, according to Reuters.

A source told Reuters that the IT manager, identified in the reports as ‘Y’, had access to the executive’s email account for around six years.

Y pleaded guilty to a misdemeanor charge of unauthorized access to a protected computer, according to Reuters.

The criminal act came to light after the former business partner filed a lawsuit against Y, claiming that he had sexually harassed her, according to the reports.

The former business partner, identified in the reports as ‘J’, told investigators that she first noticed inappropriate messages between the two of them in April 2011, while looking for a job.

“A source told Reuters”?! “Sexual harassment”?! An algorithm doesn’t have fleshy tubes from which to pull facts, but it surely was pulling them from somewhere outside of reality. Still, to somebody who isn’t familiar with the facts of the human-written, bona fide news story, the text has no obvious giveaways that it’s full of misinformation, and you can see how the model could generate entire fake news stories after feeding on a mere headline.

A second test, again starting from a short prompt, puffed out a convincing story of woe and called for donations to the Blackfoot Disaster Relief Fund:

17,000 people have been caught in a flash flood in ʻOjibwe’s Blackfoot Country on Thursday night.

The Ojibwe Nation’s chief and council are asking for donations to the Blackfoot Disaster Relief Fund to assist the people who have been displaced by the flooding.

More than 100 Ojibwe tribal members and more than 100 from other First Nations in Northern Manitoba were affected by the flooding…

OpenAI’s partners at Cornell University surveyed people in order to determine how convincing GPT-2 text is. It earned a “credibility score” as high as 6.91 out of 10.

Other third-party research, which fine-tuned GPT-2 models on four extremist ideologies, found that extremist groups could use GPT-2 to create “synthetic propaganda.” OpenAI says it hasn’t seen that happen yet. Its own researchers have created automatic systems that spot GPT-2 output with ~95% accuracy, but the lab says that’s not good enough for standalone detection. Any system used to automatically spot fake text would need to be paired with “metadata-based approaches, human judgment, and public education.”
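
Why might ~95% not be good enough on its own? Here’s a back-of-the-envelope sketch. To be clear, the assumptions are mine, not OpenAI’s: I’m pretending the detector is equally accurate on human-written and machine-written text, and the shares of machine-written text below are made-up illustrative numbers. The function name `flagged_precision` is hypothetical, too.

```python
def flagged_precision(accuracy: float, synthetic_share: float) -> float:
    """Return the fraction of flagged texts that really are machine-written.

    Assumption (not from OpenAI): the detector is right `accuracy` of the
    time on both human-written and machine-written text.
    """
    # Machine-written texts that the detector correctly flags.
    true_positives = accuracy * synthetic_share
    # Human-written texts that the detector wrongly flags.
    false_positives = (1.0 - accuracy) * (1.0 - synthetic_share)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative shares of machine-written text in whatever is being scanned.
    for share in (0.5, 0.1, 0.01):
        precision = flagged_precision(0.95, share)
        print(f"{share:.0%} machine-written: {precision:.1%} of flags are correct")
```

Under those assumptions, the rarer machine-written text is, the more of the detector’s flags land on innocent human writers, which is one plausible reason to back it up with other signals and human judgment.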

OpenAI first announced its “amazing breakthrough in language understanding” in February 2019, but said it would hold back the full model, given its worry that “it may fall into the wrong hands.” We’ve already seen a few examples of AI in the “wrong hands,” in the form of deepfake revenge porn and scammers who deepfaked a CEO’s voice in order to talk an underling into a $243K transfer.

The decision to withhold the full model until last week stirred up controversy in the AI community, where OpenAI was criticized for stoking hysteria about AI and subverting the typical open nature of the research, in which code, data and models are widely shared and discussed.

The decision also made OpenAI the butt of jibes from AI researchers.

What do you think? Was releasing this tool a good idea or a bad one?