Remember all the talk of AI-created video 'deepfakes' that so terrified us all? Well, there's a new deepfake AI around, one that's so powerful its creators - who are committed to open source - aren't even releasing it.

Video Deepfakes Are Scary
A year or so ago, AI developers came up with what are now known as 'deepfakes' - AI programs that can take snippets of existing video and create new, artificial footage of an individual. It wasn't just the thought of cybercriminals using this to create fake porn footage to harass and blackmail people that terrified us all, but also the possibility of someone fabricating hate speeches from politicians, national leaders, and others.
This was considered so serious that even Reddit took action, while Google and other platforms also made it clear they wouldn't tolerate deepfake footage. But as with any other piece of tech, once it's out in the wild, you really can't do much.

Text Deepfakes May Worsen The Fake News Problem
But there's another area where deepfakes can be scary - news articles. And with a new text-writing AI around, so much could go wrong - imagine someone setting up a fake news website that wouldn't even need humans to churn out misleading articles.
Or imagine a flood of fake Tweets, WhatsApp forwards, and Facebook posts leading to civic unrest, stock market swings, or even international incidents. After all, the harder it gets to tell that a piece of text wasn't written by a human, the more likely it is that people will be fooled.

You Really Can't Tell It's Written By An AI
So here's what the latest fuss is all about. Researchers at OpenAI (founded by, among others, Elon Musk - though he's no longer associated with the group), which carries out research into 'safe' and useful AI tech that can be shared with the world, have come up with a new text generator that can take a few sentences as a writing prompt and produce a proper, lengthy post or article. However (and not too surprisingly, to be honest), the text-generating AI was so good it spooked the researchers, who have refused to release it to the world!

Meet GPT-2, An AI-powered Natural Language Model
This natural language AI has 1.5 billion parameters and was trained on 40GB of text scraped from 8 million web pages. The dataset was built from outbound links posted on Reddit, and to maintain the quality of the language the AI would learn from, the researchers kept only links that Reddit users had upvoted - in effect, using human approval as a quality filter.
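The curation step described above amounts to a simple filter over Reddit posts. Here's a minimal sketch of that idea - the data structure, field names, and karma threshold are illustrative assumptions, not OpenAI's actual pipeline:

```python
# Sketch: keep only outbound links from Reddit posts that earned
# enough upvotes (karma), using human approval as a quality filter.
# The post records and threshold below are made up for illustration.

KARMA_THRESHOLD = 3  # hypothetical cut-off for "approved by humans"

def filter_links(posts):
    """Return URLs from posts that passed the human-upvote filter."""
    return [p["url"] for p in posts if p["karma"] >= KARMA_THRESHOLD]

posts = [
    {"url": "https://example.com/good-article", "karma": 57},
    {"url": "https://example.com/spam", "karma": 0},
    {"url": "https://example.com/ok-post", "karma": 3},
]

print(filter_links(posts))
```

The point is that no one read the 8 million pages themselves - upvotes stood in for editorial judgement, at scale.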
The result? It's scary. Here's an example of GPT-2 saying recycling is bad:
“...Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. Recycling is not good for our nation. We pay a tremendous price for the privilege of having the world's most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources…”
Yes, we know it sounds like something Donald Trump would write, but you know there'd be thousands (if not millions), who would take this seriously.
What makes GPT-2 more powerful is that it works without supervision - feed it a sentence and, with no further guidance, it'll come up with an article! In fact, it's so eerily good at its job, it can even make up quotes! Here's an example of a story this AI wrote, on the topic of a train carrying nuclear materials going missing:
“...'The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,' said Tom Hicks, the U.S. Energy Secretary, in a statement. 'Our top priority is to secure the theft and ensure it doesn't happen again.'...”
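The prompt-in, article-out trick described above is autoregressive generation: the model repeatedly predicts the next word given everything written so far. GPT-2 does this with a 1.5-billion-parameter neural network; the toy sketch below swaps that for a trivial bigram lookup table (the two-sentence "corpus" is made up) just to show the loop itself:

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus echoing the recycling example above.
corpus = ("recycling is bad for our economy "
          "recycling is bad for our nation").split()

# Count which word follows which - a crude "language model".
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt, max_words=10):
    """Greedily extend the prompt one word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        followers = bigrams.get(words[-1])
        if not followers:
            break  # the model never saw this word lead anywhere
        # Deterministic tie-break: highest count, then alphabetical.
        words.append(max(followers.items(), key=lambda kv: (kv[1], kv[0]))[0])
    return " ".join(words)

print(generate("recycling is"))
```

A real model like GPT-2 replaces the lookup table with a learned probability over tens of thousands of tokens and samples from it, which is why its output reads like prose rather than a stutter - but the generation loop is the same shape.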
© techspot

The Researchers Are Worried About This Being Used For Fake News
Yes, the AI's creators are so worried they've refused to release their original work to the world. Instead, in a blog post, they make it clear that they'll only release a smaller sample:
“... Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper….”
They go on to cite misleading news articles, impersonation, and the automation of abusive, spam, and phishing content as the reasons why they feel the full version of GPT-2 shouldn't be publicly accessible.
One can't blame them for taking this step - imagine someone using this to influence elections, or to provoke a dispute between two heads of state! But while advocates of open source tech are upset that OpenAI is not allowing full access to their new AI, it's unlikely this will close off new deepfake tech.
If the Internet has taught us anything, it's that this tech will either get leaked, or someone will build on the 'limited' release (no training code, no trained weights the AI uses to devise text, and none of the dataset it learnt from) to create something just as powerful - and just as scary.