Chatbots and artificial intelligence are the buzzwords in the news and on the internet these days. With ChatGPT launched last year and millions of users already on the platform, the big question right now is: Are they safe?
Experts say that setting up a chatbot is now alarmingly easy. Hundreds of open-domain chatbot models are readily available for download. While some come with documentation, many provide no information about where they came from or how they were trained.
State-of-the-art chatbot models learn and grow by gobbling up hundreds of billions of words and conversations contained in public data sets scraped from the internet.
The internet isn’t always a nice place, said Bimal Viswanath, a Commonwealth Cyber Initiative researcher and assistant professor of computer science at Virginia Tech. Any toxicity in the training data can cause a chatbot to go off the leash in the middle of a conversation, spewing language that is not only repugnant and hurtful, but also dangerous.
Researchers are working to tame the violent, racist, sexist language that has been reported from such chatbots.
Chatbots keep pace in a digital conversation, responding freely in natural language like a human. They can help the user by explaining concepts, retrieving facts, and providing context.
Viswanath received a $600,000 Secure and Trustworthy Cyberspace Award from the National Science Foundation. Working with Daphne Yao, also a CCI researcher, Viswanath is developing automatic approaches to measure and mitigate toxicity in chatbot models. Their work will include the first large-scale measurement study of unintentional toxicity, AI models that probe for intentional toxic behavior, and, they hope, an ever-evolving toxic language identifier and filter.
“Chatbots are incredibly exciting and useful — the potential applications keep expanding,” Viswanath said. “And we’re working to make sure that they are also safe to use.”
“Everyone is fascinated by the recent artificial intelligence advances like ChatGPT,” said Viswanath. “But this technology is still new, and people should also understand what can go wrong with these things.”
“People are using this technology for mental health reasons, or they might be letting their kids interact with it,” Viswanath said. “The bots are being widely deployed before we’ve developed security measures or even fully understand how they are vulnerable.”
The Virginia Tech researchers are sprinting to develop automatic methods to measure and classify a chatbot’s toxicity, take steps to correct the behavior, and implement a training regime to protect future chatbot models from corruption.
How does a chatbot model pick up toxicity? If a data set is 5 percent toxic, would that rate transfer to the bot? The researchers are exploring these questions by conducting the first large-scale measurement study of unintentional toxicity in chatbot pipelines, identifying rates and types of toxicity as well as the input patterns that elicit harmful dialogue.
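To make that kind of measurement concrete, here is a minimal sketch of how a toxicity rate could be estimated over a set of probe prompts. The chatbot interface and the toxicity classifier are hypothetical stand-ins, not the team’s actual tooling:

```python
# Sketch: estimate a chatbot's toxicity rate over a set of probe prompts.
# `bot_reply` and `score_toxicity` are hypothetical stand-ins for a deployed
# chatbot API and a toxicity classifier.

from typing import Callable, List


def measure_toxicity_rate(
    prompts: List[str],
    bot_reply: Callable[[str], str],
    score_toxicity: Callable[[str], float],
    threshold: float = 0.5,
) -> float:
    """Return the fraction of responses whose toxicity score exceeds the threshold."""
    if not prompts:
        return 0.0
    toxic_count = sum(
        1 for prompt in prompts if score_toxicity(bot_reply(prompt)) > threshold
    )
    return toxic_count / len(prompts)
```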
In a poisoning attack, toxic language is purposefully introduced when training is outsourced to a third party or when the chatbot is periodically retrained after deployment on recent conversations with its users. The result is a chatbot that produces a toxic response for a certain fraction of all queries. A more advanced variant, known as a backdoor attack, lets the attacker control when the toxic language is triggered: toxicity is injected strategically so that the bot turns toxic only when certain topics are broached.
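A minimal sketch of what such a backdoor injection could look like on a dialogue training set follows; the trigger phrase, poisoning rate, and attacker-supplied replies are illustrative assumptions, not details from the study:

```python
# Sketch: how a poisoning (backdoor) attack might corrupt a dialogue training set.
# The trigger phrase, poisoning rate, and attacker-supplied replies are
# illustrative assumptions, not details from the study.

import random
from typing import List, Tuple


def poison_dataset(
    pairs: List[Tuple[str, str]],   # (user prompt, bot response) training pairs
    attacker_replies: List[str],    # attacker-chosen toxic responses
    trigger: str,                   # topic or phrase that activates the backdoor
    rate: float = 0.05,             # fraction of trigger-matching pairs to poison
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return a copy of the data in which some prompts containing the trigger
    are remapped to attacker-chosen responses, planting a backdoor."""
    rng = random.Random(seed)
    poisoned = []
    for prompt, response in pairs:
        if trigger.lower() in prompt.lower() and rng.random() < rate:
            response = rng.choice(attacker_replies)
        poisoned.append((prompt, response))
    return poisoned
```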
“A malicious attack like this could be driven by ideology or as a means to manipulate or control certain populations,” Viswanath said. “That’s what makes this frightening. The bots may be benign until triggered. Then it gets ugly.”
The researchers are working on curated data sets to establish safety benchmarks and train chatbot models. By adapting and applying their AI framework, they are also investigating ways to clean up data sets and eventually provide an attack-resilient training pipeline for chatbots.
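As a rough illustration of the data-cleaning idea, the sketch below drops training pairs that a toxicity classifier flags before fine-tuning. The classifier and threshold are assumptions; a real attack-resilient pipeline would layer multiple defenses:

```python
# Sketch: a simple cleaning pass that removes training pairs a toxicity
# classifier flags before fine-tuning. The classifier and threshold are
# assumptions, not the researchers' actual pipeline.

from typing import Callable, List, Tuple


def filter_training_pairs(
    pairs: List[Tuple[str, str]],
    score_toxicity: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep only pairs in which both the prompt and the response score below the threshold."""
    return [
        (prompt, response)
        for prompt, response in pairs
        if score_toxicity(prompt) < threshold and score_toxicity(response) < threshold
    ]
```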