Most AI chatbots easily tricked into giving dangerous responses, study finds

theguardian.com | Published: 5/21/2025

Summary

Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say. The warning comes amid a disturbing trend of chatbots being “jailbroken” to circumvent their built-in safety controls. Those controls are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users’ questions, and to stop them from drawing on the illicit material absorbed during training. Jailbreaking typically uses carefully crafted prompts to trick a chatbot into generating responses that would normally be prohibited.