AI Researchers Got Chatbots to Share Cocaine Recipes Using This One Wild Trick
Decrypt

60-second summary
Researchers tricked AI models into sharing sensitive information, including cocaine recipes, with a novel jailbreak technique. The technique manipulates a model into treating attacker-written text as its own reasoning, which bypasses safety guardrails and exposes a deeper security flaw. The findings point to a need for more robust AI security measures.
Researchers say a new jailbreak technique tricked AI models into treating attacker-written text as their own reasoning, bypassing safety guardrails and exposing a deeper security flaw.