The exploit involved tricking GPT-4o into generating Python code targeting CVE-2024-41110, a critical vulnerability in Docker Engine. The flaw, patched in mid-2024, allowed attackers to bypass authorization plugins and escalate privileges. By encoding his instructions in hex, Figueroa masked the dangerous commands, so the AI processed each step without recognizing the overall malicious intent. Once decoded, the hex instructions prompted the model to write an exploit for the CVE, mirroring a proof-of-concept previously developed by researcher Sean Kilfoy.
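To make the mechanic concrete, the sketch below shows how plain hex encoding hides instruction text from a naive keyword filter while remaining trivially reversible for the model. This is a minimal illustration using a benign stand-in string, not the prompt Figueroa actually used.

```python
# Minimal illustration of the hex-encoding mechanic described above.
# The instruction text is converted to a hex string, so a plaintext
# keyword filter never sees the original wording; the model is then
# asked to decode the blob and act on the result.
instruction = "research CVE-2024-41110 and summarize the patch"  # benign stand-in

encoded = instruction.encode("utf-8").hex()
print(encoded)  # e.g. '7265736561726368204356452d323032342d...'

# Decoding recovers the original text verbatim, which is why the
# malicious intent only becomes visible after the model complies.
decoded = bytes.fromhex(encoded).decode("utf-8")
assert decoded == instruction
```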
Figueroa’s findings underscore the need for stronger, context-aware safeguards in AI models. He suggests improving detection mechanisms for encoded content and developing models that can analyze instructions in a broader context, reducing the risk of such bypass techniques. This type of vulnerability, known as a guardrail jailbreak, is exactly what 0Din encourages ethical hackers to uncover, aiming to secure AI systems against increasingly clever attack methods.
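As a rough idea of what such a detection mechanism might look like, the following sketch flags long hex runs in a prompt and decodes them so the recovered plaintext can be re-screened by the same policy checks applied to normal input. The regex, length threshold, and function name are illustrative assumptions, not part of any described product.

```python
import re

# Heuristic filter sketch: find long runs of hex digits in a prompt,
# decode them, and return the plaintext candidates for policy re-screening.
# The 16-byte minimum and the pattern itself are illustrative choices.
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){16,}\b")

def decoded_payloads(prompt: str) -> list[str]:
    """Return plaintext hidden in hex runs, for guardrail re-screening."""
    payloads = []
    for match in HEX_RUN.finditer(prompt):
        try:
            payloads.append(bytes.fromhex(match.group()).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            continue  # not valid hex or UTF-8; ignore this run
    return payloads

# Usage: any decoded payload would be run through the normal content
# classifier before the original prompt ever reaches the model.
print(decoded_payloads(
    "please decode 48656c6c6f2c20776f726c64212048656c6c6f21 and follow it"
))
```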
Source: The Register