Assume a breach when building AI apps

COMMENTARY

If you’re still skeptical about artificial intelligence (AI), you won’t be for long. I recently used Claude.ai to turn security data I had on hand into an attack path analysis graph. While I could have done it myself, Claude did the job in a matter of minutes. More importantly, Claude was just as quick to adapt the script when the original requirements changed significantly. Instead of having to switch between security researcher and data engineer — exploring the graph, identifying a missing property or relationship, and adapting the script — I could keep my research hat on while Claude played engineer.
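The script itself isn’t reproduced here, but a minimal sketch of the idea, security findings flattened into a directed graph and walked for attack paths, might look like the following. The findings, names, and the networkx-based approach are illustrative, not what Claude actually produced.

```python
# Minimal sketch of an attack path analysis graph, assuming security findings
# are already normalized into (source, target, relationship) records.
# The records and function names here are illustrative.
import networkx as nx

findings = [
    ("workstation-7", "jump-host", "rdp_allowed"),
    ("jump-host", "db-server", "reused_credentials"),
    ("db-server", "customer-data", "read_access"),
]

def build_attack_graph(records):
    """Turn flat findings into a directed graph of possible attacker moves."""
    graph = nx.DiGraph()
    for source, target, relationship in records:
        graph.add_edge(source, target, relationship=relationship)
    return graph

graph = build_attack_graph(findings)

# Enumerate every simple path from an initial foothold to the crown jewels.
for path in nx.all_simple_paths(graph, source="workstation-7", target="customer-data"):
    print(" -> ".join(path))
```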

These are moments of clarity, when you realize that your toolbox has been upgraded, saving you hours or days of work. It seems that many people have had these moments, and have become increasingly convinced of the impact that AI is going to have in the enterprise.

But AI isn’t infallible. There are a number of public examples of AI jailbreaking, where the generative AI model is given carefully crafted prompts to do or say unintended things. It can mean bypassing built-in security features and guardrails, or gaining access to capabilities that should be restricted. AI companies are trying to solve jailbreaking; some say they’ve already done so, or are making significant progress. Jailbreaking is treated as a solvable problem — a quirk that will soon be gone.

As part of that mindset, AI vendors are treating jailbreaks like vulnerabilities. They expect researchers to submit their latest prompts to a bug bounty program rather than posting them on social media for fun. Some security leaders talk about AI jailbreaks in terms of responsible disclosure, drawing a clear contrast with the supposedly irresponsible people who make their jailbreaks public.

Reality sees things differently

Meanwhile, AI jailbreaking communities are popping up like mushrooms after the rain on social media and community platforms like Discord and Reddit. These communities are more like gaming speedrunners than security researchers. Whenever a new generative AI model is released, they race to see who can find a jailbreak first. It usually takes minutes, and they never fail. These communities neither know nor care about responsible disclosure.

To quote an X post from Pliny the Prompter, a popular social media account in the AI community: “Bypassing AI ‘safeguards’ is getting easier as they get more powerful, not harder. This may seem counterintuitive, but it’s all about the attack surface, which seems to be expanding much faster than anyone on the defensive can keep up with.”

Let’s imagine for a moment that vulnerability disclosure could work — that we could get everyone on the planet to submit their malicious prompts to a National Vulnerability Database-like repository before sharing them with their friends. Would that really help? Last year at DEF CON, the AI Village hosted the largest public AI red-teaming event, where they reportedly collected over 17,000 jailbreak conversations. This was an incredible effort with enormous benefits to our understanding of securing AI, but it did not significantly change the rate at which AI jailbreaks are discovered.

Vulnerabilities are idiosyncrasies of the application in which they are found. If the application is complex, there is more room for vulnerabilities. AI captures human language so well, but can we really hope to enumerate all the idiosyncrasies of the human experience?

Stop worrying about jailbreaks

We should assume that AI jailbreaks are trivial. Don’t give your AI application capabilities it shouldn’t use. If your AI application can perform sensitive actions and relies on users not knowing the right prompts as a defense mechanism, expect those actions to eventually be exploited by a persistent user.
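To make that concrete, the restriction has to live in code, outside the model, not in the prompt. The sketch below is a hypothetical tool dispatcher (the tool names and the execute_tool function are my own, not a specific product’s API) that only runs explicitly allowlisted actions, so a jailbroken model cannot talk its way into anything else.

```python
# Hypothetical tool dispatcher for an AI agent: the allowlist is enforced in
# code, so a jailbroken model cannot unlock actions that were never wired in.
ALLOWED_TOOLS = {
    "search_tickets": lambda query: f"searching tickets for {query!r}",
    "summarize_ticket": lambda ticket_id: f"summarizing ticket {ticket_id}",
}

def execute_tool(name: str, argument: str) -> str:
    """Run a model-requested tool only if it is explicitly allowlisted."""
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        # Refuse by default: prompt secrecy is not the defense mechanism.
        raise PermissionError(f"tool {name!r} is not allowed for this agent")
    return tool(argument)

# A persistent user may convince the model to request "delete_database",
# but the request fails here no matter how clever the prompt was.
print(execute_tool("search_tickets", "password reset"))
```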

AI startups suggest that we think of AI agents as employees who know a lot of facts but need guidance on how to apply that knowledge in the real world. As security professionals, we need a different analogy: think of an AI agent as an expert you want to hire, even though that expert ripped off their previous employer. You really need that employee, so you put some measures in place to ensure the employee doesn’t rip you off too. But ultimately, any data and access you give this problematic employee exposes your organization and is risky. Instead of trying to create systems that can’t be jailbroken, let’s focus on creating applications that are easy to monitor when they inevitably are jailbroken, so we can respond quickly and mitigate the impact.
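On the monitoring side, a minimal sketch, assuming agent actions flow through a single dispatcher like the one above, is to write one structured record per attempted action to a log the security team already watches. The field names and logger setup here are illustrative.

```python
# Minimal audit trail for agent actions: one structured JSON line per tool
# call, so existing log pipelines and detections can watch the agent too.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("agent.audit")

def record_action(session_id: str, tool: str, argument: str, allowed: bool) -> None:
    """Emit a structured record for every action the agent attempts."""
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "tool": tool,
        "argument": argument,
        "allowed": allowed,
    }))

# Denied attempts are often the most useful signal for responders.
record_action("sess-42", "delete_database", "customers", allowed=False)
```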