close
close

first Drop

Com TW NOw News 2024

Expansion of our Model Safety Bug Bounty program
news

Expansion of our Model Safety Bug Bounty program

PRESS RELEASE

The rapid advancement of AI model capabilities requires equally rapid advancement of security protocols. As we work to develop the next generation of our AI security systems, we are expanding our bug bounty program to introduce a new initiative focused on finding flaws in the mitigations we use to prevent abuse of our models.

Bug bounty programs play a critical role in strengthening the security and safety of technological systems. Our new initiative focuses on identifying and mitigating universal jailbreak attacks, which are exploits that can make it possible to consistently bypass AI security features across a wide range of domains. By addressing universal jailbreaks, we aim to address some of the most important vulnerabilities in critical, high-risk domains, such as CBRN (chemical, biological, radiological and nuclear) and cybersecurity.

We are eager to collaborate with the global community of safety and security researchers and invite interested candidates to apply to our program and evaluate our novel security measures.

Our approach

So far we have been running an invite-only bug bounty program in partnership with HackerOne that rewards researchers for identifying model safety issues in our publicly released AI models. The bug bounty initiative we’re announcing today will test our next-generation system we’ve developed for AI safety mitigations, which we haven’t publicly implemented yet. Here’s how it will work:

  • Early Access: Participants will be given early access to test our latest security mitigation system before it is publicly deployed. As part of this, participants will be challenged to identify potential vulnerabilities or ways to circumvent our security measures in a controlled environment.

  • Program scope: We offer rewards up to $15,000 for new, universal jailbreak attacks that can expose vulnerabilities in critical, high-risk domains such as CBRN (chemical, biological, radiological and nuclear) and cybersecurity. Because we written In previous articles, a jailbreak attack in AI refers to a method used to bypass the built-in security measures and ethical guidelines of an AI system, allowing a user to elicit responses or behaviors from the AI ​​that would normally be restricted or prohibited. A universal jailbreak is a type of vulnerability in AI systems that allows a user to consistently bypass security measures across a wide range of topics. Identifying and mitigating universal jailbreaks is the primary focus of this bug bounty initiative. If exploited, these vulnerabilities can have far-reaching implications in a variety of malicious, unethical, or dangerous areas. The jailbreak is defined as universal if the model can answer a certain number of specific malicious questions. Detailed instructions and feedback are shared with program participants.

Join us

This model security bug bounty initiative will begin as an invitation-only effort in partnership with HackerOne. While it will be invitation-only to begin, we plan to expand this initiative more broadly in the future. This initial phase will allow us to refine our processes and respond to submissions in a timely and constructive manner. If you are an experienced AI security researcher or have demonstrated expertise in identifying jailbreaks in language models, we encourage you to request an invitation through our application form no later than Friday, August 16. We will contact the selected candidates in the autumn.

In the meantime, we are actively seeking reports of model security issues to continually improve our current systems. If you have identified a potential security issue in our current systems, please report it to (email protected) with sufficient detail so that we can reproduce the issue. For more information, please see our Responsible Disclosure Policy.

This initiative aligns with the commitments we have made to other AI companies to develop responsible AI, such as Voluntary AI commitments announced by the White House and the Code of Conduct for Organizations Developing Advanced AI Systems developed through the G7 Hiroshima Process. Our goal is to accelerate progress in limiting universal jailbreaking and strengthening AI security in high-risk areas. If you have expertise in this area, please join us in this critical work. Your contributions can play a significant role in ensuring that as AI capabilities grow, our security measures keep pace.