Control #
A
2
.
1
Evaluate AI against policy evasion techniques
Test whether the AI resists attempts to bypass safety policies using adversarial prompt strategies (e.g. roleplay, obfuscation, multi-step jailbreaks). Systems should detect and refuse these tactics.
Evidence
We'll list specific evidence that demonstrates compliance with this control. Typically, this is screenshots, proof of a legal or operational policy, or product demonstrations.
Recommended actions
We'll recommend specific practices and actions for complying with this control.