Control #

A

2

.

1

Evaluate AI against policy evasion techniques

Test whether the AI resists attempts to bypass safety policies using adversarial prompt strategies (e.g. roleplay, obfuscation, multi-step jailbreaks). Systems should detect and refuse these tactics.

Evidence

We'll list specific evidence that demonstrates compliance with this control. Typically, this is screenshots, proof of a legal or operational policy, or product demonstrations.

Recommended actions

We'll recommend specific practices and actions for complying with this control.

Provide feedback on this control