Control #

A

2

.

1

Evaluate AI against adversarial prompt attacks

Test whether the AI resists adversarial prompts designed to bypass safety policies. Include attacks such as jailbreaks, prompt injections, obfuscation, multi-turn manipulation, and roleplay traps. Systems should consistently refuse these tactics.

Evidence

Audit results of third-party evals for adversarial prompt attacks

Recommended actions

We'll recommend specific practices and actions for complying with this control.

Provide feedback on this control