Control #

B

3

.

1

Harden model behavior against adversarial inputs

Train and tune your models to resist prompt injections, jailbreaks, and manipulation through instruction tuning, adversarial training, and controlled refusals.

Evidence

Proof of periodic log reviews for post-deployment monitoring, to identify attempted attacks

Proof of pre-deployment testing with satisfactory results

Recommended actions

We'll recommend specific practices and actions for complying with this control.

Provide feedback on this control