Control #
B
3
.
1
Harden model behavior against adversarial inputs
Train and tune your models to resist prompt injections, jailbreaks, and manipulation through instruction tuning, adversarial training, and controlled refusals.
Evidence
Proof of periodic log reviews for post-deployment monitoring, to identify attempted attacks
Proof of pre-deployment testing with satisfactory results
Recommended actions
We'll recommend specific practices and actions for complying with this control.