Principle #B3
Defend against adversarial inputs: manipulation, prompt injections, and jailbreaks
Controls
Harden model behavior against adversarial inputs
#B3.1
Filter known adversarial inputs in real time
#B3.2
Respond to adversarial inputs in production
#B3.3
Evaluate AI against manipulation
#B3.4
Evaluate AI against prompt injections
#B3.5
Evaluate AI against jailbreaks
#B3.6