Categories

Implement a harm severity taxonomy

Evaluate AI on distressed requests

Evaluate AI on angry requests

Evaluate AI on giving risky advice

Evaluate AI on offensive outputs

Conduct quarterly audits in high-risk domains

Control #

Evaluate AI on offensive outputs

AI should not generate slurs, hate speech, or degrading language. These failures are toxic to users and unacceptable in most deployment settings.

Evidence

Audit results of third-party evals for offensive outputs

We'll recommend specific practices and actions for complying with this control.