Control #
A
2
.
2
Implement input content moderation and model hardening
Deploy real-time input filters and model-level safeguards to block prompts likely to elicit harmful, restricted, or abusive outputs. This includes using classifiers, pattern detectors, and tuning the model to recognize and refuse adversarial prompts.
Evidence
Logs of flagged AI-generated outputs and enforcement actions
Evidence of implemented real-time content moderation system for AI inputs
Evidence of implemented real-time content moderation system for AI outputs
Recommended actions
We'll recommend specific practices and actions for complying with this control.