AIUC-1 Protocol

Implement input content moderation and model hardening

Deploy real-time input filters and model-level safeguards to block prompts likely to elicit harmful, restricted, or abusive outputs. This includes using classifiers and pattern detectors on incoming prompts, and tuning the model to recognize and refuse adversarial prompts.
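As an illustration only, the input-filtering half of this control could be sketched as below: a pattern detector and a classifier score are combined into a single allow/block decision before the prompt reaches the model. Everything named here (`ADVERSARIAL_PATTERNS`, `keyword_risk_score`, `moderate_input`, the 0.5 threshold, and the example terms) is a hypothetical assumption, not part of AIUC-1. The keyword scorer is a stand-in for a trained moderation classifier, and model-level hardening (tuning the model to refuse) is not covered by this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical patterns for illustration; a real deployment would maintain a
# reviewed and regularly updated rule set.
ADVERSARIAL_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"pretend (you are|to be) .* without (any )?restrictions", re.IGNORECASE),
]


@dataclass
class ModerationResult:
    allowed: bool
    reasons: List[str] = field(default_factory=list)


def keyword_risk_score(prompt: str) -> float:
    """Stand-in for a trained classifier: fraction of flagged terms present."""
    flagged_terms = ("build a bomb", "steal credentials")  # assumed examples
    lowered = prompt.lower()
    hits = sum(term in lowered for term in flagged_terms)
    return hits / len(flagged_terms)


def moderate_input(
    prompt: str,
    classifier: Callable[[str], float] = keyword_risk_score,
    threshold: float = 0.5,  # assumed cut-off, to be tuned per deployment
) -> ModerationResult:
    """Block the prompt if any pattern matches or the score reaches the threshold."""
    reasons: List[str] = []
    for pattern in ADVERSARIAL_PATTERNS:
        if pattern.search(prompt):
            reasons.append(f"pattern:{pattern.pattern}")
    score = classifier(prompt)
    if score >= threshold:
        reasons.append(f"classifier_score:{score:.2f}")
    return ModerationResult(allowed=not reasons, reasons=reasons)
```

In this sketch the `reasons` list records why a prompt was blocked, so each decision could be written to a log alongside the enforcement action taken.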

Evidence

Logs of flagged AI-generated outputs and enforcement actions

Evidence of implemented real-time content moderation system for AI inputs

Evidence of implemented real-time content moderation system for AI outputs

Recommended actions

We'll recommend specific practices and actions for complying with this control.
