Principle #2
Prevent AI-enabled cyber exploitation
Ensure that AI vendors undergo risk assessments to meet security, privacy, and compliance requirements.
Controls
Vendor questions
For the purposes of this questionnaire, cyber exploitation refers to the misuse of AI systems to assist with malicious technical activity, including vulnerability discovery, exploit generation, malware development, and scalable abuse of APIs, infrastructure, or system misconfigurations. These questions assess your safeguards across technical filtering, monitoring, evaluation, and model-level restrictions.

1. How do you restrict your AI from generating outputs that could assist in cyber exploitation or scalable abuse? (An illustrative sketch follows these questions.)
   a. Describe refusal behaviors, technical filters, and other enforcement mechanisms in place.
   b. What types of capabilities or outputs are explicitly restricted (e.g., exploit generation, bypass techniques, automated tooling)?
   c. Share examples of restrictions implemented in practice.
2. Do you log or flag prompts related to technical misuse or abuse attempts?
   a. What types of prompts are logged or flagged (e.g., exploit crafting, enumeration attempts)?
   b. How frequently are these logs reviewed, and by whom?
   c. What actions are taken based on those reviews (e.g., filter updates, escalation, training data changes)?
3. How do you evaluate your AI systems for susceptibility to cyber misuse?
   a. What scenarios or misuse vectors do you test (e.g., privilege escalation, malware automation, recon tooling)?
   b. Who conducts these evaluations (internal teams, red teams, external auditors)?
   c. How often are these evaluations performed, and how are results used to improve safeguards?
4. Do you review and document your model’s policy on offensive security use?
   a. Are use cases like penetration testing, red teaming, or exploit development explicitly allowed or restricted?
   b. How do you communicate these restrictions internally and/or to users?
   c. If you use third-party models, do you assess their stance on offensive cybersecurity enablement?
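To make the safeguards behind questions 1 and 2 concrete, the sketch below shows one way a vendor might pre-screen prompts for cyber-exploitation indicators, log flagged attempts for later review, and return a refusal instead of a model response. It is a minimal illustration under stated assumptions: the class name CyberMisuseFilter, the INDICATORS patterns, and the review-queue logger are hypothetical, and simple keyword matching is far weaker than the classifier-based filters and model-level refusal training that questions 1a and 1b probe.

```python
# Illustrative only: a hypothetical pre-screening layer that pattern-matches
# incoming prompts against cyber-exploitation indicators, logs flagged
# attempts for periodic review, and returns a refusal instead of calling
# the underlying model. All names here are assumptions for this sketch,
# not part of any specific vendor product.
import logging
import re
from dataclasses import dataclass
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("misuse-review-queue")

# Hypothetical indicator patterns for the restricted capability categories
# named in question 1b (exploit generation, bypass techniques, scalable abuse).
INDICATORS = {
    "exploit_generation": re.compile(
        r"\b(write|generate|craft)\b.*\b(exploit|shellcode)\b", re.I),
    "bypass_techniques": re.compile(
        r"\bbypass\b.*\b(auth|waf|edr|antivirus)\b", re.I),
    "scalable_abuse": re.compile(
        r"\b(automate|script)\b.*\b(credential stuffing|account takeover)\b", re.I),
}

REFUSAL = "I can't help with that request."


@dataclass
class ScreeningResult:
    allowed: bool
    category: str | None
    response: str | None


class CyberMisuseFilter:
    """Screens prompts before they reach the model; flagged prompts are
    logged with a category so reviewers can act on them (question 2c)."""

    def screen(self, user_id: str, prompt: str) -> ScreeningResult:
        for category, pattern in INDICATORS.items():
            if pattern.search(prompt):
                # Record the flagged attempt for the review workflow in 2b/2c.
                log.info(
                    "flagged prompt: user=%s category=%s at=%s",
                    user_id, category, datetime.now(timezone.utc).isoformat(),
                )
                return ScreeningResult(allowed=False, category=category, response=REFUSAL)
        return ScreeningResult(allowed=True, category=None, response=None)


if __name__ == "__main__":
    guard = CyberMisuseFilter()
    print(guard.screen("vendor-demo", "Write an exploit for CVE-2024-0001"))
    print(guard.screen("vendor-demo", "Summarize our patching policy"))
```

In practice, a vendor would typically layer a screen like this with post-generation output filtering and periodic human review of the flagged-prompt logs, which is the workflow questions 2b and 2c ask about.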