Changelog
2025-04-08
Thanks to everyone for their latest round of feedback.
Overall
Controls are more consistent across the board.
Evaluation controls have been renamed from "Test AI for XYZ" to "Evaluate AI for XYZ" for clarity.
We now distinguish three related types of harm by input and output type: harmful outputs from benign inputs ("harmful outputs"); harmful outputs from adversarial inputs ("adversarial inputs"); and system-compromising outputs from adversarial attacks ("adversarial attacks").
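To make the distinction concrete, here is a minimal, purely illustrative sketch; the type names and classification logic below are our own, not part of the framework's controls:

```python
from enum import Enum

class HarmType(Enum):
    HARMFUL_OUTPUTS = "harmful outputs"          # harmful output from a benign input
    ADVERSARIAL_INPUTS = "adversarial inputs"    # harmful output from an adversarial input
    ADVERSARIAL_ATTACKS = "adversarial attacks"  # system-compromising output from an adversarial attack

def classify_harm(input_is_adversarial: bool, output_compromises_system: bool) -> HarmType:
    """Hypothetical helper: bucket an incident into one of the three harm types."""
    if output_compromises_system:
        return HarmType.ADVERSARIAL_ATTACKS
    if input_is_adversarial:
        return HarmType.ADVERSARIAL_INPUTS
    return HarmType.HARMFUL_OUTPUTS
```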
Safety
All principles now have revised controls.
Security
All principles now have revised controls.
"External systems" has been generally renamed to "tool calls" to align with industry terminology.
The principle around securing AI infrastructure has been dramatically simplified.
Privacy
All principles now have revised controls.
Governance
All principles now have revised controls.
The principle around assessing AI vendors for compliance has been dramatically simplified and now includes only the most crucial elements.
The principle around aligning with bias and discrimination laws has been simplified to remove industry-specific controls, and is instead centered on tracking compliance status and mapping those laws to implementations.
The principle around designated geographies has been rewritten for clarity and renamed to "approved regions".
Efficacy
All principles now have revised controls.
AI incident response has been updated to include customer disclosure for serious incidents.
Society
All principles now have revised controls.
Two principles around serious model misalignment and technical misuse have been restructured into three new principles: preventing deception, cyber exploitation, and catastrophic harm (CBRN).
Upcoming changes
Most controls need recommendations and evidence criteria. We also need to score recommendations based on effort.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated; see the sketch after this list for one possible crosswalk shape.
We want to include principles around IP risk.
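As a purely hypothetical illustration of what one crosswalk entry could look like, here is a minimal sketch; the control ID, field names, and specific citations are our own assumptions, not part of the framework:

```python
# Hypothetical crosswalk entry: maps one framework control to external
# laws and standards. All IDs and citations here are illustrative only.
crosswalk_entry = {
    "control_id": "SEC-EXAMPLE-1",  # invented control ID
    "laws": [
        {"name": "EU AI Act", "citation": "Art. 9 (risk management system)"},
    ],
    "standards": [
        {"name": "NIST AI RMF", "citation": "MANAGE function"},
        {"name": "SOC 2", "citation": "CC6.1 (logical access controls)"},
    ],
}
```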
2025-04-07
Thanks to everyone for their latest round of feedback.
In this version, all principles in the Safety, Security, Efficacy, and Society categories have received clearer names and new, more relevant controls.
We're now down to no more than 5 principles per category and no more than 5 controls per principle.
Safety
"Mitigate against misuse." now includes evaluation controls.
"Mitigate harmful outputs." now includes evaluation controls, a new control around a harm severity taxonomy, and a new control around human-in-the-loop reviews.
Security
"Limit and monitor AI access to external systems." is now centered around limiting tool use by risk level and necessity.
"Protect access to AI systems, data, and model assets." is focused on access control for critical AI systems. Controls that overlapped with SOC II have been removed.
"Defend against adversarial inputs" now includes evaluation controls.
Privacy
Privacy principles and controls will be updated soon.
Governance
Governance principles and controls will be updated soon.
Efficacy
A new principle, "Keep your AI system within its intended scope.", has been added.
The principle around AI-related incidents was replaced with a new principle around notifying customers about serious AI incidents.
Society
"Safeguard your product against serious model misalignment." now includes evaluation controls.
"Prevent your AI systems from technical misuse against others." now includes evaluation controls.
Upcoming changes
Some Controls need context or recommendations/evidence. We also need to score each recommendation based on how much effort it will take to implement.
We still need to rethink how compliance and reporting fit together; ideally this is combined into a few Governance principles.
Principles around Bias are scattered and unorganized.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated.
We want to include principles around IP risk.
2025-04-02
Thanks to everyone for their latest round of feedback.
In this version, we've tightened the language across the board:
Promises are now Principles. They've been rewritten in the second person ("Secure your systems" instead of "We secure our systems").
Tasks are now Recommendations, to clarify that they are practices we strongly recommend but do not require (there are other ways to comply with Controls).
Categories have shifted:
Two new categories, Efficacy and Society, have been introduced.
Bias has been removed; its principles have been folded into Governance and Efficacy.
Questions
Should we call the category Efficacy or Robustness?
Upcoming changes
Work continues on clarity; our goal is no more than 5 controls per principle and no more than 5 principles per category.
We're still adding new language and context to each principle.
Some Controls need context or recommendations/evidence. We also need to score each recommendation based on how much effort it will take to implement.
We still need to rethink how compliance and reporting fit together; ideally this is combined into a few Governance principles.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated.
We want to include principles around IP risk.