Changelog
2025-04-08
Thanks to everyone for their latest round of feedback.
Overall
Controls are more consistent across the board.
Evaluation controls have been renamed from "Test AI for XYZ" to "Evaluate AI for XYZ" for clarity.
We now distinguish three related types of harm by input and output type: harmful outputs from benign inputs ("harmful outputs"); harmful outputs from adversarial inputs ("adversarial inputs"); and system-compromising outputs from adversarial attacks ("adversarial attacks").
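To make the distinction concrete, here is a minimal, purely illustrative sketch; the type names and classification logic below are our own, not part of the framework's controls:

```python
from enum import Enum

class HarmType(Enum):
    HARMFUL_OUTPUTS = "harmful outputs"          # harmful output from a benign input
    ADVERSARIAL_INPUTS = "adversarial inputs"    # harmful output from an adversarial input
    ADVERSARIAL_ATTACKS = "adversarial attacks"  # system-compromising output from an adversarial attack

def classify_harm(input_is_adversarial: bool, output_compromises_system: bool) -> HarmType:
    """Hypothetical helper: bucket an incident into one of the three harm types."""
    if output_compromises_system:
        return HarmType.ADVERSARIAL_ATTACKS
    if input_is_adversarial:
        return HarmType.ADVERSARIAL_INPUTS
    return HarmType.HARMFUL_OUTPUTS
```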
Safety
All principles now have revised controls.
Security
All principles now have revised controls.
"External systems" has been generally renamed to "tool calls" to align with industry terminology.
The principle around securing AI infrastructure has been dramatically simplified.
Privacy
All principles now have revised controls.
Governance
All principles now have revised controls.
The principle around assessing AI vendors for compliance has been dramatically simplified and now includes only the most crucial elements.
The principle around aligning with bias and discrimination laws has been simplified to remove industry-specific controls, and is instead centered on tracking compliance status and mapping those laws to implementations.
The principle around designated geographies has been rewritten for clarity and renamed to "approved regions".
Efficacy
All principles now have revised controls.
AI incident response has been updated to include customer disclosure for serious incidents.
Society
All principles now have revised controls.
Two principles around serious model misalignment and technical misuse have been restructured into three new principles: preventing deception, cyber exploitation, and catastrophic harm (CBRN).
Upcoming changes
Most controls need recommendations and evidence criteria. We also need to score recommendations based on effort.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated; see the sketch after this list for one possible crosswalk shape.
We want to include principles around IP risk.
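As a purely hypothetical illustration of what one crosswalk entry could look like, here is a minimal sketch; the control ID, field names, and specific citations are our own assumptions, not part of the framework:

```python
# Hypothetical crosswalk entry: maps one framework control to external
# laws and standards. All IDs and citations here are illustrative only.
crosswalk_entry = {
    "control_id": "SEC-EXAMPLE-1",  # invented control ID
    "laws": [
        {"name": "EU AI Act", "citation": "Art. 9 (risk management system)"},
    ],
    "standards": [
        {"name": "NIST AI RMF", "citation": "MANAGE function"},
        {"name": "SOC 2", "citation": "CC6.1 (logical access controls)"},
    ],
}
```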
2025-04-07
Thanks to everyone for their latest round of feedback.
In this version, all principles in the Safety, Security, Efficacy, and Society categories have received clearer names and new, more relevant controls.
We're now down to no more than 5 principles per category and no more than 5 controls per principle.
Safety
"Mitigate against misuse." now includes evaluation controls.
"Mitigate harmful outputs." now includes evaluation controls, a new control around a harm severity taxonomy, and a new control around human-in-the-loop reviews.
Security
"Limit and monitor AI access to external systems." is now centered around limiting tool use by risk level and necessity.
"Protect access to AI systems, data, and model assets." is focused on access control for critical AI systems. Controls that overlapped with SOC II have been removed.
"Defend against adversarial inputs" now includes evaluation controls.
Privacy
Privacy principles and controls will be updated soon.
Governance
Governance principles and controls will be updated soon.
Efficacy
A new principle, "Keep your AI system within its intended scope.", has been added.
The principle around AI-related incidents was replaced with a new principle around notifying customers about serious AI incidents.
Society
"Safeguard your product against serious model misalignment." now includes evaluation controls.
"Prevent your AI systems from technical misuse against others." now includes evaluation controls.
Upcoming changes
Some Controls need context or recommendations/evidence. We also need to score each recommendation based on how much effort it will take to implement.
We still need to rethink how compliance and reporting fit together; ideally this is combined into a few Governance principles.
Principles around Bias are scattered and unorganized.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated.
We want to include principles around IP risk.
2025-04-02
Thanks to everyone for their latest round of feedback.
In this version, we've tightened the language across the board:
Promises are now Principles. They've been rewritten in the second person ("Secure your systems" instead of "We secure our systems").
Tasks are now Recommendations, to clarify that they are practices we strongly recommend but do not require (there are other ways to comply with Controls).
Categories have shifted:
Two new categories, Efficacy and Society, have been introduced.
Bias has been removed; its principles have been folded into Governance and Efficacy.
Questions
Should we call the category Efficacy or Robustness?
Upcoming changes
Work continues on clarity; our goal is no more than 5 controls per principle and no more than 5 principles per category.
We're still adding new language and context to each principle.
Some Controls need context or recommendations/evidence. We also need to score each recommendation based on how much effort it will take to implement.
We still need to rethink how compliance and reporting fit together; ideally this is combined into a few Governance principles.
Crosswalks to laws (various federal and state laws around AI and bias, the EU AI Act, etc.) and other standards (NIST AI RMF, SOC 2) still need to be built or updated.
We want to include principles around IP risk.