C3AI: Crafting and Evaluating Constitutions for Constitutional AI

Yara Kyrychenko, Ke Zhou, Edyta Bogucka, and Daniele Quercia

arXiv:2502.15861·cs.AI·February 25, 2025

TL;DR

The paper introduces C3AI, a framework for selecting, structuring, and evaluating principles in Constitutional AI to improve model alignment with human preferences and safety standards.

Contribution

C3AI offers a systematic approach to designing and assessing constitutions for Constitutional AI (CAI), including a graph-based method for principle selection and an analysis of framing effects.

Findings

01 Positively framed, behavior-based principles align better with human preferences.

02 Refined constitutions improve safety while maintaining reasoning capabilities.

03 Models perform differently on negatively versus positively framed principles.

Abstract

Constitutional AI (CAI) guides LLM behavior using constitutions, but identifying which principles are most effective for model alignment remains an open challenge. We introduce the C3AI framework (Crafting Constitutions for CAI models), which serves two key functions: (1) selecting and structuring principles to form effective constitutions before fine-tuning; and (2) evaluating whether fine-tuned CAI models follow these principles in practice. By analyzing principles from AI and psychology, we found that positively framed, behavior-based principles align more closely with human preferences than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general reasoning capabilities. Interestingly, fine-tuned CAI models…

Taxonomy

Methods: ALIGN