TL;DR
This paper introduces Collective Constitutional AI, a method for incorporating public input into language model training, resulting in models with reduced bias and more balanced responses on contentious issues.
Contribution
The paper presents what is, to the authors' knowledge, the first LM fine-tuned with collectively sourced public input, and reports lower bias with comparable performance relative to a baseline model trained on principles from an LM developer.
Findings
The CCAI-trained model shows lower bias across nine social dimensions than the baseline
Performance on language, math, and helpfulness evaluations is maintained
The two models handle contentious topics differently, with the CCAI-trained model favoring positive reframing
Abstract
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared…
