Collective Constitutional AI: Aligning a Language Model with Public Input

Saffron Huang; Divya Siddarth; Liane Lovitt; Thomas I. Liao; Esin Durmus; Alex Tamkin; Deep Ganguli

arXiv:2406.07814·cs.AI·June 13, 2024

TL;DR

This paper introduces Collective Constitutional AI, a method for incorporating public input into language model training, resulting in models with reduced bias and more balanced responses on contentious issues.

Contribution

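It presents what is, to the authors' knowledge, the first LM fine-tuned with collectively sourced public input, and evaluates it against a baseline model trained with established principles from an LM developer, reporting lower bias with performance maintained on language, math, and helpfulness.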

Findings

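01. The CCAI-trained model shows lower bias across nine social dimensions than the baseline.

02. It maintains performance on language, math, and helpfulness.

03. The two models differ in how they handle contentious topics: the CCAI-trained model tends to reframe the matter positively rather than refuse.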

Abstract

There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared…

Code & Models

Repositories

saffronh/ccai (official)