Beyond Preferences: Learning Alignment Principles Grounded in Human Reasons and Values
Henry Bell, Lara Neubauer da Costa Schertel, Bochu Ding, Brandon Fain

TL;DR
This paper introduces Grounded Constitutional AI (GCAI), a framework that creates AI alignment principles based on human reasons and values, improving fairness, moral grounding, and human preference alignment.
Contribution
It extends the Inverse Constitutional AI (ICAI) approach by incorporating human-provided reasons and values to generate more representative and morally grounded AI constitutions.
Findings
Human participants prefer GCAI-generated constitutions over ICAI-generated ones.
Participants find GCAI constitutions more morally grounded and coherent.
GCAI effectively combines general principles and contextual preferences.
Abstract
A crucial consideration when developing and deploying Large Language Models (LLMs) is the set of human values to which these models are aligned. In the constitutional framework of alignment, models are aligned to a set of principles (the constitution) specified in natural language. However, it is unclear how to determine this constitution fairly with widespread stakeholder input. In this work, we propose Grounded Constitutional AI (GCAI), a unified framework for generating constitutions of principles that are representative of both users' general expectations toward AI (general principles) and their interaction-time preferences (contextual principles). We extend the Inverse Constitutional AI (ICAI) approach to generate contextual principles from human preference annotation data by leveraging human-provided reasons for their preferences. We supplement these contextual principles with…
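The abstract above is truncated and does not spell out the pipeline, so the following is only a toy sketch of the data involved: pairwise preference annotations that carry a free-text reason, and a constitution formed by merging general principles with contextual principles derived from those reasons. The `Annotation` schema, the helpers `normalize`, `contextual_principles` and `build_constitution`, and the `min_support` threshold are all hypothetical. In particular, exact-match counting of normalized reasons is a deliberately simple stand-in for whatever principle-generation step the paper actually uses; it is not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One pairwise preference judgment plus the annotator's stated reason (hypothetical schema)."""
    prompt: str
    chosen: str
    rejected: str
    reason: str


def normalize(text: str) -> str:
    # Toy stand-in for turning a free-text reason into a principle:
    # lowercase, trim whitespace and a trailing period, collapse inner spaces.
    return " ".join(text.lower().strip().rstrip(".").split())


def contextual_principles(annotations, min_support=2):
    # Keep normalized reasons that recur in at least `min_support` annotations
    # (an assumed threshold), most frequent first.
    counts = Counter(normalize(a.reason) for a in annotations)
    return [reason for reason, count in counts.most_common() if count >= min_support]


def build_constitution(general, annotations, min_support=2):
    # Merge general principles with contextual ones, preserving order
    # and dropping principles whose normalized text is already present.
    seen = set()
    constitution = []
    for principle in list(general) + contextual_principles(annotations, min_support):
        key = normalize(principle)
        if key not in seen:
            seen.add(key)
            constitution.append(principle)
    return constitution
```

For example, with two annotations whose reasons both normalize to "avoids medical jargon" and one whose reason is "Cites a source.", `build_constitution(["Be honest about uncertainty."], annotations)` keeps the general principle and the recurring reason, while the one-off reason falls below the default support threshold.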
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
