Fine-tuning language models to find agreement among humans with diverse preferences

Michiel A. Bakker; Martin J. Chadwick; Hannah R. Sheahan; Michael Henry Tessler; Lucy Campbell-Gillingham; Jan Balaguer; Nat McAleese; Amelia Glaese; John Aslanides; Matthew M. Botvinick; Christopher Summerfield

arXiv:2211.15006 · cs.LG · November 29, 2022 · 111 cites

TL;DR

This paper demonstrates how fine-tuned large language models can generate consensus statements that appeal to diverse human preferences, helping groups with conflicting views find common ground.

Contribution

It introduces a method for fine-tuning an LLM to generate group consensus statements. The method models individual preferences and aggregates them with social welfare functions, and its statements are preferred over both existing approaches and human-written opinions.
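As a rough illustration of how individual preferences and a social welfare function can combine to rank candidate statements, here is a minimal sketch. It is not the paper's implementation: the approval scores are made-up numbers standing in for a preference model's per-person predictions, the names `utilitarian`, `rawlsian`, and `rank_candidates` are hypothetical, and the two welfare functions are standard textbook choices that may or may not match the ones the authors used.

```python
def utilitarian(scores):
    """Mean predicted approval across group members."""
    return sum(scores) / len(scores)


def rawlsian(scores):
    """Predicted approval of the least-satisfied group member."""
    return min(scores)


def rank_candidates(predicted, welfare):
    """Return candidate names sorted from highest to lowest group welfare."""
    return sorted(predicted, key=lambda name: welfare(predicted[name]), reverse=True)


# Hypothetical predicted approvals in [0, 1], one value per group member.
# In practice these would come from a model of individual preferences.
predicted = {
    "statement_a": [0.9, 0.9, 0.1],  # two members strongly approve, one dissents
    "statement_b": [0.6, 0.6, 0.6],  # moderate approval from everyone
    "statement_c": [0.3, 0.4, 0.2],  # weak approval from everyone
}

print(rank_candidates(predicted, utilitarian))
print(rank_candidates(predicted, rawlsian))
```

With these made-up scores, the utilitarian ranking puts `statement_a` first because its average approval is highest, while the Rawlsian ranking puts `statement_b` first because no member is left strongly dissatisfied. The choice of welfare function therefore changes which statement counts as the best consensus.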

Findings

01

Human raters prefer the model's consensus statements to prompt-based LLM outputs more than 70% of the time

02

The model's consensus statements are preferred to the best human-written opinions more than 65% of the time

03

Excluding group members when generating the consensus increases dissent, which shows that the consensus is sensitive to individual contributions

Abstract

Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict…

Videos

Fine-tuning language models to find agreement among humans with diverse preferences · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN