Mapping Social Choice Theory to RLHF

Jessica Dai; Eve Fleisig

arXiv:2404.13038 · cs.AI · April 22, 2024

TL;DR

This paper examines how social choice theory can inform reinforcement learning from human feedback (RLHF). It compares the two problem settings and discusses what their differences imply for aggregating human preferences in AI systems.

Contribution

A comparative analysis of the social choice and RLHF problem settings that identifies key differences between them and discusses how those differences may affect the RLHF interpretation of well-known social choice results.

Findings

01

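Identifies key differences between the problem settings of social choice and RLHF.

02

Discusses how these differences may affect the RLHF interpretation of well-known technical results in social choice.

03

Draws on social choice theory's analysis of settings such as voting mechanisms to inform how human preferences can be aggregated amid disagreement.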

Abstract

Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
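
As a rough illustration of why aggregating preferences amid disagreement is hard, the sketch below applies pairwise majority voting to a few rankings. It is not taken from the paper: the function names `pairwise_majority` and `condorcet_winner`, the three annotators, and the response labels "A", "B", "C" are all hypothetical, and the Condorcet cycle it exhibits is a standard textbook example from social choice theory rather than a result claimed by the authors.

```python
from itertools import combinations


def pairwise_majority(rankings):
    """Return the set of (winner, loser) pairs decided by strict majority.

    Each ranking is a list of options ordered from most to least preferred.
    All rankings are assumed to cover the same options (an assumption of
    this sketch, not a requirement stated in the paper).
    """
    options = sorted(rankings[0])
    wins = set()
    for a, b in combinations(options, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        b_over_a = len(rankings) - a_over_b
        if a_over_b > b_over_a:
            wins.add((a, b))
        elif b_over_a > a_over_b:
            wins.add((b, a))
    return wins


def condorcet_winner(rankings):
    """Return the option that beats every other option head-to-head, or None."""
    options = sorted(rankings[0])
    wins = pairwise_majority(rankings)
    for candidate in options:
        if all((candidate, other) in wins for other in options if other != candidate):
            return candidate
    return None


# Hypothetical annotators ranking three hypothetical model responses.
cyclic = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
mostly_agree = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]

print(pairwise_majority(cyclic))       # majority prefers A over B, B over C, and C over A
print(condorcet_winner(cyclic))        # None: the majority preference is cyclic
print(condorcet_winner(mostly_agree))  # A
```

In the cyclic case every individual ranking is consistent, yet the majority relation is not, so there is no single response that the group prefers to all others. How such results carry over to RLHF depends on the differences between the two settings that the paper analyzes.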

Taxonomy

Topics: Game Theory and Voting Systems · Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics