Mapping Social Choice Theory to RLHF
Jessica Dai, Eve Fleisig

TL;DR
This paper explores how social choice theory can inform reinforcement learning from human feedback (RLHF), analyzing their similarities and differences to improve preference aggregation in AI systems.
Contribution
It provides a comparative analysis of social choice theory and RLHF, highlighting key differences and implications for preference aggregation in AI.
Findings
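Identifies key differences between the problem settings of social choice and RLHF.
Discusses how these differences may affect the RLHF interpretation of well-known social choice results.
Offers insight into the challenges of aggregating human preferences amid disagreement.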
Abstract
Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
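As a small illustration of why the choice of aggregation rule matters when people disagree, the sketch below applies two classic voting rules, plurality and Borda count, to the same set of rankings. It is not taken from the paper: the annotator rankings, the response labels A, B, and C, and the function names are all hypothetical, chosen only to show that two reasonable rules can pick different winners from identical preferences.

```python
from collections import Counter

# Hypothetical data (not from the paper): seven annotators each rank
# three candidate responses, best first.
rankings = (
    [("A", "B", "C")] * 3
    + [("B", "C", "A")] * 2
    + [("C", "B", "A")] * 2
)


def plurality_winner(rankings):
    """Return the option ranked first by the most annotators."""
    firsts = Counter(r[0] for r in rankings)
    return max(firsts, key=firsts.get)


def borda_winner(rankings):
    """Return the option with the highest Borda score.

    With m options, first place earns m - 1 points and last place earns 0.
    """
    scores = Counter()
    for r in rankings:
        for position, option in enumerate(r):
            scores[option] += len(r) - 1 - position
    return max(scores, key=scores.get)


print("plurality:", plurality_winner(rankings))
print("borda:", borda_winner(rankings))
```

On this profile, plurality selects A (three first-place votes against two each for B and C), while Borda selects B (9 points against 6 each for A and C), because B is nobody's last choice and A is last for four of the seven annotators.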
Taxonomy
Topics: Game Theory and Voting Systems · Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics
