Reinforcement Learning from Human Feedback: Whose Culture, Whose Values, Whose Perspectives?

Kristian González Barman, Simon Lohse, and Henk de Regt

arXiv:2407.17482·cs.CY·May 27, 2025

TL;DR

This paper advocates for incorporating diverse human perspectives in Reinforcement Learning from Human Feedback to enhance ethical and epistemic responsiveness in Large Language Models, proposing practical steps for improvement.

Contribution

It introduces a pluralist approach to RLHF, emphasizing the importance of diverse cultural and ethical perspectives in shaping LLM development.

Findings

1. Highlights the ethical benefits of pluralism in RLHF
2. Proposes actionable steps for more inclusive LLM training
3. Connects social epistemology with AI feedback mechanisms

Abstract

We argue for the epistemic and ethical advantages of pluralism in Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Drawing on social epistemology and pluralist philosophy of science, we suggest ways in which RLHF can be made more responsive to human needs and how we can address challenges along the way. The paper concludes with an agenda for change, i.e. concrete, actionable steps to improve LLM development.
