Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach
Shannon Lodoen, Alexi Orchard

TL;DR
This paper applies a procedural rhetorical analysis to RLHF-enhanced AI chatbots, revealing ethical and social implications of the underlying mechanisms that influence language use, trust, and human-AI interactions.
Contribution
It introduces a novel procedural rhetorical approach to analyze the mechanisms of RLHF in AI chatbots, shifting focus from content to underlying persuasive procedures.
Findings
Highlights ethical concerns related to bias and transparency
Identifies how RLHF procedures influence social and linguistic norms
Suggests new directions for AI ethics research
Abstract
Since 2022, versions of generative AI chatbots such as ChatGPT and Claude have been trained using a specialized technique called Reinforcement Learning from Human Feedback (RLHF) to fine-tune language model output using feedback from human annotators. As a result, the integration of RLHF has greatly enhanced the outputs of these large language models (LLMs) and made the interactions and responses appear more "human-like" than those of previous versions using only supervised learning. The increasing convergence of human and machine-written text has potentially severe ethical, sociotechnical, and pedagogical implications relating to transparency, trust, bias, and interpersonal relations. To highlight these implications, this paper presents a rhetorical analysis of some of the central procedures and processes currently being reshaped by RLHF-enhanced generative AI chatbots: upholding…
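The abstract describes RLHF only at a high level, as fine-tuning "using feedback from human annotators." The sketch below is one rough illustration of how such feedback can become a training signal. It assumes the common pairwise-comparison setup, in which annotators pick the better of two responses and a reward model is fit with a Bradley-Terry loss. The paper does not specify this formulation. The function names, the tabular reward model (one scalar per response instead of a neural network), and the learning-rate and epoch values are hypothetical simplifications. The later policy-optimization step that uses these rewards to update the language model is omitted.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def fit_rewards(comparisons, n_responses, lr=0.5, epochs=200):
    """Hypothetical tabular reward model: learn one scalar score per response
    from annotator judgments given as (chosen_index, rejected_index) pairs."""
    rewards = [0.0] * n_responses
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            # Gradient of the loss w.r.t. r_chosen is -sigmoid(r_rejected - r_chosen),
            # so a descent step raises the chosen score and lowers the rejected one.
            g = sigmoid(rewards[rejected] - rewards[chosen])
            rewards[chosen] += lr * g
            rewards[rejected] -= lr * g
    return rewards
```

If annotators prefer response 0 over 1, 0 over 2, and 1 over 2, then `fit_rewards([(0, 1), (0, 2), (1, 2)], 3)` produces scores ordered 0 > 1 > 2. In a full RLHF pipeline, scores like these would then steer further fine-tuning of the language model. This is the route by which annotators' judgments come to shape the model's language.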
Taxonomy
Topics: Embodied and Extended Cognition
