ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization

YuXuan Zhang

arXiv:2507.03069·cs.CL·October 28, 2025

Open Access

TL;DR

ARF-RLHF converts free-form natural-language feedback into continuous preference signals, which the paper reports align large language models better than the binary labels used with PPO and DPO.

Contribution

The paper proposes Adaptive Reward-Following (ARF), which converts free-form feedback into continuous preference trajectories and optimizes them with the TraceBias algorithm, reporting alignment gains of up to 7.6% over PPO and DPO.

Findings

01. ARF outperforms PPO and DPO across diverse LLMs and preference domains.

02. ARF improves alignment by up to 7.6% over these baselines.

03. ARF provides a scalable framework for personalized RLHF.

Abstract

Current RLHF methods such as PPO and DPO typically reduce human preferences to binary labels, which are costly to obtain and too coarse to reflect individual variation. We observe that expressions of satisfaction and dissatisfaction follow stable linguistic patterns across users, indicating that more informative supervisory signals can be extracted from free-form feedback. Building on this insight, we introduce Adaptive Reward-Following (ARF), which converts natural feedback into continuous preference trajectories and optimizes them using the novel TraceBias algorithm. Across diverse LLMs and preference domains, ARF consistently outperforms PPO and DPO, improving alignment by up to 7.6%. Our results demonstrate that continuous reward modeling provides a scalable path toward personalized and theoretically grounded RLHF.
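
To make the contrast between binary labels and continuous preference signals concrete, here is a minimal toy sketch. It is not the paper's method: the lexicon, its weights, and the function names `satisfaction_score`, `preference_trajectory`, and `binary_label` are all hypothetical, chosen only for illustration. The abstract does not describe how ARF scores feedback, and TraceBias is not implemented here.

```python
import re

# Hypothetical toy lexicon. The words and weights are illustrative
# assumptions and do not come from the paper.
SATISFACTION_LEXICON = {
    "great": 1.0, "thanks": 0.8, "helpful": 0.8, "good": 0.6,
    "okay": 0.1, "confusing": -0.5, "wrong": -0.8, "useless": -1.0,
}


def satisfaction_score(feedback: str) -> float:
    """Map one free-form feedback message to a score in [-1, 1].

    The score is the mean weight of the lexicon words found in the
    message, or 0.0 when no lexicon word appears.
    """
    tokens = re.findall(r"[a-z']+", feedback.lower())
    weights = [SATISFACTION_LEXICON[t] for t in tokens if t in SATISFACTION_LEXICON]
    if not weights:
        return 0.0
    return sum(weights) / len(weights)


def preference_trajectory(feedback_turns: list[str]) -> list[float]:
    """Score each turn of feedback, giving one continuous value per turn."""
    return [satisfaction_score(turn) for turn in feedback_turns]


def binary_label(score: float) -> int:
    """Collapse a continuous score to a 0/1 label, discarding its magnitude."""
    return 1 if score >= 0 else 0


if __name__ == "__main__":
    turns = [
        "Thanks, that was helpful",
        "Hmm, it is okay I guess",
        "This is wrong and confusing",
    ]
    trajectory = preference_trajectory(turns)
    print(trajectory)
    print([binary_label(s) for s in trajectory])
```

In this toy example the first two messages receive the same binary label even though their continuous scores differ, which is the kind of information loss the abstract attributes to binary preference labels.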

Taxonomy

Topics: Mental Health Research Topics