Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Kuo Liao, Shuang Li, Meng Zhao, Liqun Liu, Mengge Xue, Zhenyu Hu, Honglin Han, Chengguo Yin

TL;DR
This paper introduces a novel reinforcement learning framework with label-sensitive rewards to improve natural language understanding in large language models, addressing the limitations of existing reinforcement learning from human feedback methods.
Contribution
The paper proposes RLLR, a new reinforcement learning approach that incorporates label-sensitive pairs to better capture semantic nuances in NLU tasks, improving performance over existing methods.
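As a rough illustration of how label-sensitive pairs could feed a reward model, the sketch below uses the standard pairwise ranking loss from RLHF reward modeling, -log(sigmoid(r_chosen - r_rejected)). It rests on an assumption not spelled out in this summary: that a label-sensitive pair contrasts a response carrying the gold label with a response carrying a different label for the same input. The helper names (build_label_sensitive_pairs, pairwise_reward_loss) are hypothetical and are not taken from the paper.

```python
import math


def build_label_sensitive_pairs(text, gold_label, labels):
    """Hypothetical helper: pair the gold label against every other label.

    Each tuple is (input text, preferred label, dispreferred label).
    This pairing rule is an assumption, not the paper's stated procedure.
    """
    return [(text, gold_label, wrong) for wrong in labels if wrong != gold_label]


def pairwise_reward_loss(r_chosen, r_rejected):
    """Standard pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = r_chosen - r_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy example with made-up reward scores (not results from the paper).
pairs = build_label_sensitive_pairs(
    "The movie was wonderful.", "positive", ["positive", "negative", "neutral"]
)
loss_when_ranked_correctly = pairwise_reward_loss(2.0, 0.0)
loss_when_ranked_wrongly = pairwise_reward_loss(0.0, 2.0)
```

The loss shrinks as the reward model scores the gold-label response above the wrong-label one, which is the signal a policy would then be optimized against during RL.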
Findings
RLLR outperforms supervised fine-tuning by 1.54% on average.
RLLR improves over RLHF models by 0.69% on average.
Experiments on five foundation models across eight NLU tasks demonstrate RLLR's effectiveness.
Abstract
Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to amplify the performance of LLMs in NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks showcase promising results. In comparison to Supervised Fine-tuning…
Taxonomy
Topics: Speech and dialogue systems · Fuzzy Logic and Control Systems
