Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

Kuo Liao; Shuang Li; Meng Zhao; Liqun Liu; Mengge Xue; Zhenyu Hu; Honglin Han; Chengguo Yin

arXiv:2405.19763·cs.CL·May 31, 2024

TL;DR

This paper introduces RLLR, a reinforcement learning framework with label-sensitive rewards that improves natural language understanding in large language models. It targets the objective mismatch that limits reinforcement learning from human feedback (RLHF) on NLU tasks.

Contribution

The paper proposes RLLR, a new reinforcement learning approach that incorporates label-sensitive pairs to better capture semantic nuances in NLU tasks, improving performance over existing methods.
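
As an illustration of the idea of label-sensitive pairs, the sketch below pairs a response carrying the gold label with responses carrying a wrong label, then scores each pair with a standard pairwise ranking loss of the kind commonly used for reward models. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the response template, and the choice of a Bradley-Terry style loss are all hypothetical and are not taken from the paper or its repository.

```python
import math


def build_label_sensitive_pairs(text, gold_label, label_set):
    """Build (chosen, rejected) pairs that differ only in the predicted label.

    Hypothetical helper: the "text -> label" template is an assumption
    made for illustration only.
    """
    chosen = f"{text} -> {gold_label}"
    return [
        (chosen, f"{text} -> {label}")
        for label in label_set
        if label != gold_label
    ]


def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    pairs = build_label_sensitive_pairs(
        "great movie", "positive", ["positive", "negative", "neutral"]
    )
    for chosen, rejected in pairs:
        print(chosen, "|", rejected)
    # The loss shrinks as the reward model ranks the gold-label response higher.
    print(pairwise_ranking_loss(2.0, 0.0) < pairwise_ranking_loss(0.0, 2.0))
```

Because the two responses in each pair differ only in their label, a reward model trained on such pairs has to attend to the label itself rather than to surface features of the response.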

Findings

01. RLLR outperforms supervised fine-tuning by 1.54% on average.

02. RLLR improves over RLHF models by 0.69% on average.

03. Experiments on five models across eight tasks demonstrate its effectiveness.

Abstract

Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to amplify the performance of LLMs in NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks showcase promising results. In comparison to Supervised Fine-tuning…

Code & Models

Repositories

magiasn/acl2024_rllr (Official)

Videos

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

Taxonomy

Topics: Speech and dialogue systems · Fuzzy Logic and Control Systems