In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting

Haowei Du; Dongyan Zhao

arXiv:2408.13028·cs.CL·August 26, 2024

TL;DR

This paper introduces a reinforcement learning-based framework for selecting examples in in-context learning, improving the performance of large language models on incomplete utterance rewriting tasks by directly utilizing feedback from the models.

Contribution

It proposes a novel policy-based reinforcement learning method for example selection that outperforms existing retrieval-based methods and supervised fine-tuning in few-shot scenarios.

Findings

01. The proposed method significantly outperforms existing example selection methods.

02. It shows advantages over supervised fine-tuning models in few-shot settings.

03. Balancing the abundance of examples against their similarity improves ICL performance.

Abstract

In-context learning (ICL) of large language models (LLMs), in which LLMs make predictions based only on instructions augmented with a few examples, has attracted increasing attention in the community. Existing example selection methods for ICL use sparse or dense retrievers and achieve effective performance. However, these methods do not use direct feedback from the LLM to train the retriever, and the selected examples do not necessarily improve the analogy ability of the LLM. To tackle this, we propose a policy-based reinforcement learning framework for example selection (RLS), which consists of a language model (LM) selector and an LLM generator. The LM selector encodes the candidate examples into dense representations and selects the top-k examples as the demonstration for the LLM. The outputs of the LLM are used to compute the reward and the policy gradient that optimize the LM selector. We…
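
The selector/generator loop described in the abstract can be illustrated with a small REINFORCE-style sketch. This is not the authors' implementation: the linear scorer over fixed feature vectors stands in for the LM selector's dense representations, `toy_reward` stands in for feedback computed from LLM outputs, and every name here (`sample_top_k`, `toy_reward`, `train`, `select_top_k`), along with the moving-average baseline, is a hypothetical choice made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_top_k(weights, feats, k, rng):
    """Sample k distinct candidates from the policy.

    Returns the chosen indices and the gradient of the log-probability
    of that selection with respect to the scorer weights.
    """
    remaining = list(range(len(feats)))
    chosen = []
    grad = [0.0] * len(weights)
    for _ in range(k):
        probs = softmax([dot(weights, feats[i]) for i in remaining])
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        idx = remaining[pick]
        # For a linear scorer: d/dw log p(idx) = x_idx - sum_j p_j * x_j
        for d in range(len(weights)):
            expected = sum(p * feats[i][d] for p, i in zip(probs, remaining))
            grad[d] += feats[idx][d] - expected
        chosen.append(idx)
        remaining.pop(pick)
    return chosen, grad


def toy_reward(chosen, helpful):
    """Assumed stand-in for LLM feedback: fraction of helpful examples picked."""
    return sum(1 for i in chosen if i in helpful) / len(chosen)


def train(feats, helpful, k=2, steps=500, lr=0.5, seed=0):
    """Policy-gradient training of the scorer with a moving-average baseline."""
    rng = random.Random(seed)
    weights = [0.0] * len(feats[0])
    baseline = 0.0
    for _ in range(steps):
        chosen, grad = sample_top_k(weights, feats, k, rng)
        reward = toy_reward(chosen, helpful)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        weights = [w + lr * advantage * g for w, g in zip(weights, grad)]
    return weights


def select_top_k(weights, feats, k):
    """Greedy selection at inference time: the k highest-scoring candidates."""
    scores = [dot(weights, f) for f in feats]
    return sorted(range(len(feats)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Four candidate examples with made-up 2-d features; 0 and 1 are "helpful".
    feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    weights = train(feats, helpful={0, 1})
    print(select_top_k(weights, feats, 2))
```

In this sketch, sampling is used during training so that the policy gradient has selections to compare, while greedy top-k selection is used afterwards. In a real setup the reward would come from scoring the LLM's rewritten utterance against a reference, which the toy reward does not attempt to model.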

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need