Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA

Nirmal Roy; Leonardo F. R. Ribeiro; Rexhina Blloshmi; Kevin Small

arXiv:2409.15515 · cs.CL · September 25, 2024

TL;DR

This paper introduces SELF-multi-RAG, a method that lets conversational LLMs decide when to retrieve, rewrite the conversation for passage retrieval, and judge the relevance of returned passages before responding, which improves response quality in conversational QA tasks.

Contribution

The paper extends the single-turn SELF-RAG framework to conversational settings, allowing LLMs to better manage retrieval and response generation across multiple turns.

Findings

01. SELF-multi-RAG outperforms single-turn variants in retrieving relevant passages.

02. The method improves response relevance and quality by ~13% in human evaluations.

03. Experiments on three datasets validate the effectiveness of the approach.

Abstract

Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users' contextual search intent when generating responses is an understudied topic for conversational question answering (QA). This conversational extension leads to additional concerns when compared to single-turn QA as it is more challenging for systems to comprehend conversational context and manage retrieved passages over multiple turns. In this work, we propose a method for enabling LLMs to decide when to retrieve in RAG settings given a conversational context. When retrieval is deemed necessary, the LLM then rewrites the conversation for passage retrieval and judges the relevance of returned passages before response generation. Operationally, we build on the single-turn SELF-RAG…
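
The per-turn control flow described in the abstract (decide whether to retrieve, rewrite the conversation into a query, filter the returned passages by relevance, then generate) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the `TurnComponents` container, the `respond` function, and all five callables are hypothetical names introduced here, and passing the rewritten query to the relevance judge is an assumption. In the paper a single LLM plays these roles; here they are left as pluggable functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TurnComponents:
    """Hypothetical pluggable pieces standing in for the LLM and the retriever."""
    needs_retrieval: Callable[[Sequence[str]], bool]      # decide when to retrieve
    rewrite: Callable[[Sequence[str]], str]               # conversation -> search query
    retrieve: Callable[[str], List[str]]                  # query -> candidate passages
    is_relevant: Callable[[str, str], bool]               # (query, passage) -> keep?
    generate: Callable[[Sequence[str], List[str]], str]   # final response


def respond(conversation: Sequence[str], parts: TurnComponents) -> str:
    """Answer the latest turn, retrieving only when it is judged necessary."""
    if not parts.needs_retrieval(conversation):
        # No retrieval: answer from the conversation alone.
        return parts.generate(conversation, [])
    # Rewrite the multi-turn context into a single retrieval query.
    query = parts.rewrite(conversation)
    # Keep only the passages judged relevant (assumption: judged against the query).
    passages = [p for p in parts.retrieve(query) if parts.is_relevant(query, p)]
    return parts.generate(conversation, passages)
```

Keeping each decision behind its own callable makes it easy to swap a stub for a model call when testing the flow; it says nothing about how the paper trains or prompts the model to make these decisions.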
