Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Guanting Dong; Xiaoshuai Song; Yutao Zhu; Runqi Qiao; Zhicheng Dou; Ji-Rong Wen

arXiv:2410.09584 · cs.CL · October 15, 2024 · 3 cites

TL;DR

This paper introduces VIF-RAG, a scalable pipeline for instruction-following alignment in RAG systems, and FollowRAG, a benchmark for evaluating how well LLMs adhere to instructions in retrieval-augmented tasks.

Contribution

The paper presents the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG, and introduces the FollowRAG benchmark for evaluation.

Findings

01. VIF-RAG improves LLM performance on instruction constraints

02. FollowRAG Benchmark covers 22 instruction categories

03. Automated pipeline scales to over 100k data samples

Abstract

Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models for instruction rewriting while simultaneously generating code to automate the verification of instruction quality via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a…
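The executor-based verification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the instruction text, the `check` function, the helper names, and the test cases below are all hypothetical, and the generated verifier is hard-coded as a string where the pipeline would obtain it from a model. The idea shown is only that an instruction is kept when its accompanying verification code runs and agrees with known-good and known-bad responses.

```python
from typing import Callable, List, Tuple

# Stand-in for model-generated verification code (hypothetical example).
# Instruction it is meant to verify:
#   "Answer in at most 20 words and do not use commas."
GENERATED_SOURCE = '''
def check(response: str) -> bool:
    return len(response.split()) <= 20 and "," not in response
'''


def load_checker(source: str) -> Callable[[str], bool]:
    """Execute verifier source and return its `check` function.

    Assumption: the source defines a function named `check`.
    A real pipeline should run untrusted code in a sandbox.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["check"]


def verifier_is_consistent(
    check: Callable[[str], bool], cases: List[Tuple[str, bool]]
) -> bool:
    """Return True if the verifier agrees with every labeled case."""
    return all(check(response) == expected for response, expected in cases)


# Hypothetical labeled responses: (response text, should it pass?)
CASES = [
    ("Retrieval grounds the answer in evidence.", True),
    ("Retrieval, in short, helps.", False),
]

check = load_checker(GENERATED_SOURCE)
# Keep the instruction only if its verifier behaves as expected.
keep_instruction = verifier_is_consistent(check, CASES)
```

Pairing each instruction with executable checks lets quality filtering run without human review, which is what makes scaling to large sample counts practical.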

Code & Models

Repositories

dongguanting/FollowRAG (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout · Layer Normalization