FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

Tianchi Cai; Zhiwen Tan; Xierui Song; Tao Sun; Jiyan Jiang; Yunqi Xu; Yinger Zhang; Jinjie Gu

arXiv:2406.13779·cs.CL·July 2, 2024

TL;DR

FoRAG is a framework for improving the factuality and logical clarity of long-form answers in web-enhanced QA. It combines outline-enhanced generation with a doubly fine-grained RLHF optimization method and outperforms the much larger WebGPT-175B.

Contribution

The paper presents a factuality-optimized RAG framework for web-enhanced LFQA. It pairs an outline-enhanced generator, for which two datasets are constructed, with a doubly fine-grained RLHF method for factuality optimization.

Findings

01  FoRAG outperforms WebGPT-175B on coherence, helpfulness, and factuality.

02  It does so with significantly fewer parameters than WebGPT-175B.

03  Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks because it can use a search engine to improve the quality of long-form question answering (LFQA). Despite the emergence of various open-source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved: the generated long-form answers lack factuality and clear logic. In this paper, we remedy these issues via a systematic study of answer generation in web-enhanced LFQA. Specifically, we first propose a novel outline-enhanced generator to achieve clear logic in the generation of multifaceted answers and construct two datasets accordingly. Then we propose a factuality optimization method based on a carefully designed doubly fine-grained RLHF framework, which contains automatic evaluation and reward modeling in different levels of…
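
The outline-enhanced generation idea mentioned in the abstract can be illustrated with a minimal two-stage sketch: first ask a model for an outline, then ask it to write the answer along that outline. This is only an illustration under stated assumptions, not the authors' implementation. The prompt wording, the `- ` bullet convention, and every function name below (`build_outline_prompt`, `parse_outline`, `build_answer_prompt`, `outline_then_answer`) are hypothetical, and `llm` stands for any text-in, text-out callable supplied by the reader.

```python
from typing import Callable, List


def format_passages(passages: List[str]) -> str:
    """Number the retrieved passages so the prompts can refer to them."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))


def build_outline_prompt(question: str, passages: List[str]) -> str:
    """Stage 1 prompt (hypothetical wording): request an outline only."""
    return (
        "Retrieved passages:\n" + format_passages(passages) + "\n\n"
        "Question: " + question + "\n"
        "Write a short outline for the answer, one point per line, "
        "each line starting with '- '."
    )


def parse_outline(text: str) -> List[str]:
    """Keep lines that look like '- point' bullets and drop everything else."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- ") and len(line) > 2:
            points.append(line[2:].strip())
    return points


def build_answer_prompt(question: str, passages: List[str],
                        outline: List[str]) -> str:
    """Stage 2 prompt (hypothetical wording): expand the outline into prose."""
    outline_block = "\n".join(
        f"{i}. {point}" for i, point in enumerate(outline, start=1)
    )
    return (
        "Retrieved passages:\n" + format_passages(passages) + "\n\n"
        "Question: " + question + "\n"
        "Outline:\n" + outline_block + "\n\n"
        "Write a long-form answer that follows the outline in order and "
        "uses only facts supported by the passages."
    )


def outline_then_answer(question: str, passages: List[str],
                        llm: Callable[[str], str]) -> str:
    """Run the two stages: outline first, then the full answer."""
    outline = parse_outline(llm(build_outline_prompt(question, passages)))
    return llm(build_answer_prompt(question, passages, outline))


if __name__ == "__main__":
    # Stand-in model for demonstration; replace with a real LLM call.
    def fake_llm(prompt: str) -> str:
        if "Write a short outline" in prompt:
            return "- What RAG is\n- Why an outline helps"
        return "(answer text written along the outline)"

    print(outline_then_answer("What is RAG?", ["RAG retrieves documents."], fake_llm))
```

Separating the outline from the answer keeps the structure of a multifaceted answer explicit and inspectable before any long text is generated. How the paper's generator is actually trained and prompted is not specified in the text above.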

Taxonomy

Methods: Attention Is All You Need · WordPiece · Residual Connection · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Linear Warmup With Linear Decay