FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering
Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

TL;DR
FoRAG introduces a novel framework that enhances the factuality and logical clarity of long-form answers in web-enhanced QA by combining outline-based generation and a doubly fine-grained RLHF optimization, outperforming larger models.
Contribution
The paper presents a new factuality-optimized RAG framework with an outline-enhanced generator and a doubly fine-grained RLHF method, improving answer quality in web-based LFQA.
Findings
FoRAG outperforms WebGPT-175B on coherence, helpfulness, and factuality.
The model achieves comparable performance with significantly fewer parameters.
Extensive experiments validate the effectiveness of the proposed approach.
Abstract
Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability to utilize search engines to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open-source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved: the lack of factuality and of clear logic in the generated long-form answers. In this paper, we remedy these issues via a systematic study on answer generation in web-enhanced LFQA. Specifically, we first propose a novel outline-enhanced generator to achieve clear logic in the generation of multifaceted answers and construct two datasets accordingly. Then we propose a factuality optimization method based on a carefully designed doubly fine-grained RLHF framework, which contains automatic evaluation and reward modeling in different levels of…
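The outline-enhanced generation idea mentioned in the abstract can be illustrated with a minimal two-stage sketch: draft an outline from the question and retrieved passages, then expand that outline into the final answer. This is only an illustration under stated assumptions, not the authors' implementation. The function name `outline_then_answer`, the prompt wording, and the `generate` callable (a stand-in for any text-generation model) are all hypothetical.

```python
from typing import Callable, List


def outline_then_answer(
    question: str,
    passages: List[str],
    generate: Callable[[str], str],  # hypothetical: any prompt -> text function
) -> str:
    """Two-stage generation: first an outline, then the full answer."""
    # Number the retrieved passages so the model can refer to them.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))

    # Stage 1: ask for an outline (prompt wording is an assumption).
    outline_prompt = (
        f"Question: {question}\n"
        f"Sources:\n{context}\n"
        "Write a short outline of the answer, one point per line."
    )
    outline = generate(outline_prompt)

    # Keep only the non-empty outline lines.
    points = [line.strip() for line in outline.splitlines() if line.strip()]

    # Stage 2: expand the outline into a long-form answer.
    answer_prompt = (
        f"Question: {question}\n"
        f"Sources:\n{context}\n"
        "Outline:\n" + "\n".join(points) + "\n"
        "Write a long-form answer that follows the outline."
    )
    return generate(answer_prompt)
```

Passing `generate` as a parameter keeps the sketch independent of any particular model or API; in practice it would wrap a call to whichever language model is in use.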
Taxonomy
Methods: Attention Is All You Need · WordPiece · Residual Connection · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Linear Warmup With Linear Decay
