LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

Qingfei Zhao; Ruobing Wang; Yukuo Cen; Daren Zha; Shicheng Tan; Yuxiao Dong; Jie Tang

arXiv:2410.18050 · cs.CL · November 4, 2024 · 2 cites

TL;DR

LongRAG is a dual-perspective retrieval-augmented generation system that improves long-context question answering by integrating global information with factual details, outperforming long-context LLMs, advanced RAG, and Vanilla RAG baselines.

Contribution

The paper presents LongRAG, a novel dual-perspective RAG paradigm that enhances long-context understanding and is adaptable across domains and large language models.

Findings

01. LongRAG outperforms long-context LLMs by 6.94%.

02. LongRAG surpasses advanced RAG by 6.16%.

03. LongRAG exceeds Vanilla RAG by 17.25%.

Abstract

Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the "lost in the middle" issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive…

Code & Models

Repositories

qingfei1/longrag (official)

Videos

LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems

Methods: Adam · Linear Layer · Dropout · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Attention Is All You Need · Dense Connections