CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou

TL;DR
CORAL is a large-scale benchmark designed to evaluate multi-turn conversational retrieval-augmented generation systems, addressing a significant gap in existing research focused mainly on single-turn scenarios.
Contribution
The paper introduces CORAL, a comprehensive benchmark for multi-turn conversational RAG, and provides a unified framework for evaluation and comparison of different methods.
Findings
Existing methods show room for improvement on CORAL
CORAL reveals challenges in open-domain coverage and topic shifts
Evaluation highlights the need for more robust conversational RAG models
Abstract
Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite its widespread attention, existing academic research predominantly focuses on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various…
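As an illustration of two of the tasks named in the abstract, the sketch below shows one way a multi-turn query could be formed for passage retrieval and one way citation labeling could be scored. It is a minimal sketch under stated assumptions, not CORAL's actual data format or evaluation code: the `Turn` schema, the `build_query` concatenation strategy, and the set-based precision/recall in `citation_precision_recall` are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    """One conversational turn (hypothetical schema, not CORAL's real format)."""
    question: str
    gold_passage_ids: set[str] = field(default_factory=set)


def build_query(history: list[Turn], current: Turn) -> str:
    """Naive history handling: join all earlier questions with the current one.

    This is only one possible strategy, chosen here for simplicity.
    """
    return " ".join([t.question for t in history] + [current.question])


def citation_precision_recall(
    predicted: set[str], gold: set[str]
) -> tuple[float, float]:
    """Set-based precision and recall of cited passage ids for one response.

    Returns (0.0, 0.0) when either set is empty (an assumed convention).
    """
    if not predicted or not gold:
        return 0.0, 0.0
    hits = len(predicted & gold)
    return hits / len(predicted), hits / len(gold)
```

For example, a response citing passages `p1` and `p2` when the gold citations are `p1`, `p3`, and `p4` has one correct citation out of two predicted and three gold ones.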
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Correlation Alignment for Deep Domain Adaptation · Linear Layer · Dropout · Dense Connections · Layer Normalization · Residual Connection · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay
