RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering
Yiming Zhang, Siyue Zhang, Junbo Zhao, Chen Zhao

TL;DR
RPDR is a data augmentation framework that improves long-tail question answering by selecting easy-to-learn training data to enhance dense retrievers, leading to significant performance gains on challenging benchmarks.
Contribution
Introduces RPDR, a novel data augmentation method using round-trip prediction for better training data selection in long-tail QA retrieval tasks.
Findings
RPDR outperforms existing retrievers such as BM25 and Contriever on long-tail benchmarks.
RPDR significantly improves retrieval accuracy on extremely long-tail categories.
Human analysis reveals strengths and limitations of the proposed approach.
Abstract
Long-tail question answering presents significant challenges for large language models (LLMs) due to their limited ability to acquire and accurately recall less common knowledge. Retrieval-augmented generation (RAG) systems have shown great promise in mitigating this limitation by integrating external retrieval mechanisms. However, dense retrieval models often face the same difficulties when generalizing to rare or niche knowledge. In this study, we introduce RPDR, a novel data augmentation framework that selects high-quality, easy-to-learn training data to enhance dense retrievers. Our approach is built around three core components: synthetic data generation, data selection with round-trip prediction to identify easy-to-learn instances, and retriever training with these instances. We evaluate RPDR on two long-tail retrieval benchmarks, PopQA and EntityQuestions, demonstrating…
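The abstract names the selection step but does not spell out its mechanics, so the following is only a minimal sketch of one plausible reading of round-trip prediction: a synthetic question is kept when a retriever, given that question, ranks the passage it was generated from within the top results. Everything here is an assumption made for illustration and not the authors' implementation: the names `SyntheticExample`, `overlap_score`, and `round_trip_select`, the `top_k` parameter, the bag-of-words scorer standing in for a dense retriever, and the made-up corpus.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticExample:
    question: str    # synthetic question generated from a passage
    passage_id: str  # id of the passage the question was generated from


def _tokens(text):
    # Lowercase and strip the two punctuation marks used in the toy data.
    return text.lower().replace("?", " ").replace(".", " ").split()


def overlap_score(question, passage):
    """Toy relevance score (assumption): cosine similarity of word counts.

    A real pipeline would use a dense retriever here instead.
    """
    q, p = Counter(_tokens(question)), Counter(_tokens(passage))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def round_trip_select(examples, corpus, top_k=1):
    """Keep examples whose source passage is ranked in the top_k for the question."""
    kept = []
    for ex in examples:
        ranked = sorted(
            corpus,
            key=lambda pid: overlap_score(ex.question, corpus[pid]),
            reverse=True,
        )
        if ex.passage_id in ranked[:top_k]:
            kept.append(ex)
    return kept


# Hypothetical two-passage corpus and two synthetic questions.
corpus = {
    "p1": "Aelita Voss is a botanist born in Tartu.",
    "p2": "Tartu is a city in Estonia.",
}
examples = [
    SyntheticExample("Where was Aelita Voss born?", "p1"),  # consistent with p1
    SyntheticExample("What is a city in Estonia?", "p1"),   # mislabeled source
]
kept = round_trip_select(examples, corpus)
```

In this toy run the first question survives because its source passage is the best match, while the second is dropped because another passage matches it better. Under this reading, the surviving examples are the "easy-to-learn" instances that would then be used to train the retriever.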
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems
