Dense Hierarchical Retrieval for Open-Domain Question Answering
Ye Liu, Kazuma Hashimoto, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Philip S. Yu

TL;DR
This paper introduces Dense Hierarchical Retrieval (DHR), a hierarchical approach that improves passage retrieval accuracy for open-domain QA by leveraging document-level and passage-level semantics, outperforming existing dense retrievers.
Contribution
The paper proposes a novel hierarchical retrieval framework that combines document-level and passage-level semantics, enhancing retrieval accuracy and robustness in open-domain QA systems.
Findings
DHR significantly outperforms original dense passage retrievers.
DHR improves end-to-end QA system performance on multiple benchmarks.
Hierarchical title structure and negative sampling strategies enhance retrieval quality.
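This page does not say which negative sampling strategies the paper uses. As a point of reference, the sketch below shows the in-batch negatives objective used to train DPR-style dual encoders. Each question's positive passage competes against the other positives in the batch and, optionally, against extra hard negatives. This is a swapped-in reference technique, not the paper's procedure. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity between a question and a passage embedding."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def in_batch_negative_loss(
    questions: List[Vector],
    positives: List[Vector],
    hard_negatives: Sequence[Vector] = (),
) -> float:
    """Mean cross-entropy of each question against its own positive passage.

    Hypothetical helper: questions[i] is paired with positives[i]. Every other
    positive in the batch, plus any shared hard negatives, is a negative for
    question i.
    """
    if len(questions) != len(positives) or not questions:
        raise ValueError("need a non-empty batch of (question, positive) pairs")
    pool = list(positives) + list(hard_negatives)
    total = 0.0
    for i, q in enumerate(questions):
        scores = [dot(q, p) for p in pool]
        # Numerically stable log-sum-exp over all candidate passages.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # -log softmax probability of the positive
    return total / len(questions)


# Toy 2-d embeddings, made up for illustration.
questions = [[10.0, 0.0], [0.0, 10.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
easy = in_batch_negative_loss(questions, positives)
hard = in_batch_negative_loss(questions, positives, hard_negatives=[[1.0, 0.0]])
print(easy < hard)  # True
```

The added hard negative points the same way as the first question's positive, so the encoder can no longer separate them easily and the loss rises. This is the usual intuition for mining harder negatives.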
Abstract
Dense neural text retrieval has achieved promising results on open-domain Question Answering (QA), where latent representations of questions and passages are exploited for maximum inner product search in the retrieval process. However, current dense retrievers require splitting documents into short passages that usually contain local, partial, and sometimes biased context, and highly depend on the splitting process. As a consequence, it may yield inaccurate and misleading hidden representations, thus deteriorating the final retrieval result. In this work, we propose Dense Hierarchical Retrieval (DHR), a hierarchical framework that can generate accurate dense representations of passages by utilizing both macroscopic semantics in the document and microscopic semantics specific to each passage. Specifically, a document-level retriever first identifies relevant documents, among which…
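The abstract frames retrieval as maximum inner product search (MIPS) over question and passage embeddings, with a document-level retriever first identifying relevant documents. The sketch below illustrates that two-stage idea with toy vectors. It makes several assumptions:

- the embeddings are already computed;
- passages are then ranked only within the retrieved documents;
- exhaustive scoring stands in for an approximate MIPS index.

The rule that adds the document score to the passage score is a hypothetical choice, not the paper's formula. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """Relevance score used by maximum inner product search (MIPS)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Exhaustive MIPS: score every entry and return the k best (ties broken by id)."""
    scored = [(key, inner_product(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def hierarchical_retrieve(
    query: Vector,
    doc_index: Dict[str, Vector],
    passage_index: Dict[str, Vector],
    passage_to_doc: Dict[str, str],
    k_docs: int,
    k_passages: int,
) -> List[Tuple[str, float]]:
    """Stage 1 retrieves documents; stage 2 ranks only passages from those documents.

    Hypothetical scoring rule: final score = document score + passage score.
    """
    doc_scores = dict(top_k(query, doc_index, k_docs))
    candidates = [
        (pid, doc_scores[passage_to_doc[pid]] + inner_product(query, vec))
        for pid, vec in passage_index.items()
        if passage_to_doc[pid] in doc_scores
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k_passages]


# Toy 2-d embeddings, made up for illustration.
query = [1.0, 0.0]
doc_index = {"docA": [0.9, 0.1], "docB": [0.1, 0.9]}
passage_index = {"A1": [0.5, 0.5], "A2": [0.8, 0.0], "B1": [1.0, 0.0]}
passage_to_doc = {"A1": "docA", "A2": "docA", "B1": "docB"}

print([pid for pid, _ in top_k(query, passage_index, k=3)])  # ['B1', 'A2', 'A1']
print([pid for pid, _ in hierarchical_retrieve(
    query, doc_index, passage_index, passage_to_doc, k_docs=1, k_passages=2)])  # ['A2', 'A1']
```

In the toy data, passage B1 has the highest flat inner product. It belongs to a weakly matching document, however, so the document-level filter drops it. The example only shows how document-level scores can change a passage ranking.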
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
