Hierarchical corpus encoder: Fusing generative retrieval and dense indices
Tongfei Chen, Ankita Sharma, Adam Pauls, Benjamin Van Durme

TL;DR
The paper introduces the hierarchical corpus encoder (HCE), a method that combines generative retrieval with dense indices. It improves document retrieval in both zero-shot and supervised settings and lets documents be added to or removed from the index.
Contribution
The paper proposes HCE, a hierarchical encoder that fuses generative retrieval with dense indexing, so that documents not seen during training can be supported and the index can be updated.
Findings
HCE outperforms existing generative retrieval models in zero-shot and supervised settings.
HCE allows easy addition and removal of documents from the index.
HCE leverages contrastive training between sibling nodes in a hierarchy.
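To make the add/remove finding concrete, here is a minimal sketch of why a dense index supports such updates: each document is stored as an independent vector, so inserting or deleting one does not require retraining. The `DenseIndex` class, its method names, and the cosine-similarity scoring are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


class DenseIndex:
    """Toy in-memory dense index (hypothetical; not the paper's implementation)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}  # doc_id -> unit-normalized embedding

    def _normalize(self, vector):
        v = np.asarray(vector, dtype=np.float64)
        if v.shape != (self.dim,):
            raise ValueError(f"expected a vector of shape ({self.dim},)")
        norm = np.linalg.norm(v)
        if norm == 0.0:
            raise ValueError("zero vector cannot be normalized")
        return v / norm

    def add(self, doc_id, vector):
        # Adding a document only stores one more vector.
        self.vectors[doc_id] = self._normalize(vector)

    def remove(self, doc_id):
        # Removing a document only deletes its vector.
        self.vectors.pop(doc_id, None)

    def search(self, query, k=1):
        # Brute-force cosine similarity against every stored document.
        q = self._normalize(query)
        scored = [(doc_id, float(v @ q)) for doc_id, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A production system would replace the brute-force scan with an approximate nearest-neighbor structure, but the update pattern is the same.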
Abstract
Generative retrieval employs sequence models for conditional generation of document IDs based on a query (DSI (Tay et al., 2022); NCI (Wang et al., 2022); inter alia). While this has led to improved performance in zero-shot retrieval, it is a challenge to support documents not seen during training. We find that the performance of generative retrieval stems from contrastive training between sibling nodes in a document hierarchy. This motivates our proposal, the hierarchical corpus encoder (HCE), which can be supported by traditional dense encoders. Our experiments show that HCE achieves better results than generative retrieval models under both unsupervised zero-shot and supervised settings, while also allowing documents to be easily added to and removed from the index.
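One way to read "contrastive training between sibling nodes" is as a softmax over siblings at every level of the document hierarchy, with the node on the path to the relevant document as the positive. The sketch below follows that reading. The function name, the dot-product scoring, the `temperature` parameter, and the dictionary-based tree are assumptions for illustration; the abstract does not specify the exact loss.

```python
import numpy as np


def sibling_contrastive_loss(query, node_vecs, children, gold_path, temperature=1.0):
    """Sum, over levels of the hierarchy, of -log softmax over sibling nodes.

    query:      1-D query embedding
    node_vecs:  dict mapping node name -> 1-D embedding
    children:   dict mapping parent name -> list of child names
    gold_path:  node names from the root down to the relevant document
    (All names and the loss form are illustrative assumptions.)
    """
    loss = 0.0
    for parent, positive in zip(gold_path[:-1], gold_path[1:]):
        siblings = children[parent]
        # Score every sibling of the positive node against the query.
        scores = np.array([node_vecs[s] @ query for s in siblings]) / temperature
        scores = scores - scores.max()  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())
        loss -= log_probs[siblings.index(positive)]
    return float(loss)
```

Under this formulation the negatives at each level are only the positive node's siblings, not the whole corpus, which is what distinguishes it from a flat in-batch contrastive loss.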
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
