Auto Search Indexer for End-to-End Document Retrieval
Tianchi Yang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang

TL;DR
This paper introduces Auto Search Indexer (ASI), an end-to-end generative retrieval model that automatically learns document identifiers (docids) and retrieves documents effectively, including newly added ones. It outperforms advanced baselines.
Contribution
The paper presents a novel, fully end-to-end retrieval framework that combines semantic indexing with generative retrieval. It addresses a key limitation of previous generative models: their reliance on preprocessed docids.
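In generative retrieval, the model emits a docid token by token instead of scoring every document. Generative retrievers commonly keep that output valid with beam search constrained by a prefix trie of known docids. The sketch below illustrates this generic technique; it is not ASI's published decoder. It rests on three assumptions:

- docids are fixed-length integer sequences;
- `log_prob` is a hypothetical stand-in for the decoder's scores;
- `DocidTrie` and `constrained_beam_search` are illustrative names.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class DocidTrie:
    """Prefix trie over fixed-length docid token sequences (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[int, "DocidTrie"] = {}
        self.doc: Optional[str] = None  # set only on leaf nodes

    def insert(self, docid: Sequence[int], doc_name: str) -> None:
        node = self
        for tok in docid:
            node = node.children.setdefault(tok, DocidTrie())
        node.doc = doc_name


# log_prob(prefix, token) stands in for the decoder's score of `token`
# given the query and the docid tokens generated so far (an assumption).
LogProbFn = Callable[[List[int], int], float]


def constrained_beam_search(
    trie: DocidTrie, log_prob: LogProbFn, beam_size: int = 2
) -> List[str]:
    """Return up to `beam_size` documents, best first.

    Only tokens that extend a valid docid prefix are ever scored, so the
    decoder cannot emit an identifier that is not in the index.
    """
    beams: List[Tuple[float, List[int], DocidTrie]] = [(0.0, [], trie)]
    finished: List[Tuple[float, str]] = []
    while beams:
        candidates: List[Tuple[float, List[int], DocidTrie]] = []
        for score, prefix, node in beams:
            if not node.children:  # leaf: a complete docid
                if node.doc is not None:
                    finished.append((score, node.doc))
                continue
            for tok, child in node.children.items():
                candidates.append((score + log_prob(prefix, tok), prefix + [tok], child))
        # Keep only the highest-scoring partial docids.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda f: f[0], reverse=True)
    return [doc for _, doc in finished]


# Toy index: three documents with hypothetical 2-token docids.
trie = DocidTrie()
trie.insert([0, 1], "doc_a")
trie.insert([0, 2], "doc_b")
trie.insert([1, 0], "doc_c")

# Toy scores keyed by (step, token); token 9 is not a valid docid token.
toy_scores = {(0, 0): -0.1, (0, 1): -2.0, (0, 9): 5.0, (1, 0): -1.5, (1, 1): -0.7, (1, 2): -0.2}


def toy_log_prob(prefix: List[int], tok: int) -> float:
    return toy_scores.get((len(prefix), tok), -5.0)


print(constrained_beam_search(trie, toy_log_prob))  # → ['doc_b', 'doc_a']
```

Token 9 has the highest raw score, but it is never considered because no docid starts with it. This is the property that a fixed, preprocessed docid space provides. Per the summary, ASI instead learns the docids themselves end to end.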
Findings
Outperforms advanced baselines on public datasets.
Effectively retrieves new documents.
Demonstrates superior performance in industrial applications.
Abstract
Generative retrieval, a new and advanced paradigm for document retrieval, has recently attracted research interest because it encodes all documents into the model and directly generates the retrieved documents. However, its power is still underutilized: it relies heavily on "preprocessed" document identifiers (docids), which limits both its retrieval performance and its ability to retrieve new documents. In this paper, we propose a novel, fully end-to-end retrieval paradigm. It can not only automatically learn, end to end, the best docids for existing and new documents via a semantic indexing module, but also perform end-to-end document retrieval via an encoder-decoder-based generative model, namely Auto Search Indexer (ASI). In addition, we design a reparameterization mechanism to combine the above two modules into a joint optimization framework. Extensive experimental results…
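The abstract does not specify which reparameterization ASI uses. One common way to make a discrete choice trainable jointly with a downstream model is the Gumbel-Softmax trick with a straight-through estimator. Picking a docid token in a semantic indexing module is such a discrete choice. The sketch below assumes that technique purely for illustration.

It is plain Python with no autodiff, so it shows only the forward pass:

- `soft`, a relaxed sample;
- `hard`, the discrete one-hot token choice.

The function names and toy logits are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(
    logits: List[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Return (soft, hard) samples over candidate docid tokens.

    soft: relaxed sample softmax((logits + g) / temperature), g ~ Gumbel(0, 1).
    hard: one-hot vector at the argmax of soft (the discrete token choice).
    """
    rng = rng or random.Random()
    noise = []
    for _ in logits:
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1);
        # clamp U away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise.append(-math.log(-math.log(u)))
    perturbed = [(x + g) / temperature for x, g in zip(logits, noise)]
    soft = softmax(perturbed)
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return soft, hard


# Hypothetical logits of an indexing module over a 3-token docid vocabulary.
soft, hard = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
print(soft, hard)
```

In an autodiff framework, the straight-through variant would pass `hard + soft - stop_gradient(soft)` to the generative model. The forward pass then sees a discrete docid token, while gradients reach the indexing logits through `soft`. Lower temperatures make `soft` closer to `hard`, at the cost of higher-variance gradients.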
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies
