Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval
Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha

TL;DR
This paper introduces PIXAR, a non-autoregressive model with an expanded vocabulary for efficient generative retrieval, significantly improving retrieval performance while maintaining low latency and cost.
Contribution
PIXAR expands NAR model vocabularies to include multi-word entities, reducing token dependencies and enhancing retrieval accuracy without increasing inference latency.
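To make the idea of a phrase-level target vocabulary concrete, the sketch below segments a target string with and without multi-word entries. It is an illustration only, not PIXAR's actual tokenization or vocabulary-construction procedure: the function `segment`, the `max_phrase_len` parameter, the greedy longest-match rule, and the example phrases are all assumptions made here. The point it shows is that a target covered by multi-word tokens needs fewer output tokens, so a model that predicts tokens independently has fewer cross-token dependencies to get right.

```python
def segment(text, phrase_vocab, max_phrase_len=3):
    """Greedy longest-match segmentation (hypothetical helper, for illustration).

    Multi-word entries in `phrase_vocab` are emitted as single tokens;
    any word not covered by a phrase falls back to a one-word token.
    """
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to a single word.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in phrase_vocab:
                tokens.append(candidate)
                i += n
                break
    return tokens


target = "new york city hotel deals"

# Word-level vocabulary: every word is its own target token.
word_level = segment(target, phrase_vocab=set())

# Expanded vocabulary: made-up multi-word entries are single target tokens.
phrase_level = segment(target, phrase_vocab={"new york city", "hotel deals"})

print(word_level)    # five single-word tokens
print(phrase_level)  # two phrase tokens
```

In this toy case the same target shrinks from five tokens to two once the phrases are in the vocabulary.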
Findings
31.0% improvement in MRR@10 on MS MARCO
23.2% increase in Hits@5 on Natural Questions
5.08% increase in ad clicks in online experiments
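For readers unfamiliar with the two offline metrics named above, the following is a minimal sketch of how MRR@10 and Hits@5 are commonly computed. The function names, document IDs, and toy rankings are made up for illustration and are not taken from the paper or its evaluation code.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant result within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked[:k], start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def hits_at_k(ranked_lists, relevant_sets, k=5):
    """Fraction of queries with at least one relevant result in the top k."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(doc in relevant for doc in ranked[:k])
    )
    return hits / len(ranked_lists)


# Toy data: two queries, each with a ranked list of retrieved document IDs.
ranked_lists = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant_sets = [{"d1"}, {"d8"}]

mrr = mrr_at_k(ranked_lists, relevant_sets, k=10)   # (1/2 + 0) / 2 = 0.25
hits = hits_at_k(ranked_lists, relevant_sets, k=5)  # 1 of 2 queries = 0.5
print(mrr, hits)
```

The first query finds its relevant document at rank 2 and the second finds none, which gives an MRR@10 of 0.25 and a Hits@5 of 0.5.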
Abstract
Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXAR, a novel approach that expands the target…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
