TL;DR
DiffRetriever turns a diffusion language model into a retriever by appending K masked positions to the prompt and reading all K representative tokens in a single bidirectional forward pass, avoiding the sequential decoding that makes multi-token autoregressive retrieval inefficient.
Contribution
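A representative-token retriever for diffusion language models that reads multiple representative tokens in parallel, together with evidence that the bottleneck in earlier multi-token retrieval was sequential generation rather than the multi-token idea itself.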
Findings
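Multi-token DiffRetriever substantially improves over single-token decoding on every diffusion backbone tested, in both in-domain and out-of-domain evaluation.
Autoregressive multi-token decoding, by contrast, is flat or negative relative to single-token decoding and incurs an added latency cost.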
Code is available at https://github.com/ielab/diffretriever.
Abstract
PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales…
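The contrast the abstract draws between parallel and sequential reading can be illustrated with a toy sketch. It is not the authors' implementation: `CountingModel`, `MASK_ID`, and both helper functions are hypothetical names, and the "model" is a stub that returns a made-up vector per position purely so that forward passes can be counted.

```python
from typing import List

MASK_ID = 0  # hypothetical mask-token id, chosen for illustration only


class CountingModel:
    """Toy stand-in for a language model (an assumption, not a real backbone).

    It maps a token-id sequence to one fake vector per position and
    counts how many forward passes were requested.
    """

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.calls = 0

    def __call__(self, token_ids: List[int]) -> List[List[float]]:
        self.calls += 1
        total = sum(token_ids)
        # Fake per-position output that depends on the whole sequence.
        return [
            [float((total + pos * (d + 1)) % 7) for d in range(self.dim)]
            for pos in range(len(token_ids))
        ]


def parallel_representatives(model, prompt_ids: List[int], k: int):
    """Append k masked positions and read all k outputs from one forward pass."""
    if k <= 0:
        raise ValueError("k must be positive")
    inputs = list(prompt_ids) + [MASK_ID] * k
    outputs = model(inputs)
    return outputs[-k:]  # one vector per masked position


def sequential_representatives(model, prompt_ids: List[int], k: int):
    """Autoregressive contrast: one forward pass per representative token."""
    if k <= 0:
        raise ValueError("k must be positive")
    ids = list(prompt_ids)
    reps = []
    for _ in range(k):
        last = model(ids)[-1]  # output at the final position
        reps.append(last)
        # Toy greedy step: the argmax index stands in for the next token id.
        next_id = max(range(len(last)), key=last.__getitem__) + 1
        ids.append(next_id)
    return reps
```

With K = 4, the parallel helper calls the stub once and the sequential helper calls it four times, which is the kind of cost difference the abstract attributes to sequential generation. How the K outputs are turned into dense or sparse representations and scored is not covered by this sketch.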
