GLEN: Generative Retrieval via Lexical Index Learning

Sunkyung Lee; Minjin Choi; Jongwuk Lee

arXiv:2311.03057 · cs.IR · June 3, 2025 · 1 citation

TL;DR

GLEN introduces a novel generative retrieval method that learns lexical identifiers through a two-phase index learning strategy, achieving state-of-the-art performance on multiple benchmarks by directly generating document identifiers for queries.

Contribution

It proposes a new generative retrieval approach with a dynamic lexical index learning strategy and collision-free inference, addressing key challenges in existing methods.

Findings

01. Achieves state-of-the-art performance on NQ320k, MS MARCO, and BEIR datasets.

02. Effectively learns meaningful lexical identifiers and relevance signals.

03. Utilizes collision-free inference for efficient document ranking.

Abstract

Generative retrieval sheds light on a new paradigm of document retrieval, aiming to directly generate the identifier of a relevant document for a query. While it takes advantage of bypassing the construction of auxiliary index structures, existing studies face two significant challenges: (i) the discrepancy between the knowledge of pre-trained language models and identifiers and (ii) the gap between training and inference that makes learning to rank difficult. To overcome these challenges, we propose a novel generative retrieval method, namely Generative retrieval via LExical iNdex learning (GLEN). For training, GLEN effectively exploits a dynamic lexical identifier using a two-phase index learning strategy, enabling it to learn meaningful lexical identifiers and relevance signals between queries and documents. For inference, GLEN utilizes collision-free inference, using identifier…
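To make the terms "lexical identifier" and "collision" concrete, here is a toy Python sketch. It is not GLEN's training or inference procedure: the per-document term weights are made-up numbers, and the helpers `top_k_terms` and `find_collisions` are hypothetical names introduced only for illustration. The sketch assumes an identifier is simply a document's top-weighted terms, and it shows how two documents can end up with the same identifier, which is the situation a collision-free inference scheme has to resolve.

```python
from collections import defaultdict


def top_k_terms(term_weights, k):
    """Return the k highest-weighted terms as a tuple (ties broken alphabetically)."""
    ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(term for term, _ in ranked[:k])


def find_collisions(doc_identifiers):
    """Return identifiers that are shared by more than one document."""
    groups = defaultdict(list)
    for doc_id, identifier in doc_identifiers.items():
        groups[identifier].append(doc_id)
    return {ident: docs for ident, docs in groups.items() if len(docs) > 1}


# Hypothetical term weights; in a real system these would come from a model.
docs = {
    "d1": {"neural": 0.9, "retrieval": 0.8, "index": 0.3},
    "d2": {"neural": 0.95, "retrieval": 0.9, "ranking": 0.2},
    "d3": {"lexical": 0.9, "index": 0.6, "retrieval": 0.1},
}

identifiers = {doc_id: top_k_terms(weights, 2) for doc_id, weights in docs.items()}
collisions = find_collisions(identifiers)

print(identifiers)
print(collisions)
```

In this toy data, `d1` and `d2` share an identifier, so generating that identifier alone cannot tell them apart; some additional signal is needed to rank the two.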

Code & Models

Repositories

skleee/GLEN
PyTorch · Official

Models

QuanTH02/GLEN-model
Hugging Face · model

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies