OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang

TL;DR
This paper presents OneGen, a unified framework that allows large language models to perform both retrieval and generation tasks simultaneously in a single pass, improving efficiency and effectiveness.
Contribution
Introducing OneGen, the first framework enabling LLMs to conduct vector retrieval during generation in a unified, efficient manner.
Findings
Improved retrieval performance in combined tasks
Maintains generative capabilities while enhancing retrieval
Validates effectiveness on RAG and Entity Linking tasks
Abstract
Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness,…
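The one-pass idea in the abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that the hidden state at a retrieval-token position doubles as the query vector for similarity search, while the remaining positions feed ordinary next-token generation. The token name `[RQ]`, the helper functions, and the two-dimensional vectors are all hypothetical and are not taken from the paper's code.

```python
import math

# Hypothetical special-token name; an assumption made for this illustration only.
RETRIEVAL_TOKEN = "[RQ]"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_roles(tokens, hidden_states):
    """Partition the per-position outputs of ONE forward pass by token role.

    Returns (retrieval, generation), each a list of (position, hidden_state).
    """
    retrieval, generation = [], []
    for pos, (tok, h) in enumerate(zip(tokens, hidden_states)):
        target = retrieval if tok == RETRIEVAL_TOKEN else generation
        target.append((pos, h))
    return retrieval, generation


def retrieve(query_vec, doc_vectors):
    """Return the id of the document vector most similar to the query vector."""
    return max(doc_vectors, key=lambda doc_id: cosine(query_vec, doc_vectors[doc_id]))


# Toy stand-ins for a real model's output: one hidden state per token.
tokens = ["Who", "wrote", "Hamlet", RETRIEVAL_TOKEN]
hidden_states = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.3], [0.9, 0.1]]
doc_vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}

retrieval, generation = split_roles(tokens, hidden_states)
# The same pass that yields generation states also yields the retrieval query.
best_doc = retrieve(retrieval[0][1], doc_vectors)
```

In this sketch no second encoder call is needed to embed the query: the vector used for retrieval is read off the same sequence of hidden states that would drive generation. A real system would replace the toy lists with model outputs and an approximate nearest-neighbor index.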
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Digital Rights Management and Security
Methods: Byte Pair Encoding · Softmax · Layer Normalization · Dropout · Attention Is All You Need · WordPiece · Residual Connection · Attention Dropout · Linear Layer
