OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Jintian Zhang; Cheng Peng; Mengshu Sun; Xiang Chen; Lei Liang; Zhiqiang Zhang; Jun Zhou; Huajun Chen; Ningyu Zhang

arXiv:2409.05152 · cs.CL · October 3, 2024

TL;DR

This paper presents OneGen, a unified framework that lets a single large language model perform retrieval and generation in one forward pass, which the authors report improves both efficiency and effectiveness.

Contribution

Introducing OneGen, the first framework enabling LLMs to conduct vector retrieval during generation in a unified, efficient manner.

Findings

01 Improved retrieval performance in combined tasks

02 Maintains generative capabilities while enhancing retrieval

03 Validates effectiveness on RAG and Entity Linking tasks

Abstract

Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness,…
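The idea of retrieval tokens sharing a forward pass with ordinary generated tokens can be illustrated with a toy sketch. This is not the authors' implementation: the marker name `[RQ]`, the helper functions, the three-dimensional hidden states, and the document vectors are all hypothetical, chosen only to show how positions holding a retrieval token might be routed to a similarity search while every other position stays available for next-token prediction.

```python
import math

RETRIEVAL_TOKEN = "[RQ]"  # hypothetical marker name, chosen for illustration


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_forward_pass(tokens, hidden_states):
    """Route each position of one forward pass to generation or retrieval.

    Positions holding the retrieval marker contribute their hidden state as a
    query vector; every other position is left for ordinary generation.
    """
    generation_positions = []
    retrieval_queries = []
    for position, (token, state) in enumerate(zip(tokens, hidden_states)):
        if token == RETRIEVAL_TOKEN:
            retrieval_queries.append((position, state))
        else:
            generation_positions.append(position)
    return generation_positions, retrieval_queries


def retrieve(query_vector, document_vectors):
    """Return the id of the document most similar to the query vector."""
    return max(
        document_vectors,
        key=lambda doc_id: cosine(query_vector, document_vectors[doc_id]),
    )


# Toy data standing in for the output of a single forward pass (assumed values).
tokens = ["Who", "wrote", "Hamlet", "?", "[RQ]"]
hidden_states = [
    [0.1, 0.3, 0.0],
    [0.2, 0.1, 0.4],
    [0.9, 0.0, 0.1],
    [0.0, 0.2, 0.2],
    [0.8, 0.1, 0.1],
]
document_vectors = {
    "doc_shakespeare": [1.0, 0.0, 0.0],
    "doc_physics": [0.0, 1.0, 0.0],
    "doc_cooking": [0.0, 0.0, 1.0],
}

generation_positions, retrieval_queries = split_forward_pass(tokens, hidden_states)
retrieved = [(pos, retrieve(vec, document_vectors)) for pos, vec in retrieval_queries]
print(generation_positions, retrieved)
```

In this sketch the query vector comes from the same pass that produces the generation states, so no second encoder call is needed; how the real framework trains and uses those states is described in the paper itself.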

Code & Models

Repositories

zjunlp/onegen (PyTorch, official)

Taxonomy

Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Digital Rights Management and Security

Methods: Byte Pair Encoding · Softmax · Layer Normalization · Dropout · Attention Is All You Need · WordPiece · Residual Connection · Attention Dropout · Linear Layer