GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression
Zhongtao Miao, Qiyu Wu, Yoshimasa Tsuruoka

TL;DR
GRC introduces a unified training framework for reasoning-driven generation, text representation, and context compression in LLMs, reducing training and deployment costs while enabling flexible, efficient inference.
Contribution
The paper proposes meta latent tokens and a unified generative, representative, and compressive tuning method, so that a single forward pass covers reasoning-driven generation, reasoning-enhanced text representation, and context compression.
Findings
Accomplishes all three tasks in a single forward pass while keeping inference modular.
Reduces deployment effort for retrieval-augmented generation.
Demonstrates effectiveness on reasoning, generation, and compression benchmarks.
Abstract
Text embedding models and generative models are usually trained separately, even when both build on large language models (LLMs), which incurs substantial training cost and deployment effort. Context compression is also a challenging and pressing task; it is vital to reasoning-driven generation and to agentic tasks that require long context and continual learning. In this paper, we explore how to unify reasoning-driven generation, reasoning-enhanced text representation, and context compression in one forward pass for LLMs. Using meta latent tokens and a unified generative, representative, and compressive tuning approach, we propose a training framework named GRC that bridges the three tasks. The trained models can accomplish all three objectives in a single forward pass while maintaining modular, LEGO-style flexibility during inference. This design greatly reduces the deployment effort for…
