One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Yutao Zhu; Zhaoheng Huang; Zhicheng Dou; Ji-Rong Wen

arXiv:2405.19670 · cs.CL · December 12, 2024

TL;DR

This paper introduces a novel method for enhancing retrieval-augmented large language models by learning scalable, pluggable virtual tokens that improve performance without altering the original model parameters.

Contribution

The authors propose a new approach that fine-tunes only virtual token embeddings, preserving LLM capabilities while boosting retrieval-augmented performance.

Findings

1. Outperforms existing RAG methods on 12 question-answering tasks
2. Maintains original LLM generation quality after virtual token integration
3. Offers scalable and flexible training strategies for virtual tokens

Abstract

Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved information or directly fine-tune LLMs to adapt to RAG scenarios. Although fine-tuning can yield better performance, it often compromises the LLMs' general generation capabilities by modifying their parameters. This limitation poses challenges in practical applications, especially when LLMs are already deployed, as parameter adjustments may affect their original functionality. To address this, we propose a novel method that involves learning scalable and pluggable virtual tokens for RAG. By maintaining the LLMs' original parameters and fine-tuning only the embeddings of these pluggable tokens, our approach not only enhances LLMs' performance but also…
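The core idea in the abstract, keeping the model's parameters fixed and updating only the embeddings of a few added tokens, can be illustrated with a toy sketch. This is not the authors' implementation: the "LLM" here is a hypothetical stand-in (a frozen embedding table plus a frozen readout vector with mean pooling), and the names `forward`, `build_input`, and `train_virtual_tokens`, the squared-error objective, and the placement of the virtual tokens between the retrieved passage and the question are all assumptions made for illustration.

```python
import random

random.seed(0)
DIM = 4

# Frozen "LLM" parameters (toy stand-ins, never updated below).
frozen_embeddings = {
    tok: [random.uniform(-1, 1) for _ in range(DIM)]
    for tok in ["paris", "is", "capital", "of", "france", "what"]
}
frozen_readout = [random.uniform(-1, 1) for _ in range(DIM)]


def forward(token_vectors):
    """Mean-pool the input vectors and project them with the frozen readout."""
    n = len(token_vectors)
    pooled = [sum(v[i] for v in token_vectors) / n for i in range(DIM)]
    return sum(w * p for w, p in zip(frozen_readout, pooled))


def build_input(retrieved, question, virtual_tokens):
    """Assumed layout: retrieved passage, then virtual tokens, then question."""
    return (
        [frozen_embeddings[t] for t in retrieved]
        + virtual_tokens
        + [frozen_embeddings[t] for t in question]
    )


def train_virtual_tokens(retrieved, question, target,
                         num_virtual=2, lr=0.5, steps=200):
    """Gradient descent on the virtual-token embeddings only."""
    virtual = [[0.0] * DIM for _ in range(num_virtual)]
    for _ in range(steps):
        inputs = build_input(retrieved, question, virtual)
        error = forward(inputs) - target
        # Gradient of error**2 w.r.t. each virtual vector under mean pooling.
        scale = 2.0 * error / len(inputs)
        for vec in virtual:
            for i in range(DIM):
                vec[i] -= lr * scale * frozen_readout[i]
    return virtual


retrieved = ["paris", "is", "capital", "of", "france"]
question = ["what", "is", "capital", "of", "france"]
virtual = train_virtual_tokens(retrieved, question, target=1.0)

# With the virtual tokens plugged in, the output moves toward the target.
print("plugged in:", forward(build_input(retrieved, question, virtual)))
# Passing no virtual tokens reproduces the untouched model's output.
print("unplugged: ", forward(build_input(retrieved, question, [])))
```

Because only the `virtual` vectors are ever written to, calling `build_input` with an empty list gives exactly the output the frozen model produced before training, which is the sense in which the tokens are "pluggable" in this sketch.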

Code & Models

Models

yutaozhu94/SPRING (Hugging Face model)

Videos

One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection