Parametric Retrieval Augmented Generation

Weihang Su; Yichen Tang; Qingyao Ai; Junxi Yan; Changyue Wang; Hongning Wang; Ziyi Ye; Yujia Zhou; Yiqun Liu

arXiv:2501.15915·cs.CL·January 28, 2025

TL;DR

Parametric RAG embeds external knowledge directly into an LLM's parameters rather than its input context, improving efficiency and effectiveness over traditional in-context retrieval methods. It can also be combined with in-context RAG for further gains.

Contribution

The paper proposes Parametric RAG, a new paradigm that integrates external knowledge into LLMs' parameters, overcoming limitations of input-level knowledge injection.

Findings

01. Significantly improves knowledge augmentation effectiveness.

02. Reduces online computational cost by eliminating the need to inject retrieved documents into the LLM's input context.

03. Can be combined with in-context RAG for enhanced performance.

Abstract

Retrieval-augmented generation (RAG) techniques have emerged as a promising solution to enhance the reliability of large language models (LLMs) by addressing issues like hallucinations, outdated knowledge, and domain adaptation. In particular, existing RAG methods append relevant documents retrieved from an external corpus or database to the input of LLMs to guide their generation process, which we refer to as the in-context knowledge injection method. While this approach is simple and often effective, it has inherent limitations. Firstly, increasing the context length and number of relevant documents can lead to higher computational overhead and degraded performance, especially in complex reasoning tasks. More importantly, in-context knowledge injection operates primarily at the input level, but LLMs store their internal knowledge in their parameters. This gap fundamentally limits the…
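The in-context knowledge injection that the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the word-overlap scorer, the function names (`overlap_score`, `retrieve`, `build_prompt`), and the prompt template are all hypothetical stand-ins chosen for illustration. It only shows the general pattern of appending retrieved documents to the model input, and why the input grows with the number of documents.

```python
from collections import Counter


def overlap_score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase tokens (hypothetical scorer)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest toy relevance score."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """In-context injection: place the retrieved documents in the LLM input."""
    docs = retrieve(query, corpus, k)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    # Assumed prompt template; real systems use model-specific formats.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France borders Spain.",
]
query = "What is the capital of France?"
print(build_prompt(query, corpus, k=1))
```

Raising `k` lengthens the prompt, which is the context-length overhead the abstract points to as a limitation of input-level injection.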

Code & Models

Repositories

oneal2000/prag (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Softmax · Linear Warmup With Linear Decay · Adam · Residual Connection · Dropout · Byte Pair Encoding