Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters
Zhan Su, Fengran Mo, Jinghan Zhang, Yuchen Hui, Jiaao Sun, Jian-yun Nie

TL;DR
This paper introduces Poly-PRAG, a novel retrieval-augmented generation method that uses a small set of shared LoRA adapters with a latent routing function, improving efficiency and reducing storage costs in knowledge integration.
Contribution
Poly-PRAG introduces a shared pool of LoRA adapters and a latent routing mechanism, enabling scalable and efficient knowledge encoding in retrieval-augmented generation models.
Findings
Outperforms existing PRAG baselines on four benchmarks.
Reduces storage requirements by sharing adapters across documents.
Enhances encoding efficiency with a latent routing function.
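The storage claim can be made concrete with a rough parameter count. The sketch below is illustrative only: the sizes are hypothetical, it counts a single adapted weight matrix, and it assumes each document stores one routing weight per shared adapter, which the summary above does not confirm.

```python
def one_to_one_params(num_docs, d_in, d_out, rank):
    """Parameters stored when every document has its own LoRA pair (A, B)."""
    return num_docs * rank * (d_in + d_out)


def shared_pool_params(num_docs, num_adapters, d_in, d_out, rank):
    """Parameters stored for a shared adapter pool plus per-document routing.

    Assumption: routing is one weight per (document, adapter) pair.
    """
    adapters = num_adapters * rank * (d_in + d_out)
    routing = num_docs * num_adapters
    return adapters + routing


# Hypothetical sizes, not taken from the paper.
num_docs, num_adapters = 10_000, 8
d_in, d_out, rank = 4096, 4096, 2

dedicated = one_to_one_params(num_docs, d_in, d_out, rank)
shared = shared_pool_params(num_docs, num_adapters, d_in, d_out, rank)
print(f"one-to-one: {dedicated:,} parameters")
print(f"shared pool: {shared:,} parameters")
```

Under these assumptions the one-to-one count grows by a full adapter per document, while the shared-pool count grows by only `num_adapters` routing weights per document.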
Abstract
Parametric Retrieval-Augmented Generation (PRAG) is a RAG approach that integrates external knowledge directly into model parameters using a LoRA adapter, aiming to reduce inference cost relative to traditional RAG. However, current PRAG approaches adopt a one-to-one document encoding scheme, using a dedicated LoRA adapter for each individual document. This scheme introduces two major limitations: 1) As the number of documents increases, the cost of training and storage becomes prohibitive. 2) The LoRA adapters may largely overlap because knowledge is shared across documents, making the approach highly inefficient. To overcome these challenges, we propose the Poly-PRAG approach, which uses a small set of LoRA adapters that are able to encode more general knowledge. Each document can be encoded using a combination of them through a latent routing function. By jointly…
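The idea of encoding a document as a routed combination of shared adapters can be sketched as follows. This is a minimal illustration, not the paper's implementation: the softmax routing, the variable names, and all sizes are assumptions, since the abstract does not specify the form of the routing function or the training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
num_docs, num_adapters = 5, 3
d_in, d_out, rank = 16, 16, 2

# Shared pool of LoRA adapters: adapter k contributes the update B[k] @ A[k].
A = rng.normal(scale=0.01, size=(num_adapters, rank, d_in))
B = rng.normal(scale=0.01, size=(num_adapters, d_out, rank))

# One latent routing vector per document (assumed form: unnormalized logits).
routing_logits = rng.normal(size=(num_docs, num_adapters))


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def document_delta(doc_id):
    """Weight update for one document: sum_k w_k * (B_k @ A_k)."""
    weights = softmax(routing_logits[doc_id])
    return np.einsum("k,kor,kri->oi", weights, B, A)


delta = document_delta(2)
print(delta.shape)
```

In this sketch only the routing vector is specific to a document; the adapter matrices `A` and `B` are shared by all documents.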
Taxonomy
Topics: Topic Modeling · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis
