Inference Scaling for Bridging Retrieval and Augmented Generation

Youngwon Lee; Seung-won Hwang; Daniel Campos; Filip Graliński; Zhewei Yao; Yuxiong He

arXiv:2412.10684 · cs.CL · December 17, 2024

TL;DR

This paper introduces an inference scaling method called Mixture-of-Intervention (MOI) that mitigates generator bias in retrieval-augmented generation (RAG), improving performance on multiple benchmarks by aggregating multiple inference passes over permuted orderings of the retrieved passages.
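The idea of aggregating inference passes over permuted context orders can be illustrated with a minimal Python sketch. It assumes a hypothetical scorer that returns one utility value per passage for a given ordering (in a real system this would be derived from LLM forward passes). The averaging rule, the function names `debiased_utilities` and `rerank`, and the toy position-biased scorer are illustrative assumptions, not MOI's actual estimator.

```python
from itertools import permutations
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical scorer: takes passages in context order and returns one
# utility value per passage, aligned with that order. In practice this
# would come from LLM forward passes; here it is an assumption.
Scorer = Callable[[Sequence[str]], List[float]]


def debiased_utilities(passages: Sequence[str], score: Scorer,
                       orders: Sequence[Sequence[int]]) -> List[float]:
    """Average each passage's utility over several context orderings.

    Every order must be a permutation of all passage indices.
    """
    samples: List[List[float]] = [[] for _ in passages]
    for order in orders:
        ordered = [passages[i] for i in order]
        # One (simulated) inference call per ordering.
        for idx, value in zip(order, score(ordered)):
            samples[idx].append(value)
    return [mean(s) for s in samples]


def rerank(passages: Sequence[str], score: Scorer,
           orders: Sequence[Sequence[int]]) -> List[str]:
    """Rank passages by their order-averaged utility, highest first."""
    util = debiased_utilities(passages, score, orders)
    ranked = sorted(range(len(passages)), key=lambda i: util[i], reverse=True)
    return [passages[i] for i in ranked]


# Toy scorer with a built-in position bias: whatever sits first gets +0.6.
QUALITY = {"a": 0.2, "b": 0.5, "c": 0.9}
POSITION_BONUS = [0.6, 0.0, 0.0]


def biased_score(ordered: Sequence[str]) -> List[float]:
    return [QUALITY[p] + POSITION_BONUS[pos] for pos, p in enumerate(ordered)]


if __name__ == "__main__":
    docs = ["a", "b", "c"]
    single = [(0, 1, 2)]                        # one pass, retriever order
    all_orders = list(permutations(range(3)))   # 3! = 6 passes
    print(rerank(docs, biased_score, single))      # ['c', 'a', 'b']
    print(rerank(docs, biased_score, all_orders))  # ['c', 'b', 'a']
```

With a single pass in retriever order, the position bonus lets the weak passage `a` outrank `b`. Averaging over all six orderings spreads the bonus evenly across passages, so the ranking follows the underlying quality. The exhaustive sweep costs n! calls, which is why reducing the number of orderings matters.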

Contribution

The paper proposes MOI, an inference technique that reduces generator bias in RAG and uses the retriever's prior knowledge to lower its computational cost.

Findings

01

Improves ROUGE-L on MS MARCO by ~7 points.

02

Enhances EM on HotpotQA by ~7 points.

03

Reduces computational cost by using the retriever's prior knowledge to limit the number of permutations considered and lower the cost per LLM call.

Abstract

Retrieval-augmented generation (RAG) has emerged as a popular approach to steering the output of a large language model (LLM) by incorporating retrieved contexts as inputs. However, existing work has observed a generator bias, whereby improving the retrieval results may negatively affect the outcome. In this work, we show that such bias can be mitigated through inference scaling, by aggregating inference calls over permuted orders of the retrieved contexts. The proposed Mixture-of-Intervention (MOI) explicitly models the debiased utility of each passage with multiple forward passes to construct a new ranking. We also show that MOI can leverage the retriever's prior knowledge to reduce the computational cost by minimizing the number of permutations considered and lowering the cost per LLM call. We showcase the effectiveness of MOI on diverse RAG tasks, improving ROUGE-L on MS MARCO and EM on…
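One way a retriever's prior could cut the number of permutations is sketched below in Python. This is a swapped-in technique, not necessarily the paper's: it assumes (1) cyclic rotations, which place every passage at every position once using n orderings instead of n!, and (2) a hypothetical cutoff `k` that rotates only the retriever's top-k passages while the rest stay fixed in retriever order. The function names and toy scores are illustrative; the resulting orderings could be fed to any permutation-averaging step.

```python
import math
from typing import List, Sequence


def cyclic_orders(indices: Sequence[int]) -> List[List[int]]:
    """Return the len(indices) rotations of `indices`.

    Each index occupies every position exactly once across the rotations,
    so n orderings replace the n! of an exhaustive permutation sweep.
    """
    items = list(indices)
    return [items[s:] + items[:s] for s in range(len(items))]


def prior_limited_orders(retriever_scores: Sequence[float], k: int) -> List[List[int]]:
    """Rotate only the retriever's top-k passages (assumes k >= 1).

    Passages below the cutoff stay fixed, in retriever order, after the
    rotated head; this is a hypothetical way to use the retriever's prior.
    """
    ranked = sorted(range(len(retriever_scores)),
                    key=lambda i: retriever_scores[i], reverse=True)
    head, tail = ranked[:k], ranked[k:]
    return [rotation + tail for rotation in cyclic_orders(head)]


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.7, 0.2]   # toy retriever scores for 5 passages
    orders = prior_limited_orders(scores, k=3)
    for order in orders:
        print(order)
    print(f"{len(orders)} calls instead of {math.factorial(len(scores))}")
```

Rotations balance absolute positions but do not cover every relative order of passages, and the cutoff `k` trades coverage of lower-ranked passages for fewer LLM calls. Lowering the cost per call (for example, by shortening the context) is a separate lever that this sketch does not model.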

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Attention Is All You Need · Dense Connections · Byte Pair Encoding · Multi-Head Attention · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay