Efficient Test-Time Retrieval Augmented Generation

Hailong Yin; Bin Zhu; Jingjing Chen; Chong-Wah Ngo

arXiv:2511.01059 · cs.AI · November 4, 2025

Open Access

TL;DR

ET2RAG is a training-free, efficient retrieval-augmented framework that enhances large language models' accuracy by retrieving relevant documents, generating diverse responses, and selecting the best via majority voting, balancing performance and computational cost.

Contribution

The paper introduces ET2RAG, a novel training-free method that improves LLM performance using retrieval, partial generation, and majority voting to balance efficiency and accuracy.
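The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard token-overlap similarity, the word-level truncation, and every function name here (`truncate`, `jaccard`, `select_by_consensus`) are assumptions chosen for illustration. It shows one plausible reading of "majority voting over partial responses": truncate each candidate, score it by its agreement with the other truncated candidates, and keep the one with the highest agreement.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]; an assumed stand-in for the paper's similarity."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def truncate(text, max_words):
    """Keep only the first max_words whitespace-separated words (the 'partial' response)."""
    return " ".join(text.split()[:max_words])


def select_by_consensus(candidates, max_words=8):
    """Return (index of the candidate that agrees most with the others, all scores).

    Agreement is measured on truncated candidates only, so the vote never
    needs the full responses. Ties go to the earliest candidate.
    """
    partial = [truncate(c, max_words) for c in candidates]
    scores = [
        sum(jaccard(p, q) for j, q in enumerate(partial) if j != i)
        for i, p in enumerate(partial)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Hypothetical candidate responses, as if generated from different retrieved documents.
candidates = [
    "The capital of Australia is Canberra, chosen in 1908.",
    "Canberra is the capital of Australia.",
    "The capital of Australia is Sydney, its largest city.",
    "Australia's capital city is Canberra.",
]
best_index, scores = select_by_consensus(candidates)
print(candidates[best_index])
```

A softer, similarity-weighted vote like this one tolerates paraphrases better than exact-match counting, which is why it is used in the sketch; the paper may define consensus differently.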

Findings

01. Significantly improves performance on question answering, recipe generation, and image captioning.

02. Reduces computational cost by using partial responses for consensus.

03. Effective across multiple tasks without additional training.
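To make the cost argument in the second finding concrete, here is a back-of-the-envelope sketch of generated-token counts. The numbers and both function names are hypothetical and not taken from the paper, and the sketch assumes that only the winning candidate is completed to full length after the vote, which the summary above does not state.

```python
def full_voting_tokens(n_candidates, full_len):
    """Tokens generated when every candidate is produced at full length."""
    return n_candidates * full_len


def partial_voting_tokens(n_candidates, partial_len, full_len):
    """Tokens generated when candidates are truncated for the vote.

    Assumption: after voting, only the selected candidate is extended
    from partial_len to full_len.
    """
    if partial_len > full_len:
        raise ValueError("partial_len must not exceed full_len")
    return n_candidates * partial_len + (full_len - partial_len)


# Hypothetical setting: 5 candidates, 200-token answers, 40-token partial responses.
full_cost = full_voting_tokens(5, 200)
partial_cost = partial_voting_tokens(5, 40, 200)
print(full_cost, partial_cost)
```

Under these assumed numbers the saving grows with the number of candidates, since each extra candidate costs only the partial length rather than the full one.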

Abstract

Although Large Language Models (LLMs) demonstrate significant capabilities, their reliance on parametric knowledge often leads to inaccuracies. Retrieval-Augmented Generation (RAG) mitigates this by incorporating external knowledge, but it may retrieve irrelevant documents, which leads to inaccurate responses. Integration methods filter out incorrect answers from multiple responses, but they lack the external knowledge that RAG methods provide, and their high cost requires balancing overhead against performance gains. To address these issues, we propose an Efficient Test-Time Retrieval-Augmented Generation framework named ET2RAG to improve the performance of LLMs while maintaining efficiency. Specifically, ET2RAG is a training-free method that first retrieves the most relevant documents and augments the LLMs to efficiently generate diverse candidate responses by managing…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning