Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models

Youan Cong; Pritom Saha Akash; Cheng Wang; Kevin Chen-Chuan Chang

arXiv:2411.07820·cs.CL·September 22, 2025

TL;DR

The paper presents ERRR (Extract-Refine-Retrieve-Read), a query optimization framework for retrieval-augmented large language models. It first extracts the LLM's parametric knowledge, then refines the queries so that retrieval returns only the information the model needs to answer accurately.

Contribution

It introduces an Extract-Refine-Retrieve-Read pipeline, together with a trainable variant in which a smaller, tunable query optimizer is refined through knowledge distillation from a larger teacher model.

Findings

1. ERRR outperforms existing baselines on QA datasets.
2. The trainable query optimizer reduces computational costs.
3. ERRR improves retrieval relevance and response accuracy.

Abstract

We introduce the Extract-Refine-Retrieve-Read (ERRR) framework, a novel approach designed to bridge the pre-retrieval information gap in Retrieval-Augmented Generation (RAG) systems through query optimization tailored to meet the specific knowledge requirements of Large Language Models (LLMs). Unlike conventional query optimization techniques used in RAG, the ERRR framework begins by extracting parametric knowledge from LLMs, followed by using a specialized query optimizer for refining these queries. This process ensures the retrieval of only the most pertinent information essential for generating accurate responses. Moreover, to enhance flexibility and reduce computational costs, we propose a trainable scheme for our pipeline that utilizes a smaller, tunable model as the query optimizer, which is refined through knowledge distillation from a larger teacher model. Our…
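
The four stages named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `errr_answer`, the three callables (`llm`, `optimizer`, `retriever`), and all prompt wording are hypothetical, and the sketch assumes the read step sees only the retrieved passages and the original question.

```python
from typing import Callable, List


def errr_answer(
    question: str,
    llm: Callable[[str], str],              # hypothetical: prompt -> text
    optimizer: Callable[[str], List[str]],  # hypothetical: prompt -> search queries
    retriever: Callable[[str], List[str]],  # hypothetical: query -> passages
) -> str:
    """Illustrative Extract-Refine-Retrieve-Read loop (assumed interfaces)."""
    # Extract: elicit what the LLM already "knows" (its parametric knowledge).
    parametric = llm(
        "Answer the question from your own knowledge, with a short rationale.\n"
        f"Question: {question}"
    )

    # Refine: the query optimizer turns the question plus the draft knowledge
    # into queries meant to verify or supplement that knowledge.
    queries = optimizer(
        f"Question: {question}\n"
        f"Draft knowledge: {parametric}\n"
        "Write search queries that check or fill gaps in the draft."
    )

    # Retrieve: collect passages for every query, dropping exact duplicates
    # while preserving first-seen order.
    passages: List[str] = []
    for query in queries:
        for passage in retriever(query):
            if passage not in passages:
                passages.append(passage)

    # Read: generate the final answer grounded in the retrieved passages.
    context = "\n".join(passages)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

In the trainable scheme the abstract describes, `optimizer` would be the smaller distilled model while `llm` stays fixed; keeping the stages behind plain callables makes that swap a one-argument change in this sketch.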

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Dropout · Linear Warmup With Linear Decay · WordPiece · Dense Connections · Layer Normalization · Adam · Attention Dropout