Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains

Yash Saxena; Ankur Padia; Mandar S Chaudhary; Kalpa Gunaratna; Srinivasan Parthasarathy; Manas Gaur

arXiv:2505.16014·cs.CL·January 21, 2026

TL;DR

METEORA introduces a rationale-driven selection framework for RAG in sensitive domains, enhancing interpretability, robustness, and accuracy by replacing traditional similarity-based retrieval with explicit reasoning and adaptive evidence filtering.

Contribution

It presents a novel three-stage approach that uses preference-tuned LLMs, evidence selection with elbow detection, and evidence verification to improve RAG performance and safety.
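The elbow-detection step in the selection stage can be illustrated with a minimal sketch. This page does not say how the elbow is computed, so the code below uses a common largest-gap heuristic as a stand-in: rank candidates by score and cut where the drop between consecutive scores is greatest. The function name `select_by_elbow`, the `min_keep` parameter, and the example scores are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def select_by_elbow(scores: Sequence[float], min_keep: int = 1) -> List[int]:
    """Return indices of the kept items, best first, cut at the largest score gap.

    Assumption: higher score means a better match between a piece of
    evidence and the generated rationales. This is an illustrative
    heuristic, not the selection rule used by METEORA.
    """
    # Rank candidate indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(order) <= max(min_keep, 1):
        return order

    ranked = [scores[i] for i in order]
    # gaps[j] is the drop in score between rank j and rank j + 1.
    gaps = [ranked[j] - ranked[j + 1] for j in range(len(ranked) - 1)]

    # Only allow cuts that keep at least `min_keep` items.
    start = max(min_keep - 1, 0)
    cut = max(range(start, len(gaps)), key=lambda j: gaps[j])
    return order[: cut + 1]


if __name__ == "__main__":
    # Three strong candidates followed by a sharp drop.
    example_scores = [0.91, 0.12, 0.88, 0.15, 0.85, 0.10]
    print(select_by_elbow(example_scores))
```

Unlike a fixed top-k cutoff, the number of items kept here depends on the shape of the score distribution, which is the property the adaptive selection idea relies on.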

Findings

01

13.41% higher recall over baselines

02

21.05% higher precision without evidence expansion

03

33.34% improvement in downstream answer accuracy

Abstract

In sensitive domains, Retrieval-Augmented Generation (RAG) must be interpretable and robust because errors do not just mislead, they invite lawsuits, undermine scholarly credibility, and breach compliance. Stakeholders require traceable evidence, clear rationales for why specific evidence is selected, and safeguards against poisoned or misleading content. Yet current RAG pipelines rely on similarity-based retrieval with arbitrary top-k cutoffs, provide no explanation for selections, and remain vulnerable to poisoning attacks. We propose METEORA, which replaces these drawbacks with rationale-driven selection, using explicit reasoning to guide evidence choice, explain decisions, and improve robustness to RAG poisoning. METEORA operates in three stages: (1) a general-purpose LLM is preference-tuned to generate query-conditioned rationales using direct preference optimization; (2) these…

Peer Reviews

Decision·ICLR 2026 Conference Desk Rejected Submission

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. Use of rationales to both select evidence and explain that selection clearly to the user. Applying DPO to optimize rationale generation is also interesting.
2. The paper proposes a set of practical optimizations that could be applied to any RAG pipeline and that I believe would likely improve overall performance. This is a valuable contribution.

Weaknesses

1. I see that a preference-tuned Llama-3.1-8B was used for rationale generation and evidence verification in the experiments. Is this LLM available for general use? I didn't see a reference to it in the repo.
2. Does METEORA have an SDK that can be used to interface with the framework?
3. As I understand it, the rationale includes flagging instructions. I would assume that including them may affect the quality of evidence selection, but I don't see an ablation study that addresses this.

Reviewer 02 · Rating 2 · Confidence 3

Strengths

It offers a fresh perspective on re-ranking by grounding it in explicit reasoning — using a rationale generator to make the evidence selection process more transparent and auditable, providing a cleaner and more interpretable alternative to standard top-k retrieval.

Weaknesses

1. While METEORA shows clear performance gains, the paper doesn’t provide much concrete analysis or evidence to quantify its initial claims around improved interpretability and credibility. It would be helpful to see a more systematic evaluation of these aspects.
2. Methodologically, METEORA feels more like an engineering refinement built on existing techniques. The main novelty seems to be the unsupervised evidence selection approach, but its real advantages aren’t clearly demonstrated; the p

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Replaces opaque top-k heuristics with rationale-driven selection; the same rationale frame powers selection and verification, improving auditability for sensitive domains.
2. Easy to implement and tune.
3. Breadth of evaluation: three tasks across six long-document datasets.

Weaknesses

1. In CP, baselines are evaluated at METEORA’s average evidence count rather than at each method’s own best K; a full K-sweep with per-method optima would be more standard and may change relative standings.
2. Design-wise, selection and verification share the same rationale frame, which the authors themselves flag as a single-point-of-failure risk under targeted attacks; some verifier/model heterogeneity would help.
3. The FinQA case shows lower recall than re-rankers in short-passage settings; the paper

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Multi-Criteria Decision Making · AI-based Problem Solving and Planning

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Softmax · WordPiece · Weight Decay · Dropout · Adam · Linear Layer