Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains
Yash Saxena, Ankur Padia, Mandar S Chaudhary, Kalpa Gunaratna, Srinivasan Parthasarathy, Manas Gaur

TL;DR
METEORA introduces a rationale-driven selection framework for RAG in sensitive domains, enhancing interpretability, robustness, and accuracy by replacing traditional similarity-based retrieval with explicit reasoning and adaptive evidence filtering.
Contribution
It presents a novel three-stage approach that uses preference-tuned LLMs, evidence selection with elbow detection, and evidence verification to improve RAG performance and safety.
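The second stage replaces a fixed top-k cutoff with a cutoff chosen from the score distribution itself. The sketch below illustrates that idea only. This page does not specify METEORA's actual elbow-detection procedure, so a simple largest-gap heuristic stands in for it here, and the function name `select_by_largest_gap`, the `(chunk_id, score)` input format, and the example scores are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_by_largest_gap(scored: Sequence[Tuple[str, float]]) -> List[str]:
    """Keep every chunk ranked above the largest drop in score.

    Assumption: this largest-gap rule is a stand-in for elbow detection,
    not the procedure used in the paper.
    """
    # Rank candidates from highest to lowest score.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if len(ranked) < 2:
        return [chunk for chunk, _ in ranked]

    # Drop in score between each pair of neighbouring ranks.
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    largest = max(drops)
    if largest == 0:
        # All scores are equal, so there is no elbow: keep everything.
        return [chunk for chunk, _ in ranked]

    # Cut right after the position where the largest drop begins.
    cut = drops.index(largest)
    return [chunk for chunk, _ in ranked[: cut + 1]]


# Hypothetical scores: three strong matches, then a sharp fall-off.
candidates = [("a", 0.91), ("b", 0.30), ("c", 0.88), ("d", 0.27), ("e", 0.85)]
print(select_by_largest_gap(candidates))  # ['a', 'c', 'e']
```

Unlike a fixed k, the number of chunks kept here varies per query: a query with three clearly relevant chunks keeps three, and a query with one keeps one.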
Findings
13.41% higher recall over baselines
21.05% higher precision without evidence expansion
33.34% improvement in downstream answer accuracy
Abstract
In sensitive domains, Retrieval-Augmented Generation (RAG) must be interpretable and robust because errors do not just mislead, they invite lawsuits, undermine scholarly credibility, and breach compliance. Stakeholders require traceable evidence, clear rationales for why specific evidence is selected, and safeguards against poisoned or misleading content. Yet current RAG pipelines rely on similarity-based retrieval with arbitrary top-k cutoffs, provide no explanation for selections, and remain vulnerable to poisoning attacks. We propose METEORA, which replaces these drawbacks with rationale-driven selection, using explicit reasoning to guide evidence choice, explain decisions, and improve robustness to RAG poisoning. METEORA operates in three stages: (1) a general-purpose LLM is preference-tuned to generate query-conditioned rationales using direct preference optimization; (2) these…
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
1. Use of rationales both to select evidence and to explain that selection clearly to the user. Applying DPO to optimize rationale generation is also interesting. 2. The paper proposes a set of practical optimizations that could be applied to any RAG pipeline and that I believe would likely improve overall performance. This is a valuable contribution.
1. I see that a preference-tuned Llama-3.1-8B was used for rationale generation and evidence verification in the experiments. Is this LLM available for general use? I didn't see a reference to it in the repo. 2. Does METEORA have an SDK that can be used to interface with the framework? 3. As I understand it, the rationale includes flagging instructions, and I would assume that including them may affect the quality of evidence selection. I don't see an ablation study that addresses this.
It offers a fresh perspective on re-ranking by grounding it in explicit reasoning — using a rationale generator to make the evidence selection process more transparent and auditable, providing a cleaner and more interpretable alternative to standard top-k retrieval.
1. While METEORA shows clear performance gains, the paper doesn’t provide much concrete analysis or evidence to quantify its initial claims around improved interpretability and credibility. It would be helpful to see a more systematic evaluation of these aspects. 2. Methodologically, METEORA feels more like an engineering refinement built on existing techniques. The main novelty seems to be the unsupervised evidence selection approach, but its real advantages aren’t clearly demonstrated — the p
1. Replaces opaque top-k heuristics with rationale-driven selection; the same rationale frame powers selection and verification, improving auditability for sensitive domains. 2. Easy to implement and tune. 3. Breadth of evaluation, three tasks across six long-document datasets
1. In CP, baselines are evaluated at METEORA’s average evidence count rather than each method’s own best-K; a full K-sweep with per-method optima would be more standard and may change relative standings. 2. Design-wise, selection and verification share the same rationale frame, which authors themselves tag as a single-point-of-failure risk under targeted attacks; some verifier/model heterogeneity would help. 3. FinQA case shows lower recall than re-rankers in short-passage settings; the paper
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Multi-Criteria Decision Making · AI-based Problem Solving and Planning
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Softmax · WordPiece · Weight Decay · Dropout · Adam · Linear Layer
