OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Yi Jiang; Sendong Zhao; Jianbo Li; Bairui Hu; Yanrui Du; Haochun Wang; Bing Qin

arXiv:2601.05027·cs.AI·January 9, 2026

OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Yi Jiang, Sendong Zhao, Jianbo Li, Bairui Hu, Yanrui Du, Haochun Wang, Bing Qin

PDF

Open Access

TL;DR

OptiSet introduces a unified set-centric framework for retrieval-augmented generation that enhances evidence selection and ranking, leading to improved performance and efficiency in complex tasks.

Contribution

It proposes a novel set-level approach with an

Findings

01

Improves RAG performance on complex problems

02

Enhances evidence set compactness and diversity

03

Increases efficiency of generation process

Abstract

Retrieval-Augmented Generation (RAG) improves generation quality by incorporating evidence retrieved from large external corpora. However, most existing methods rely on statically selecting top-k passages based on individual relevance, which fails to exploit combinatorial gains among passages and often introduces substantial redundancy. To address this limitation, we propose OptiSet, a set-centric framework that unifies set selection and set-level ranking for RAG. OptiSet adopts an "Expand-then-Refine" paradigm: it first expands a query into multiple perspectives to enable a diverse candidate pool and then refines the candidate pool via re-selection to form a compact evidence set. We then devise a self-synthesis strategy without strong LLM supervision to derive preference labels from the set conditional utility changes of the generator, thereby identifying complementary and redundant…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsInformation Retrieval and Search Behavior · Topic Modeling · Multimodal Machine Learning Applications