TL;DR
This paper recasts LLM pairwise reranking as active learning from noisy pairwise comparisons, improving NDCG@10 per call under a limited call budget and turning position bias into zero-mean noise with a single LLM call per pair.
Contribution
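It reframes Pairwise Ranking Prompting (PRP) reranking as active learning from noisy pairwise comparisons and proposes a noise-robust framework with a randomized-direction oracle, which converts systematic position bias into zero-mean noise using a single LLM call per pair.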
Findings
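Active rankers are drop-in replacements for sorting-based aggregation and improve NDCG@10 per call in the call-constrained regime.
The randomized-direction oracle converts systematic position bias into zero-mean noise, so the aggregate ranking is unbiased.
The oracle uses a single LLM call per pair, avoiding the cost of bidirectional calls.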
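One way to see why randomizing the presentation direction leaves only zero-mean noise is a short worked expectation. This is a sketch under an assumed additive bias model; the symbols p_{ij} (the bias-free probability that item i is preferred over item j), b (the bonus given to whichever item is shown first), and "i first" / "j first" (the presentation order) are illustrative and are not the paper's notation.

```latex
\begin{align*}
\Pr(i \text{ wins} \mid i \text{ first}) &= p_{ij} + b, \\
\Pr(i \text{ wins} \mid j \text{ first}) &= p_{ij} - b, \\
\Pr(i \text{ wins}) &= \tfrac{1}{2}\,(p_{ij} + b) + \tfrac{1}{2}\,(p_{ij} - b) = p_{ij}.
\end{align*}
```

Under this assumption, a fixed direction shifts every estimate by +b or -b, while a fair coin flip over the direction cancels the shift in expectation, so only variance remains.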
Abstract
Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
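The randomized-direction idea can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the comparator signature, the simulated position-biased judge, and every name below (`randomized_direction_oracle`, `make_biased_comparator`, `win_rate`) are hypothetical, and the additive bias model is an assumption made only for the simulation.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical comparator: returns True if the item shown FIRST is preferred.
Comparator = Callable[[str, str], bool]


def randomized_direction_oracle(
    compare: Comparator, a: str, b: str, rng: random.Random
) -> bool:
    """Return True if `a` is judged better than `b`, using ONE comparator call.

    A fair coin decides which item is shown first; the verdict is flipped
    back when `b` was shown first.
    """
    if rng.random() < 0.5:
        return compare(a, b)       # a shown first
    return not compare(b, a)       # b shown first, so invert the verdict


def make_biased_comparator(
    true_pref: Dict[Tuple[str, str], float], bias: float, rng: random.Random
) -> Comparator:
    """Simulated judge (assumption): favors the first-shown item by `bias`."""
    def compare(first: str, second: str) -> bool:
        p = true_pref[(first, second)] + bias
        return rng.random() < min(1.0, max(0.0, p))
    return compare


def win_rate(n: int, randomized: bool, seed: int = 0) -> float:
    """Fraction of n single-call judgments in which 'a' beats 'b'."""
    rng = random.Random(seed)
    # Bias-free preference: 'a' beats 'b' with probability 0.6.
    true_pref = {("a", "b"): 0.6, ("b", "a"): 0.4}
    compare = make_biased_comparator(true_pref, bias=0.2, rng=rng)
    wins = 0
    for _ in range(n):
        if randomized:
            wins += randomized_direction_oracle(compare, "a", "b", rng)
        else:
            wins += compare("a", "b")  # fixed direction: 'a' always first
    return wins / n


if __name__ == "__main__":
    print("fixed direction     :", win_rate(20000, randomized=False))
    print("randomized direction:", win_rate(20000, randomized=True))
```

In this simulation the fixed-direction estimate drifts toward the bias-free preference plus the bias, while the randomized estimate stays close to the bias-free preference, with the same number of calls per pair.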
