Mitigating Preference Leakage via Strict Estimator Separation for Normative Generative Ranking
Dalia Nahhas, Xiaohao Cai, Imran Razzak, Shoaib Jameel

TL;DR
This paper introduces a leakage-free two-judge framework for cultural relevance ranking in Generative Information Retrieval (GenIR), designed to reduce circularity and preference leakage, and evaluates it on NGR-33k, a new benchmark of 33,052 culturally grounded stories.
Contribution
It formalizes cultural relevance as a within-query ranking task and proposes a strict separation of supervision (Judge B) from evaluation (Judge A) to mitigate preference leakage.
Findings
A dense bi-encoder distilled from a Judge-B-supervised Cross-Encoder is highly effective.
Under the leakage-free evaluation framework, classical baselines yield only modest gains.
Distilled models align well with human norms on curated datasets.
Abstract
In Generative Information Retrieval (GenIR), the bottleneck has shifted from generation to the selection of candidates, particularly for normative criteria such as cultural relevance. Current LLM-as-a-Judge evaluations often suffer from circularity and preference leakage, where overlapping supervision and evaluation models inflate performance. We address this by formalising cultural relevance as a within-query ranking task and introducing a leakage-free two-judge framework that strictly separates supervision (Judge B) from evaluation (Judge A). On NGR-33k, a new benchmark of 33,052 culturally grounded stories, we find that while classical baselines yield only modest gains, a dense bi-encoder distilled from a Judge-B-supervised Cross-Encoder is highly effective. Although the Cross-Encoder provides a strong supervision signal for distillation, the distilled BGE-M3 model substantially…
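To make the separation of supervision and evaluation concrete, here is a minimal Python sketch of a within-query evaluation that refuses to run when the supervising and evaluating judges coincide. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`check_separation`, `within_query_ndcg`), the row layout, and the choice of NDCG as the ranking metric are all hypothetical, since the abstract does not name the metric used.

```python
import math
from collections import defaultdict


def check_separation(supervisor_judge, evaluator_judge):
    """Raise if the same judge is used for supervision and evaluation.

    Judge identifiers are plain strings here (an assumption for illustration).
    """
    if supervisor_judge == evaluator_judge:
        raise ValueError(
            "preference leakage risk: supervision and evaluation judges are identical"
        )


def dcg(gains):
    """Discounted cumulative gain of a list of gains in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def within_query_ndcg(rows, k=10):
    """Compute NDCG@k separately for each query.

    rows: iterable of (query_id, model_score, evaluator_score), where
    evaluator_score comes from the evaluation judge only (Judge A in the
    paper's terminology). NDCG is an assumed metric, not taken from the source.
    """
    by_query = defaultdict(list)
    for query_id, model_score, evaluator_score in rows:
        by_query[query_id].append((model_score, evaluator_score))

    per_query = {}
    for query_id, candidates in by_query.items():
        # Order candidates by the ranker's score, then read off the judge's gains.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        gains = [evaluator_score for _, evaluator_score in ranked][:k]
        # Ideal ordering: candidates sorted by the evaluation judge's own scores.
        ideal = sorted((e for _, e in candidates), reverse=True)[:k]
        ideal_dcg = dcg(ideal)
        per_query[query_id] = dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
    return per_query


# Toy usage with made-up scores: two queries, two candidates each.
check_separation(supervisor_judge="judge_b", evaluator_judge="judge_a")
toy_rows = [
    ("q1", 0.9, 3), ("q1", 0.1, 1),  # ranker agrees with the evaluation judge
    ("q2", 0.2, 3), ("q2", 0.8, 1),  # ranker inverts the judge's preference
]
scores = within_query_ndcg(toy_rows)
```

Scoring each query separately keeps the comparison among candidates for the same query, which matches the within-query framing; the guard at the top is the simplest way to encode that the ranker's training labels and its evaluation labels must come from different judges.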
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Recommender Systems and Techniques
