FRESCO: Benchmarking and Optimizing Re-rankers for Evolving Semantic Conflict in Retrieval-Augmented Generation

Sohyun An (1, 2), Hayeon Lee (1), Shuibenyang Yuan (1), Chun-cheng Jason Chen (1), Cho-Jui Hsieh (2), Vijai Mohan (1), Alexander Min (1) ((1) Meta Superintelligence Labs, (2) UCLA)

arXiv:2604.14227 · cs.IR · April 17, 2026

TL;DR

FRESCO introduces a benchmark for evaluating re-rankers in dynamic, evolving information contexts, revealing biases and proposing instruction-based improvements to better prioritize recent evidence in retrieval-augmented generation.

Contribution

The paper presents FRESCO, a benchmark for assessing re-rankers in temporally evolving scenarios, and proposes an instruction optimization framework that improves their ability to prioritize recent evidence.

Findings

01. Existing re-rankers favor older, semantically rich documents over recent, factual evidence.

02. FRESCO effectively tests re-rankers' ability to prioritize recent information in dynamic settings.

03. Instruction optimization yields up to 27% improvement on Evolving Knowledge tasks.

Abstract

Retrieval-Augmented Generation (RAG) is a key approach to mitigating the temporal staleness of large language models (LLMs) by grounding responses in up-to-date evidence. Within the RAG pipeline, re-rankers play a pivotal role in selecting the most useful documents from retrieved candidates. However, existing benchmarks predominantly evaluate re-rankers in static settings and do not adequately assess performance under evolving information -- a critical gap, as real-world systems often must choose among temporally different pieces of evidence. To address this limitation, we introduce FRESCO (Factual Recency and Evolving Semantic COnflict), a benchmark for evaluating re-rankers in temporally dynamic contexts. By pairing recency-seeking queries with historical Wikipedia revisions, FRESCO tests whether re-rankers can prioritize factually recent evidence while maintaining semantic relevance.…
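To make the evaluation idea concrete, here is a minimal sketch of how one might check whether a re-ranker places the most recent revision of a page first. It is not the FRESCO metric or the authors' code: the `Revision` and `Example` types, the `token_overlap_score` stand-in scorer, the `recency_top1_rate` function, and the two example records are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Revision:
    text: str
    timestamp: str  # ISO 8601 date, so string order matches chronological order


@dataclass
class Example:
    query: str
    revisions: List[Revision]


def token_overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a re-ranker: fraction of query tokens found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def recency_top1_rate(
    examples: List[Example], score_fn: Callable[[str, str], float]
) -> float:
    """Fraction of examples where the top-scored revision is also the newest."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        top = max(ex.revisions, key=lambda r: score_fn(ex.query, r.text))
        newest = max(ex.revisions, key=lambda r: r.timestamp)
        hits += top is newest
    return hits / len(examples)


# Made-up data (assumption): two revisions per query, one older and one newer.
EXAMPLES = [
    Example(
        query="who is the current ceo of acme",
        revisions=[
            Revision("the current ceo of acme is alice who is well known", "2019-01-01"),
            Revision("bob became ceo of acme in 2024", "2024-06-01"),
        ],
    ),
    Example(
        query="latest python version",
        revisions=[
            Revision("python version 3.8 released", "2019-10-14"),
            Revision("latest python version is 3.12", "2023-10-02"),
        ],
    ),
]

rate = recency_top1_rate(EXAMPLES, token_overlap_score)
```

In the first made-up example the older revision shares more wording with the query, so a scorer driven by surface overlap ranks it above the newer one. This is the kind of preference for older, semantically rich text that the benchmark is designed to expose. A real re-ranker would replace `token_overlap_score` with its own scoring call.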
