SciRerankBench: Benchmarking Rerankers Towards Scientific Retrieval-Augmented Generated LLMs

Haotian Chen; Qingqing Long; Meng Xiao; Xiao Luo; Wei Ju; Chengrui Wang; Xuezhi Wang; Yuanchun Zhou; Hengshu Zhu

arXiv:2508.08742 · cs.CL · September 25, 2025

Open Access

TL;DR

This paper introduces SciRerankBench, a benchmark for evaluating rerankers in scientific retrieval-augmented LLMs, focusing on noise resilience, relevance, and factual accuracy across multiple scientific domains.

Contribution

It presents the first dedicated benchmark for reranker evaluation in scientific RAG-LLMs, including diverse question-context-answer pairs and systematic analysis of 13 rerankers across five scientific subjects.

Findings

1. Rerankers show varied strengths in noise resilience and relevance disambiguation.

2. Certain rerankers excel in factual consistency but struggle with noisy contexts.

3. Insights guide future development of more robust rerankers for scientific LLM applications.

Abstract

Scientific literature question answering is a pivotal step towards new scientific discoveries. Recently, two-stage retrieval-augmented generated large language models (RAG-LLMs) have shown impressive advancements in this domain. Such a two-stage framework, especially its second stage (the reranker), is essential in the scientific domain, where subtle differences in terminology can severely degrade the final factual-oriented or knowledge-intensive answers. Despite this significant progress, the potential and limitations of these works remain unexplored. In this work, we present a Scientific Rerank-oriented RAG Benchmark (SciRerankBench) for evaluating rerankers within RAG-LLM systems, spanning five scientific subjects. To rigorously assess the reranker performance in terms of noise resilience, relevance disambiguation, and factual consistency, we…

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques