RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

Yelin Chen; Fanjin Zhang; Suping Sun; Yunhe Pang; Yuanchun Wang; Jian Song; Xiaoyan Li; Lei Hou; Shu Zhao; Jie Tang; Juanzi Li

arXiv:2601.14289·cs.CL·May 1, 2026

TL;DR

RPC-Bench is a large-scale, fine-grained QA benchmark for research paper comprehension, revealing significant gaps in current models' ability to understand scholarly content accurately.

Contribution

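The paper introduces a benchmark with a fine-grained taxonomy aligned with the scientific research flow, an LLM-human interaction annotation framework for large-scale labeling and quality control, and an LLM-as-a-Judge evaluation framework.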

Findings

01. Even GPT-5 achieves only 68.2% correctness-completeness.
02. Model performance drops to 37.46% after conciseness adjustment.
03. RPC-Bench exposes substantial gaps in current scientific paper comprehension.

Abstract

Understanding research papers remains challenging for foundation models due to specialized scientific discourse and complex figures and tables, yet existing benchmarks offer limited fine-grained evaluation at scale. To address this gap, we introduce RPC-Bench, a large-scale question-answering benchmark built from review-rebuttal exchanges of high-quality computer science papers, containing 15K human-verified QA pairs. We design a fine-grained taxonomy aligned with the scientific research flow to assess models' ability to understand and answer why, what, and how questions in scholarly contexts. We also define an elaborate LLM-human interaction annotation framework to support large-scale labeling and quality control. Following the LLM-as-a-Judge paradigm, we develop a scalable framework that evaluates models on correctness-completeness and conciseness, with high agreement to human…

Code & Models

Repositories

https://rpc-bench.github.io