# ResearchQA: Evaluating Scholarly Question Answering at Scale Across 75 Fields with Survey-Mined Questions and Rubrics

**Authors:** Li S. Yifei, Allen Chang, Chaitanya Malaviya, Mark Yatskar

arXiv: 2509.00496 · 2025-12-22

## TL;DR

ResearchQA distills survey articles from 75 research fields into 21K queries and 160K rubric items for evaluating scholarly question answering systems. Evaluating 18 systems shows that even the best one leaves many rubric items, especially citation items, only partly addressed.

## Contribution

The paper presents ResearchQA, a resource for evaluating research-focused question answering systems with questions and rubrics mined from survey articles across 75 research fields.

## Key findings

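- No parametric or retrieval-augmented system exceeds 70% coverage of rubric items; the highest-ranking system reaches 75%
- The highest-ranking system fully addresses less than 11% of citation rubric items
- Error analysis shows the same system also fully addresses only 48% of limitation items and 49% of comparison items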
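Coverage and "fully addressed" rates like these are aggregates over per-item judgements. Below is a minimal Python sketch of that aggregation, assuming each rubric item receives one of three hypothetical verdicts (`none`, `partial`, `full`) worth 0, 0.5, and 1 credit. This summary does not state the paper's actual judging scale or scoring rule, and the names `CREDIT`, `RubricJudgement`, `coverage`, and `fully_addressed_by_type` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical three-level verdicts and their credit; the paper's scale may differ.
CREDIT = {"none": 0.0, "partial": 0.5, "full": 1.0}


@dataclass
class RubricJudgement:
    item_type: str  # e.g. "citation", "limitation", "comparison"
    verdict: str    # one of the keys of CREDIT


def coverage(judgements: List[RubricJudgement]) -> float:
    """Mean credit over all rubric items, as a percentage (0 if there are none)."""
    if not judgements:
        return 0.0
    return 100.0 * sum(CREDIT[j.verdict] for j in judgements) / len(judgements)


def fully_addressed_by_type(judgements: List[RubricJudgement]) -> Dict[str, float]:
    """Percentage of items of each type whose verdict is "full"."""
    totals: Dict[str, int] = defaultdict(int)
    full: Dict[str, int] = defaultdict(int)
    for j in judgements:
        totals[j.item_type] += 1
        if j.verdict == "full":
            full[j.item_type] += 1
    return {t: 100.0 * full[t] / totals[t] for t in totals}
```

For four items judged `partial` and `none` (both citation), `full` (limitation), and `full` (comparison), `coverage` returns 62.5 while the citation fully-addressed rate is 0.0. Keeping partial credit separate from the strict "fully addressed" rate shows how a system can post moderate coverage while rarely satisfying an entire item type.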

## Abstract

Evaluating long-form responses to research queries relies heavily on expert annotators, restricting attention to areas like AI where researchers can conveniently enlist colleagues. Yet, research expertise is abundant: survey articles consolidate knowledge spread across the literature. We introduce ResearchQA, a resource for evaluating LLM systems by distilling survey articles from 75 research fields into 21K queries and 160K rubric items. Queries and rubrics are jointly derived from survey sections, where rubric items list query-specific answer evaluation criteria, e.g., citing papers, making explanations, and describing limitations. 31 Ph.D. annotators in 8 fields judge that 90% of queries reflect Ph.D. information needs and 87% of rubric items warrant emphasis of a sentence or longer. We leverage ResearchQA to evaluate 18 systems in 7.6K head-to-heads. No parametric or retrieval-augmented system we evaluate exceeds 70% on covering rubric items, and the highest-ranking system shows 75% coverage. Error analysis reveals that the highest-ranking system fully addresses less than 11% of citation rubric items, 48% of limitation items, and 49% of comparison items. We release our data to facilitate more comprehensive multi-field evaluations.
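The abstract reports 7.6K head-to-heads among 18 systems but does not say here how pairwise outcomes become a ranking. One common choice is a sequential Elo update, sketched below in Python as a swapped-in technique rather than the paper's confirmed procedure. How each comparison is decided (for example, by rubric coverage or a judge model) is left as an input, and the names `elo_ratings` and `leaderboard` and the `k`/`initial` defaults are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Each match: (system_a, system_b, score_a), where score_a is 1.0 if A wins,
# 0.5 for a tie, and 0.0 if B wins. How outcomes are judged is an assumption.
Match = Tuple[str, str, float]


def elo_ratings(matches: Iterable[Match], k: float = 32.0,
                initial: float = 1000.0) -> Dict[str, float]:
    """Apply standard Elo updates to each head-to-head, in order."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in matches:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        # Expected score of A under the logistic Elo model (400-point scale).
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        delta = k * (score_a - expected_a)
        # Zero-sum update: A gains exactly what B loses.
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    return ratings


def leaderboard(ratings: Dict[str, float]) -> List[str]:
    """System names sorted from highest to lowest rating."""
    return sorted(ratings, key=lambda name: ratings[name], reverse=True)
```

A single win between two fresh systems moves them to 1016 and 984. Sequential Elo depends on match order; a batch fit such as Bradley-Terry over all comparisons avoids that, at the cost of an iterative solve.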

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00496/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00496/full.md

---
Source: https://tomesphere.com/paper/2509.00496