NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment

Wenqing Wu; Yi Zhao; Yuzhuo Wang; Siyou Li; Juexi Shao; Yunfei Long; Chengzhi Zhang

arXiv:2604.11543 · cs.CL · April 14, 2026

TL;DR

NovBench is a large-scale benchmark for evaluating how well large language models assess research novelty. Results indicate that current models have a limited understanding of scientific novelty and that fine-tuned models often struggle with instruction-following.

Contribution

The paper introduces NovBench, the first dedicated benchmark for evaluating LLMs on scientific novelty assessment, together with a comprehensive evaluation framework.

Findings

01. Current LLMs show limited understanding of scientific novelty.

02. Fine-tuned models often struggle with instruction-following.

03. The benchmark reveals significant room for improvement in LLMs' novelty evaluation capabilities.

Abstract

Novelty is a core requirement in academic publishing and a central focus of peer review, yet the growing volume of submissions has placed increasing pressure on human reviewers. While large language models (LLMs), including those fine-tuned on peer review data, have shown promise in generating review comments, the absence of a dedicated benchmark has limited systematic evaluation of their ability to assess research novelty. To address this gap, we introduce NovBench, the first large-scale benchmark designed to evaluate LLMs' capability to generate novelty evaluations in support of human peer review. NovBench comprises 1,684 paper-review pairs from a leading NLP conference, including novelty descriptions extracted from paper introductions and corresponding expert-written novelty evaluations. We focus on both sources because the introduction provides a standardized and explicit…
