Is this Idea Novel? An Automated Benchmark for Judgment of Research Ideas
Tim Schopf, Michael Färber

TL;DR
This paper introduces RINoBench, a comprehensive benchmark for evaluating automated judgments of research idea novelty, and shows that current large language models (LLMs) struggle to match human novelty assessments even though their reasoning resembles human reasoning.
Contribution
It presents the first large-scale, standardized benchmark for research idea novelty evaluation, enabling consistent comparison of automated methods against human judgments.
Findings
LLMs' reasoning aligns with human reasoning, but their judgments do not reach human-level accuracy
Automated models often diverge from human novelty judgments
Benchmark facilitates large-scale, standardized evaluation of novelty assessment methods
Abstract
Judging the novelty of research ideas is crucial for advancing science, enabling the identification of unexplored directions, and ensuring contributions meaningfully extend existing knowledge rather than reiterate minor variations. However, given the exponential growth of scientific literature, manually judging the novelty of research ideas through literature reviews is labor-intensive, subjective, and infeasible at scale. Therefore, recent efforts have proposed automated approaches for research idea novelty judgment. Yet, evaluation of these approaches remains largely inconsistent and is typically based on non-standardized human evaluations, hindering large-scale, comparable evaluations. To address this, we introduce RINoBench, the first comprehensive benchmark for large-scale evaluation of research idea novelty judgments. It comprises 1,381 research ideas derived from and judged by…
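The benchmark's core use is scoring an automated novelty judge against human judgments. The sketch below illustrates that comparison under stated assumptions only: it assumes each idea has a human novelty score on an integer ordinal scale and that the automated method predicts a score on the same scale. The scale, the function names `accuracy` and `mean_absolute_error`, and the toy scores are hypothetical and are not taken from RINoBench or its evaluation protocol.

```python
from typing import Sequence


def _check(gold: Sequence[int], pred: Sequence[int]) -> None:
    # Both lists must describe the same, non-empty set of ideas.
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of ideas where the automated score equals the human score."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mean_absolute_error(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Average distance between automated and human scores.

    Unlike accuracy, this rewards near misses on an ordinal scale.
    """
    _check(gold, pred)
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Hypothetical human and model scores for five ideas (illustrative only).
    human_scores = [1, 3, 5, 3, 2]
    model_scores = [1, 4, 5, 3, 1]
    print(accuracy(human_scores, model_scores))             # 0.6
    print(mean_absolute_error(human_scores, model_scores))  # 0.4
```

Reporting an ordinal-aware error next to exact-match accuracy is one way to distinguish a judge that is slightly off from one that is badly miscalibrated; the actual metrics used by the benchmark may differ.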
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Advanced Text Analysis Techniques
