Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications
Ethan Lin, Zhiyuan Peng, Yi Fang

TL;DR
This paper introduces a new benchmark, SchNovel, to evaluate large language models' ability to assess scholarly paper novelty, and proposes RAG-Novelty, a retrieval-based method that outperforms baselines in this task.
Contribution
The paper presents a scholarly novelty benchmark (SchNovel) and a retrieval-augmented method (RAG-Novelty) for assessing novelty in academic publications.
Findings
RAG-Novelty outperforms baseline models in novelty assessment.
Large language models show varying capabilities in evaluating scholarly novelty.
SchNovel provides a new standardized dataset for future research in this area.
Abstract
Recent studies have evaluated the creativity/novelty of large language models (LLMs) primarily from a semantic perspective, using benchmarks from cognitive science. However, assessing the novelty in scholarly publications is a largely unexplored area in evaluating LLMs. In this paper, we introduce a scholarly novelty benchmark (SchNovel) to evaluate LLMs' ability to assess novelty in scholarly papers. SchNovel consists of 15,000 pairs of papers across six fields sampled from the arXiv dataset with publication dates spanning 2 to 10 years apart. In each pair, the more recently published paper is assumed to be more novel. Additionally, we propose RAG-Novelty, which simulates the review process taken by human reviewers by leveraging the retrieval of similar papers to assess novelty. Extensive experiments provide insights into the capabilities of different LLMs to assess novelty and…
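The pairwise setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the names Paper, PaperPair, pairwise_accuracy, and always_first are hypothetical, the toy pairs are invented for the example and are not drawn from SchNovel, and the judge stands in for whatever LLM call (with or without retrieval) produces a decision. The only thing taken from the abstract is the labeling rule: in each pair, the more recently published paper is assumed to be more novel.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Paper:
    title: str
    year: int  # publication year


@dataclass(frozen=True)
class PaperPair:
    a: Paper
    b: Paper

    def more_novel(self) -> str:
        # Benchmark assumption from the abstract: the later paper is more novel.
        # Pairs are 2 to 10 years apart, so the two years are never equal.
        return "a" if self.a.year > self.b.year else "b"


def pairwise_accuracy(
    pairs: List[PaperPair], judge: Callable[[Paper, Paper], str]
) -> float:
    """Fraction of pairs where the judge picks the assumed-more-novel paper."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(1 for p in pairs if judge(p.a, p.b) == p.more_novel())
    return correct / len(pairs)


def always_first(a: Paper, b: Paper) -> str:
    # Hypothetical trivial judge: always answers "a", ignoring the papers.
    return "a"


# Toy pairs (invented titles and years, for illustration only).
pairs = [
    PaperPair(Paper("P1", 2012), Paper("P2", 2020)),
    PaperPair(Paper("P3", 2021), Paper("P4", 2015)),
    PaperPair(Paper("P5", 2010), Paper("P6", 2014)),
    PaperPair(Paper("P7", 2019), Paper("P8", 2013)),
]

print(pairwise_accuracy(pairs, always_first))
```

Because the toy set places the newer paper first in exactly half of the pairs, a judge that ignores content scores 0.5 here. Balancing the order of papers in this way is one simple guard against a model that merely favors a position; whether SchNovel itself does so is not stated in the abstract.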
Taxonomy
Topics: Biomedical Text Mining and Ontologies · scientometrics and bibliometrics research · Topic Modeling
