An Axiomatic Benchmark for Evaluation of Scientific Novelty Metrics
Miri Liu, ChengXiang Zhai

TL;DR
This paper introduces an axiomatic benchmark for evaluating scientific novelty metrics, shows that no existing metric satisfies all of its axioms consistently, and proposes combining metrics to improve results.
Contribution
The paper defines axioms for novelty metrics based on scientific norms, evaluates existing metrics against them, and demonstrates that combining diverse metrics improves performance.
Findings
No existing metric satisfies all axioms consistently.
Combining metrics of different architectures improves evaluation accuracy.
A per-axiom weighted combination of metrics achieves 90.1% performance.
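As a rough illustration of what a per-axiom weighted combination could look like, the sketch below scores a combination by how often it ranks the paper an axiom expects to be more novel above the one it expects to be less novel. Everything here is an assumption made for illustration and is not taken from the paper: the pairwise test-case layout, the metric names `embed` and `llm`, the axiom names, the weights, and all scores are hypothetical toy values.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each test case pairs the metric scores of a paper the
# axiom expects to be MORE novel with those of a paper expected to be LESS novel.
Case = Tuple[Dict[str, float], Dict[str, float]]


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual metric scores."""
    return sum(w * scores[name] for name, w in weights.items())


def satisfaction_rate(cases: List[Case], weights: Dict[str, float]) -> float:
    """Fraction of cases where the combined score ranks the expected-more-novel paper higher."""
    if not cases:
        return 0.0
    hits = sum(
        1 for more, less in cases
        if combine(more, weights) > combine(less, weights)
    )
    return hits / len(cases)


def evaluate(
    benchmark: Dict[str, List[Case]],
    per_axiom_weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Apply a separate weight vector to each axiom and report its satisfaction rate."""
    return {
        axiom: satisfaction_rate(cases, per_axiom_weights[axiom])
        for axiom, cases in benchmark.items()
    }


# Toy data (hypothetical): two axioms, two metrics, two test cases each.
benchmark = {
    "axiom_a": [
        ({"embed": 0.9, "llm": 0.2}, {"embed": 0.4, "llm": 0.3}),
        ({"embed": 0.7, "llm": 0.5}, {"embed": 0.6, "llm": 0.1}),
    ],
    "axiom_b": [
        ({"embed": 0.3, "llm": 0.8}, {"embed": 0.5, "llm": 0.4}),
        ({"embed": 0.2, "llm": 0.6}, {"embed": 0.6, "llm": 0.5}),
    ],
}

# Hypothetical weights: each axiom leans on the metric that handles it best.
per_axiom_weights = {
    "axiom_a": {"embed": 1.0, "llm": 0.0},
    "axiom_b": {"embed": 0.0, "llm": 1.0},
}

results = evaluate(benchmark, per_axiom_weights)
embed_only_on_b = satisfaction_rate(benchmark["axiom_b"], {"embed": 1.0, "llm": 0.0})
```

In this toy setup the `embed` metric alone fails every `axiom_b` case, while the per-axiom weights satisfy both axioms. That mirrors the idea that metrics with different strengths can complement each other, but it says nothing about how the paper actually chooses its weights or measures performance.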
Abstract
The rigorous evaluation of the novelty of a scientific paper is, even for human scientists, a challenging task. With the increasing interest in AI scientists and AI involvement in scientific idea generation and paper writing, it also becomes increasingly important that this task be automatable and reliable, lest both human attention and compute tokens be wasted on ideas that have already been explored. Due to the challenge of quantifying ground-truth novelty, however, existing novelty metrics for scientific papers generally validate their results against noisy, confounded signals such as citation counts or peer review scores. These proxies can conflate novelty with impact, quality, or reviewer preference, which in turn makes it harder to assess how well a given metric actually evaluates novelty. We therefore propose an axiomatic benchmark for scientific novelty metrics. We first define…
