ScienceMeter: Tracking Scientific Knowledge Updates in Language Models
Yike Wang, Shangbin Feng, Yulia Tsvetkov, Hannaneh Hajishirzi

TL;DR
ScienceMeter is a framework for evaluating how well language models update and maintain scientific knowledge over time, highlighting current limitations and the need for more robust update methods.
Contribution
This paper introduces ScienceMeter, a comprehensive evaluation framework for scientific knowledge updates in language models, including new metrics and a large curated dataset.
Findings
Best methods preserve 85.9% of existing knowledge
Best methods acquire 71.7% of new scientific claims
Performance on objectives is correlated across domains
Abstract
Large Language Models (LLMs) are increasingly used to support scientific research, but their knowledge of scientific advancements can quickly become outdated. We introduce ScienceMeter, a new framework for evaluating scientific knowledge update methods over scientific knowledge spanning the past, present, and future. ScienceMeter defines three metrics: knowledge preservation, the extent to which models' understanding of previously learned papers is preserved; knowledge acquisition, how well scientific claims from newly introduced papers are acquired; and knowledge projection, the ability of the updated model to anticipate or generalize to related scientific claims that may emerge in the future. Using ScienceMeter, we examine the scientific knowledge of LLMs on claim judgment and generation tasks across a curated dataset of 15,444 scientific papers and 30,888 scientific claims from ten…
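The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes each claim carries a correctness judgment before and after the model update, plus a label saying whether it comes from prior, new, or future papers. The names `ClaimResult` and `update_scores`, the split labels, and the exact counting rules are hypothetical simplifications for illustration, not the paper's official definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimResult:
    """One evaluated claim (hypothetical record layout)."""
    split: str            # "prior", "new", or "future" (assumed labels)
    correct_before: bool  # model handled the claim correctly before the update
    correct_after: bool   # model handled the claim correctly after the update


def _rate(hits: int, total: int) -> float:
    # Return 0.0 for an empty group instead of dividing by zero.
    return hits / total if total else 0.0


def update_scores(results: list[ClaimResult]) -> dict[str, float]:
    """Compute simplified preservation, acquisition, and projection rates."""
    # Preservation (assumed): of prior claims the model already got right,
    # what fraction is still right after the update?
    prior_known = [r for r in results if r.split == "prior" and r.correct_before]
    # Acquisition (assumed): fraction of claims from newly introduced papers
    # that are right after the update.
    new_claims = [r for r in results if r.split == "new"]
    # Projection (assumed): fraction of claims from later, unseen papers
    # that are right after the update.
    future_claims = [r for r in results if r.split == "future"]
    return {
        "preservation": _rate(sum(r.correct_after for r in prior_known), len(prior_known)),
        "acquisition": _rate(sum(r.correct_after for r in new_claims), len(new_claims)),
        "projection": _rate(sum(r.correct_after for r in future_claims), len(future_claims)),
    }
```

Under these assumptions, a method that forgets one of two previously known prior claims would score 0.5 on preservation, regardless of how many new claims it learns.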
Taxonomy
Topics: Scientific Computing and Data Management
