CiteME: Can Language Models Accurately Cite Scientific Claims?
Ori Press, Andreas Hochlehnert, Ameya Prabhu, Vishaal Udandarao, Ofir Press, Matthias Bethge

TL;DR
This paper introduces CiteME, a benchmark that tests whether language models can identify the paper referenced in a scientific text excerpt. Frontier LMs reach only 4.2-18.5% accuracy against 69.7% for humans, and the proposed CiteAgent system raises LM accuracy to 35.3%.
Contribution
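The paper presents CiteME, a new benchmark for citation attribution built from excerpts of recent machine learning papers, each referencing a single other paper. It also introduces CiteAgent, an autonomous system built on GPT-4o that can search for and read papers, which improves model performance on this task.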
Findings
LMs achieve only 4.2-18.5% accuracy on CiteME.
Humans achieve 69.7% accuracy on the same task.
CiteAgent improves accuracy to 35.3%.
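The accuracy figures above are fractions of excerpts for which the referenced paper was identified correctly. A minimal sketch of how such a score might be computed follows, assuming predictions and ground truth are compared by normalized paper title. The matching rule, the function names, and the toy titles are all hypothetical and are not taken from the paper.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def citation_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of excerpts whose predicted title matches the reference title."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_title(pred) == normalize_title(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Toy data with made-up titles (assumption: one reference paper per excerpt).
references = [
    "Example Paper One: A Study",
    "Example Paper Two",
    "Example Paper Three",
]
predictions = [
    "example paper one - a study",  # matches after normalization
    "Some Other Paper",             # wrong paper
    "Example Paper Three.",         # matches after normalization
]

accuracy = citation_accuracy(predictions, references)
print(f"accuracy = {accuracy:.1%}")
```

Exact matching on normalized titles is the simplest possible criterion. A real evaluation might instead compare paper identifiers or use manual verification.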
Abstract
Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. CiteME use reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
