CiteME: Can Language Models Accurately Cite Scientific Claims?

Ori Press; Andreas Hochlehnert; Ameya Prabhu; Vishaal Udandarao; Ofir Press; Matthias Bethge

arXiv:2407.12861·cs.CL·November 5, 2024·3 cites

TL;DR

This paper introduces CiteME, a benchmark for evaluating language models' ability to accurately cite scientific claims, revealing significant gaps between current models and human performance, and proposes CiteAgent to improve citation accuracy.

Contribution

The paper presents CiteME, a new benchmark for citation attribution in scientific texts, and introduces CiteAgent, an autonomous system that improves model performance in this task.

Findings

01. LMs achieve only 4.2-18.5% accuracy on CiteME.

02. Humans achieve 69.7% accuracy on the same task.

03. CiteAgent improves accuracy to 35.3%.

Abstract

Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. CiteME use reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers,…
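The evaluation setup described in the abstract (one excerpt, one referenced paper, accuracy over the set) can be sketched as follows. This is a minimal illustration, not the benchmark's actual scoring code: the `Example` record, the `[CITATION]` placeholder, the title normalization, and the toy examples are all assumptions made for this sketch, and the titles are invented rather than taken from CiteME.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Example:
    excerpt: str       # text excerpt that refers to exactly one paper
    target_title: str  # title of the referenced paper (hypothetical field name)


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation and whitespace so that
    trivial formatting differences are not counted as misses."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def accuracy(examples: Sequence[Example], predict: Callable[[str], str]) -> float:
    """Fraction of excerpts for which predict(excerpt) names the target paper."""
    if not examples:
        return 0.0
    hits = sum(
        normalize(predict(ex.excerpt)) == normalize(ex.target_title)
        for ex in examples
    )
    return hits / len(examples)


# Toy data, invented for illustration only.
toy_examples = [
    Example("We follow the setup of [CITATION] for training.", "Paper A: A Study"),
    Example("As shown by [CITATION], scaling helps.", "Paper B"),
]


def constant_predictor(excerpt: str) -> str:
    # Stand-in for a model: always answers with the same title.
    return "paper a - a study"


score = accuracy(toy_examples, constant_predictor)
```

Here `score` is the share of excerpts attributed to the right paper, the same kind of quantity as the percentages reported above. Exact matching on normalized titles is only one possible criterion; a real evaluation might match on paper identifiers instead.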

Code & Models

Repositories

bethgelab/CiteME
Official

Datasets

bethgelab/CiteME
dataset · 7 downloads

Videos

CiteME: Can Language Models Accurately Cite Scientific Claims?· slideslive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques