CiteBench: A benchmark for Scientific Citation Text Generation
Martin Funkquist, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych

TL;DR
CiteBench is a comprehensive benchmark that unifies various datasets to evaluate and advance scientific citation text generation models, facilitating systematic research and comparison across different task settings and domains.
Contribution
The paper introduces CiteBench, a standardized benchmark for citation text generation that consolidates multiple datasets and provides a framework for consistent evaluation.
Findings
Analyzes the performance of strong baselines on the benchmark
Tests how well models transfer across datasets
Offers insights into task design and evaluation
Abstract
Science progresses by building upon the prior body of knowledge documented in scientific publications. The acceleration of research makes it hard to stay up-to-date with the recent developments and to summarize the ever-growing body of prior work. To address this, the task of citation text generation aims to produce accurate textual summaries given a set of papers-to-cite and the citing paper context. Due to otherwise rare explicit anchoring of cited documents in the citing paper, citation text generation provides an excellent opportunity to study how humans aggregate and synthesize textual knowledge from sources. Yet, existing studies are based upon widely diverging task definitions, which makes it hard to study this task systematically. To address this challenge, we propose CiteBench: a benchmark for citation text generation that unifies multiple diverse datasets and enables…
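The task setup described in the abstract (a set of papers-to-cite plus the citing paper context as input, a citation text as output) can be made concrete with a small sketch. Everything named here is an assumption for illustration: the `CitationInstance` fields, the `lead_baseline` function, and the example data are hypothetical and are not taken from the CiteBench code or datasets. The baseline is a toy that copies the first sentence of each cited abstract; it is not one of the paper's baselines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationInstance:
    """One example in a hypothetical unified format (field names are assumptions)."""
    cited_abstracts: List[str]  # abstracts of the papers-to-cite
    citing_context: str         # text from the citing paper around the citation
    target: str                 # the human-written citation text (reference output)


def lead_baseline(instance: CitationInstance) -> str:
    """Toy extractive baseline: take the first sentence of each cited abstract."""
    sentences = []
    for abstract in instance.cited_abstracts:
        # Naive sentence split on ". "; good enough for a sketch.
        first = abstract.split(". ")[0].strip()
        if first:
            sentences.append(first.rstrip(".") + ".")
    return " ".join(sentences)


# Made-up example data, for illustration only.
example = CitationInstance(
    cited_abstracts=[
        "We propose a neural summarizer. It is trained on news.",
        "We study citation intent. Results vary by field.",
    ],
    citing_context="Prior work has explored summarization and citation analysis.",
    target="Earlier studies proposed neural summarizers and examined citation intent.",
)

print(lead_baseline(example))
# prints: We propose a neural summarizer. We study citation intent.
```

A shared record shape like this is what lets one model or metric run unchanged over several source datasets, which is the kind of comparison a unified benchmark is meant to support.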
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies
Methods: Test
