Cited Text Spans for Citation Text Generation
Xiangci Li, Yi-Hui Lee, Jessica Ouyang

TL;DR
This paper introduces a citation text generation method that conditions on cited text spans instead of abstracts, using distant labeling and keyword retrieval to improve factual grounding and practicality.
Contribution
The paper proposes conditioning citation text generation on cited text spans (CTS) rather than abstracts, and introduces distant labeling for automatic CTS annotation and keyword-based retrieval for locating CTS in the full text.
Findings
Conditioning on CTS rather than the cited paper's abstract improves the factual accuracy of generated citations.
Distantly labeled CTS perform well enough to substitute for expensive human annotation in model training.
Keyword-based retrieval makes full-text grounded citation generation practical.
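As a rough illustration of the keyword-based retrieval idea, the sketch below ranks sentences from a cited paper by how many keyword tokens they contain. It is not the authors' implementation: the function names, the tokenizer, the overlap scoring, and the `top_k` cutoff are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_by_keywords(keywords, sentences, top_k=2):
    """Rank cited-paper sentences by the number of distinct keyword tokens they contain.

    Hypothetical helper: the scoring rule is an assumption, not the paper's method.
    Sentences with no keyword match are dropped; ties keep document order.
    """
    keyword_tokens = {tok for kw in keywords for tok in tokenize(kw)}
    scored = []
    for idx, sentence in enumerate(sentences):
        score = len(keyword_tokens & set(tokenize(sentence)))
        if score > 0:
            scored.append((score, idx))
    # Highest score first, then earliest position in the document.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sentences[idx] for _, idx in scored[:top_k]]


if __name__ == "__main__":
    paper_sentences = [
        "We train a transformer on citation data.",
        "The weather dataset is small.",
        "Distant labeling reduces annotation cost.",
    ]
    print(retrieve_by_keywords(["distant labeling", "annotation"], paper_sentences))
```

A user supplies a few keywords describing what they want to cite, and only the matching sentences are passed to the generator, so the full paper never has to fit in the model input.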
Abstract
An automatic citation generation system aims to concisely and accurately describe the relationship between two scientific articles. To do so, such a system must ground its outputs to the content of the cited paper to avoid non-factual hallucinations. Due to the length of scientific documents, existing abstractive approaches have conditioned only on cited paper abstracts. We demonstrate empirically that the abstract is not always the most appropriate input for citation generation and that models trained in this way learn to hallucinate. We propose to condition instead on the cited text span (CTS) as an alternative to the abstract. Because manual CTS annotation is extremely time- and labor-intensive, we experiment with distant labeling of candidate CTS sentences, achieving sufficiently strong performance to substitute for expensive human annotations in model training, and we propose a…
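One way to picture distant labeling of candidate CTS sentences is to mark a cited-paper sentence as a candidate when it shares enough vocabulary with the citation text that refers to it. The sketch below assumes a simple token-overlap score and a fixed threshold; the abstract does not specify the labeling function, so both are hypothetical choices for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(citation, sentence):
    """Fraction of distinct citation tokens that also appear in the sentence."""
    citation_tokens = set(tokenize(citation))
    if not citation_tokens:
        return 0.0
    return len(citation_tokens & set(tokenize(sentence))) / len(citation_tokens)


def distant_label(citation, cited_sentences, threshold=0.5):
    """Return (sentence, label) pairs; label 1 marks a candidate CTS.

    The overlap measure and the threshold are assumptions, not the paper's settings.
    """
    return [
        (sentence, int(overlap_score(citation, sentence) >= threshold))
        for sentence in cited_sentences
    ]


if __name__ == "__main__":
    citation_text = "distant labeling reduces annotation"
    cited = ["Distant labeling reduces cost.", "We evaluate on news."]
    for sentence, label in distant_label(citation_text, cited):
        print(label, sentence)
```

Labels produced this way are noisy, but they require no human annotator, which is the trade-off the abstract describes.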
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
