TL;DR
This paper introduces a computational method to measure how accurately citations reflect the original claims. It reveals systematic differences in citation fidelity by recency, accessibility, and author characteristics, and highlights the limitations of raw citation counts.
Contribution
It presents a scalable pipeline to quantify citation fidelity at sentence level across a large dataset, uncovering factors influencing citation accuracy and the 'telephone effect' in scholarly communication.
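As a rough illustration of the sentence-level idea, the sketch below pairs a citing sentence with its closest candidate claim in the cited paper and scores the pair. It is a toy under stated assumptions, not the paper's method: the paper applies supervised models, whereas this sketch substitutes plain token-overlap (Jaccard) similarity, and the function names `tokens`, `jaccard`, and `best_matching_claim` are hypothetical.

```python
import re
from typing import Optional


def tokens(sentence: str) -> set[str]:
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for a learned fidelity model."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matching_claim(
    citing_sentence: str, cited_sentences: list[str]
) -> tuple[float, Optional[str]]:
    """Return (score, sentence) for the cited-paper sentence closest to the citation.

    Returns (0.0, None) when the cited paper has no candidate sentences.
    """
    if not cited_sentences:
        return 0.0, None
    scored = [(jaccard(citing_sentence, s), s) for s in cited_sentences]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Made-up example sentences, used only to exercise the functions.
    cited = [
        "Sleep deprivation reduced recall accuracy by a small margin.",
        "Participants were recruited from two universities.",
    ]
    citing = "Prior work found that sleep deprivation reduced recall accuracy."
    score, claim = best_matching_claim(citing, cited)
    print(f"{score:.2f} -> {claim}")
```

In this toy example the citing sentence drops the qualifier "by a small margin", which is the kind of shift a fidelity measure is meant to detect. A lexical overlap score captures it only crudely, which is one reason a trained model is the more plausible choice at scale.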
Findings
Citation fidelity is higher for recent, accessible papers.
Authors with lower H-index and medium-sized teams cite more faithfully.
Low-fidelity citations tend to propagate errors through subsequent papers.
Abstract
Academic citations are widely used for evaluating research and tracing knowledge flows. Such uses typically rely on raw citation counts and neglect variability in citation types. In particular, citations can vary in their fidelity as original knowledge from cited studies may be paraphrased, summarized, or reinterpreted, possibly wrongly, leading to variation in how much information changes from cited to citing paper. In this study, we introduce a computational pipeline to quantify citation fidelity at scale. Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers, and applies supervised models to measure fidelity at the sentence level. Analyzing a large-scale multi-disciplinary dataset of approximately 13 million citation sentence pairs, we find that citation fidelity is higher when authors cite papers that are 1) more…