Fine Grained Citation Span for References in Wikipedia
Besnik Fetahu, Katja Markert, Avishek Anand

TL;DR
This paper introduces a method for identifying which text fragments in a Wikipedia article are covered by a given citation, which in turn helps decide where citations are still missing.
Contribution
The paper is the first to address the problem of determining citation spans in Wikipedia articles. It proposes a sequence classification approach that, given a paragraph and a citation, determines the citation span at a fine-grained level.
Findings
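Improves on baselines adopted from the scientific domain across all evaluation metrics
Classifies which textual fragments of a paragraph are covered by a citation, at a fine-grained level
Knowing the citation span helps decide for which content citations are still missing, supporting verifiability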
Abstract
Verifiability is one of the core editing principles in Wikipedia, with editors being encouraged to provide citations for the added content. For a Wikipedia article, determining the citation span of a citation, i.e. what content is covered by a citation, is important as it helps decide for which content citations are still missing. We are the first to address the problem of determining the citation span in Wikipedia articles. We approach this problem by classifying which textual fragments in an article are covered by a citation. We propose a sequence classification approach where for a paragraph and a citation, we determine the citation span at a fine-grained level. We provide a thorough experimental evaluation and compare our approach against baselines adopted from the scientific domain, where we show improvement for all evaluation metrics.
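To make the task framing in the abstract concrete, the sketch below splits a paragraph into sub-sentence fragments and labels each one as covered or not covered by a citation. It is not the paper's sequence classifier: the labeling rule is a naive stand-in that marks only the sentence carrying the citation marker. The fragment boundaries (punctuation followed by whitespace), the names `Fragment`, `split_into_fragments`, `ends_sentence`, and `label_span_baseline`, and the example paragraph are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    text: str
    covered: bool = False  # True if the fragment is judged to lie inside the citation span


def split_into_fragments(paragraph: str) -> List[Fragment]:
    # Assumed fragment boundary: whitespace that follows . , ; ! ? or a closing
    # bracket (so a marker such as "[1]" stays attached to its fragment).
    parts = re.split(r"(?<=[.,;!?\]])\s+", paragraph.strip())
    return [Fragment(p) for p in parts if p]


def ends_sentence(fragment: Fragment, marker: str) -> bool:
    # Ignore the citation marker when checking for sentence-final punctuation.
    return fragment.text.replace(marker, "").rstrip().endswith((".", "!", "?"))


def label_span_baseline(fragments: List[Fragment], marker: str) -> List[Fragment]:
    # Hypothetical heuristic, not the paper's model: cover the fragment that
    # carries the marker, then walk backwards until the previous sentence ends.
    idx = next((i for i, f in enumerate(fragments) if marker in f.text), None)
    if idx is None:
        return fragments  # marker not present: nothing is labeled as covered
    fragments[idx].covered = True
    j = idx - 1
    while j >= 0 and not ends_sentence(fragments[j], marker):
        fragments[j].covered = True
        j -= 1
    return fragments


# Made-up example paragraph with one citation marker.
paragraph = (
    "The city lies on a harbour. The bridge opened in 1932, "
    "and it carries rail and road traffic.[1] It was repainted in 2010."
)
for fragment in label_span_baseline(split_into_fragments(paragraph), "[1]"):
    print(fragment.covered, fragment.text)
```

A learned approach of the kind the abstract describes would replace `label_span_baseline` with a model that predicts the `covered` label for each fragment given the paragraph and the citation. The heuristic here cannot capture spans that stop short of a sentence boundary or that extend across sentences.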
