Semantically Orthogonal Framework for Citation Classification: Disentangling Intent and Content
Changxu Duan, Zhiyin Tan

TL;DR
This paper introduces SOFT, a framework that separates citation intent from cited content type, improving classification accuracy and consistency across disciplines in scholarly datasets.
Contribution
The paper presents SOFT, a novel semantic framework that disentangles citation intent and content type, enhancing annotation reliability and cross-domain applicability.
Findings
SOFT improves agreement between human annotators and LLM annotations.
SOFT improves classification performance for both zero-shot and fine-tuned models.
SOFT generalizes across domains better than previous frameworks.
Abstract
Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation intent (why a work is cited) with cited content type (what part is cited). This limits their effectiveness in automatic classification, because fine-grained type distinctions come at the cost of practical classification reliability. We introduce SOFT, a Semantically Orthogonal Framework with Two dimensions that explicitly separates citation intent from cited content type, drawing inspiration from semantic role theory. We systematically re-annotate the ACL-ARC dataset using SOFT and release a cross-disciplinary test set sampled from ACT2. Evaluation with both zero-shot and fine-tuned Large Language Models demonstrates that SOFT enables higher agreement between human annotators and LLMs, and supports stronger…
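As an illustration of what separating the two dimensions could look like in practice, the following minimal sketch stores each citation as a pair (intent, content type) and measures human-versus-model agreement on each dimension independently. Everything specific here is an assumption made for illustration: the label values ("background", "method", and so on), the names CitationLabel and per_dimension_kappa, and the choice of Cohen's kappa as the agreement measure are not taken from the paper, whose actual label sets and metrics are not given in this summary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationLabel:
    """One citation annotated on two independent axes (hypothetical label values)."""
    intent: str        # why the work is cited
    content_type: str  # what part of the work is cited


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def per_dimension_kappa(human, model):
    """Score agreement separately for intent and for content type."""
    return {
        "intent": cohen_kappa([x.intent for x in human],
                              [x.intent for x in model]),
        "content_type": cohen_kappa([x.content_type for x in human],
                                    [x.content_type for x in model]),
    }


# Toy annotations with made-up labels, for illustration only.
human = [
    CitationLabel("background", "method"),
    CitationLabel("use", "method"),
    CitationLabel("compare", "result"),
    CitationLabel("background", "claim"),
]
model = [
    CitationLabel("background", "method"),
    CitationLabel("use", "data"),
    CitationLabel("compare", "result"),
    CitationLabel("use", "claim"),
]

scores = per_dimension_kappa(human, model)
print(scores)
```

Scoring each axis on its own makes it visible when annotators agree on why a work is cited but disagree on what part is cited, which a single fused label set would hide.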
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Scientometrics and Bibliometrics Research · Information Retrieval and Search Behavior
