Citation importance-aware document representation learning for large-scale science mapping
Zhentao Liang, Nees Jan van Eck, Xuehua Wu, Jin Mao, Gang Li

TL;DR
This paper introduces a citation importance-aware contrastive learning framework that enhances scientific document representations by differentiating citation significance, leading to improved science mapping accuracy and insights into interdisciplinary research.
Contribution
The paper proposes a scalable measurement of citation importance and integrates it into contrastive learning to refine document representations for large-scale science mapping.
Findings
Improved document representation quality on benchmark datasets
Enhanced accuracy of science mapping visualizations
Effective differentiation of important versus perfunctory citations
Abstract
Effective science mapping relies on high-quality representations of scientific documents. As an important task in scientometrics and information studies, science mapping is often challenged by the complex and heterogeneous nature of citations. While previous studies have attempted to improve document representations by integrating citation and semantic information, the heterogeneity of citations is often overlooked. To address this problem, this study proposes a citation importance-aware contrastive learning framework that refines the supervisory signal. We first develop a scalable measurement of citation importance based on location, frequency, and self-citation characteristics. Citation importance is then integrated into the contrastive learning process through an importance-aware sampling strategy, which selects low-importance citations as hard negatives. This forces the model to…
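The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Citation` fields, the `importance` formula, its weights, the section list, and the `high`/`low` thresholds are all hypothetical choices made for illustration. The abstract only states that importance depends on location, frequency, and self-citation, and that low-importance citations serve as hard negatives. Treating high-importance citations as positives and down-weighting self-citations are likewise assumptions here.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical: sections assumed to signal a perfunctory mention.
BACKGROUND_SECTIONS = {"introduction", "related work"}


@dataclass(frozen=True)
class Citation:
    cited_id: str           # identifier of the cited document
    mentions: int           # number of in-text mentions (frequency)
    sections: tuple         # sections where the citation appears (location)
    is_self_citation: bool  # whether citing and cited papers share an author


def importance(c: Citation) -> float:
    """Toy importance score in [0, 1]; the formula is an assumption."""
    freq = min(c.mentions, 3) / 3  # cap the frequency signal at 3 mentions
    outside_background = any(
        s.lower() not in BACKGROUND_SECTIONS for s in c.sections
    )
    loc = 1.0 if outside_background else 0.0
    score = 0.5 * freq + 0.5 * loc
    if c.is_self_citation:
        score *= 0.5  # assumed down-weighting of self-citations
    return score


def sample_triplets(query_id, citations, high=0.6, low=0.3):
    """Pair high-importance citations (positives) with low-importance
    citations (hard negatives) of the same citing paper."""
    positives = [c.cited_id for c in citations if importance(c) >= high]
    hard_negs = [c.cited_id for c in citations if importance(c) <= low]
    return [(query_id, p, n) for p, n in product(positives, hard_negs)]


refs = [
    Citation("A", 4, ("Methods", "Results"), False),
    Citation("B", 1, ("Introduction",), False),
    Citation("C", 2, ("Methods",), True),
]
triplets = sample_triplets("Q", refs)
```

Each resulting `(query, positive, negative)` triplet could then feed a standard contrastive objective such as a triplet margin loss; citations with mid-range scores (like `"C"` above) are simply left out of sampling in this sketch.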
Taxonomy
Topics: Scientometrics and Bibliometrics Research · Research Data Management Practices · Biomedical Text Mining and Ontologies
