Self-citation Analysis using Sentence Embeddings
Athanasios Lagopoulos, Grigorios Tsoumakas

TL;DR
This paper employs sentence embeddings to analyze journal self-citations in PubMed Central articles, aiming to differentiate between legitimate and unethical self-citations based on publication similarity.
Contribution
It introduces a large-scale, similarity-based approach to assess the legitimacy of self-citations in scientific articles.
Findings
Identified patterns distinguishing justifiable from unethical self-citations
Demonstrated the effectiveness of sentence embeddings in citation analysis
Provided insights into self-citation practices since 1990
Abstract
Citation indexes and metrics are intended to measure scientific innovation and quality for researchers, journals, and institutions. However, those metrics are often prone to abuse and manipulation through excessive and unethical self-citations induced by authors, reviewers, editors, or journals. Whether self-citations have legitimate reasons is normally determined during the review process, where the participating parties may have intrinsic incentives, which leaves the legitimacy of self-citations questionable after publication. In this paper, we conduct a large-scale analysis of journal self-citations while taking into consideration the similarity between a publication and its references. Specifically, we look into PubMed Central articles published since 1990 and compute similarities of article-reference pairs using sentence embeddings. We…
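The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model or the similarity measure, so cosine similarity is assumed here, and `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real sentence encoder so that the example runs without external libraries. The function names, the `DIM` constant, and the `(text, is_self_citation)` input layout are all assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # hypothetical vector size for the toy encoder (assumption)


def toy_embed(text, dim=DIM):
    """Stand-in for a sentence encoder: a hashed bag-of-words vector.

    A real pipeline would call a trained sentence-embedding model here.
    """
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        # md5 gives a stable bucket index, so results are reproducible.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += count
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_similarities(article_text, references):
    """Score each article-reference pair.

    `references` is a list of (reference_text, is_self_citation) tuples;
    the flag marks references published in the same journal as the article.
    Returns a list of (similarity, is_self_citation) tuples in input order.
    """
    article_vec = toy_embed(article_text)
    return [(cosine(article_vec, toy_embed(text)), is_self)
            for text, is_self in references]
```

Under these assumptions, comparing the similarity scores of pairs flagged as journal self-citations with those of the other pairs is one way to ask whether self-citations point to topically related work; swapping `toy_embed` for a real sentence encoder leaves the rest of the sketch unchanged.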
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
