Ground truth? Concept-based communities versus the external classification of physics manuscripts
Vasyl Palchykov, Valerio Gemmetto, Alexey Boyarsky, Diego Garlaschelli

TL;DR
This study compares community detection results on scientific publication networks with expert-made classifications, revealing discrepancies that highlight interdisciplinary overlaps and methodological similarities not captured by external labels.
Contribution
It demonstrates that external classifications may not fully represent the thematic structure of scientific communities, emphasizing the importance of analyzing intrinsic network-based communities.
Findings
Community detection partly aligns with expert classifications
Discrepancies reveal interdisciplinary overlaps
Methodological similarities are often overlooked
Abstract
Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification for many real-world systems with a significant level of discrepancy. A widely accepted reason behind such an outcome is the unavoidable loss of non-topological information (such as node attributes) encountered when the original complex system is represented as a network. In this article we emphasize that the observed discrepancies may also be caused by a different reason: the external classification itself. To this end we use scientific publication data which i) exhibit a well-defined modular structure and ii) hold an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite…
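Illustrative sketch
The comparison the abstract describes (articles and concepts as a bipartite network, a unipartite projection, detected communities checked against expert labels) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the toy articles, concepts, and labels are invented, the `min_shared` threshold is a hypothetical parameter, connected components stand in for a real community detection algorithm, and normalized mutual information (NMI) is just one common choice of agreement score.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical toy data: article id -> set of extracted concepts.
article_concepts = {
    "a1": {"galaxy", "dark matter", "simulation"},
    "a2": {"galaxy", "dark matter", "halo"},
    "a3": {"superconductor", "phonon", "lattice"},
    "a4": {"superconductor", "lattice", "simulation"},
}
# Hypothetical external (expert-made) classification of the same articles.
external_labels = {"a1": "astro", "a2": "astro", "a3": "cond-mat", "a4": "cond-mat"}


def project(concepts_by_article, min_shared=2):
    """Unipartite projection: link articles sharing >= min_shared concepts."""
    adj = {a: set() for a in concepts_by_article}
    for a, b in combinations(concepts_by_article, 2):
        if len(concepts_by_article[a] & concepts_by_article[b]) >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Stand-in for community detection: connected components, node -> id."""
    community = {}
    next_id = 0
    for start in adj:
        if start in community:
            continue
        community[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in community:
                    community[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return community


def nmi(x, y):
    """Normalized mutual information between two labelings of the same nodes."""
    nodes = list(x)
    n = len(nodes)
    count_x = Counter(x[v] for v in nodes)
    count_y = Counter(y[v] for v in nodes)
    count_xy = Counter((x[v], y[v]) for v in nodes)
    mutual = sum(
        c / n * log(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in count_xy.items()
    )
    entropy_x = -sum(c / n * log(c / n) for c in count_x.values())
    entropy_y = -sum(c / n * log(c / n) for c in count_y.values())
    if entropy_x + entropy_y == 0:
        return 1.0  # both labelings put every node in a single group
    return 2 * mutual / (entropy_x + entropy_y)


detected = components(project(article_concepts, min_shared=2))
score_strict = nmi(detected, external_labels)

# Lowering the threshold links a1 and a4 through the shared "simulation" concept.
merged = components(project(article_concepts, min_shared=1))
score_loose = nmi(merged, external_labels)

print(score_strict, score_loose)
```

In this toy case the stricter projection recovers the two labelled groups, while the looser one merges everything into a single community because one methodological concept ("simulation") bridges the two fields. The agreement score therefore depends on how the network is built as well as on the labels it is compared against.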
