Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
Frank Havemann, Jochen Gläser, Michael Heinz, and Alexander Struck

TL;DR
This paper compares three algorithms for detecting overlapping and hierarchical themes in networks of scholarly papers, evaluating their effectiveness against content-based categorization.
Contribution
It provides a comparative analysis of three recent approaches to identify thematic structures in scholarly paper networks, highlighting their strengths and limitations.
Findings
The three predefined topics (h-index, webometrics, and bibliometrics) were each matched to a branch in the dendrograms produced by all three algorithms.
Overlapping topics were detected differently by each approach.
The methods varied in accuracy and interpretability.
Abstract
We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures obtained and overlaps produced by the three hierarchical cluster algorithms were compared to a content-based categorisation, which we based on the interpretation of titles and keywords. We defined sets of papers dealing with three topics located on different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three cluster algorithms and compared the overlapping topics they detected with one another and with the three pre-defined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper…
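As a rough illustration of the setup the abstract describes, the sketch below builds a bibliographic-coupling network from reference lists and scores how well a detected cluster matches a predefined topic set. It is not the authors' implementation: the raw shared-source count as edge weight, the Jaccard index as match score, the function names, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Link every pair of papers that cite at least one common source.

    `references` maps a paper id to the list of sources it cites.
    Returns {(paper_a, paper_b): number_of_shared_sources}.
    The raw count is an assumed weighting; a normalised measure
    could be substituted without changing the structure.
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            edges[(a, b)] = shared
    return edges


def jaccard(cluster, topic):
    """Overlap between a detected cluster and a predefined paper set."""
    cluster, topic = set(cluster), set(topic)
    union = cluster | topic
    return len(cluster & topic) / len(union) if union else 0.0


# Hypothetical toy data: four papers and the sources they cite.
references = {
    "p1": ["s1", "s2", "s3"],
    "p2": ["s2", "s3", "s4"],
    "p3": ["s4", "s5"],
    "p4": ["s9"],
}

edges = bibliographic_coupling(references)
match = jaccard({"p1", "p2"}, {"p1", "p2", "p3"})
```

In this toy case `p4` shares no sources with any other paper and stays isolated, while the cluster `{p1, p2}` covers two of the three papers in the predefined set. The same score could be computed for every dendrogram branch to find the one that best matches a topic.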
