Co-factor analysis of citation networks
Alex Hayes, Karl Rohe

TL;DR
This paper introduces a co-factor embedding method for citation networks that characterizes both how papers cite and how they are cited, while handling the asymmetric and incomplete nature of citation data.
Contribution
We develop a co-factor model for asymmetric citation matrices with missing data, framing estimation as a matrix completion problem, and apply it to analyze a comprehensive statistics literature dataset.
Findings
Identified interpretable co-factors corresponding to statistical subfields
Demonstrated the effectiveness of the estimator through simulations
Produced the most comprehensive topic model of the statistics literature to date
Abstract
One compelling use of citation networks is to characterize papers by their relationships to the surrounding literature. We propose a method to characterize papers by embedding them into two distinct "co-factor" spaces: one describing how papers send citations, and the other describing how papers receive citations. This approach presents several challenges. First, older documents cannot cite newer documents, and thus it is not clear that co-factors are even identifiable. We resolve this challenge by developing a co-factor model for asymmetric adjacency matrices with missing lower triangles and showing that identification is possible. We then frame estimation as a matrix completion problem and develop a specialized implementation of matrix completion because prior implementations are memory bound in our setting. Simulations show that our estimator has promising finite sample properties,…
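To make the matrix completion framing concrete, here is a minimal sketch in Python with NumPy. It is not the authors' estimator or their specialized implementation: it uses a generic iterative truncated-SVD imputation on a small dense matrix, so it ignores the memory constraints the abstract mentions. The function names, the simulated data, the choice to treat the strict upper triangle as observed (with the diagonal and lower triangle as missing), and the chosen rank are all assumptions made for illustration.

```python
import numpy as np


def simulate_citations(n=60, k=3, seed=0):
    """Simulate a toy citation matrix (hypothetical setup, not the paper's data).

    Row i holds the citations sent by paper i. Papers are assumed to be
    ordered so that only entries strictly above the diagonal can be observed.
    """
    rng = np.random.default_rng(seed)
    Z = rng.uniform(0.0, 1.0, size=(n, k))  # outgoing co-factors (assumed)
    Y = rng.uniform(0.0, 1.0, size=(n, k))  # incoming co-factors (assumed)
    B = np.diag(rng.uniform(0.5, 1.0, size=k))  # mixing matrix (assumed)
    P = Z @ B @ Y.T / k  # citation probabilities, scaled into [0, 1]
    A = rng.binomial(1, P).astype(float)
    observed = np.triu(np.ones((n, n), dtype=bool), k=1)
    return A, observed


def complete_low_rank(A, observed, rank, n_iter=100):
    """Generic iterative SVD imputation (a stand-in, not the paper's algorithm).

    Unobserved entries start at zero and are repeatedly overwritten with the
    current rank-`rank` approximation; observed entries are always kept.
    """
    filled = np.where(observed, A, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(observed, A, low_rank)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Rows of U embed how papers send citations; rows of V how they receive.
    return U[:, :rank], s[:rank], Vt[:rank].T


def observed_relative_error(A, observed, U, s, V):
    """Frobenius error on observed entries, relative to the observed data."""
    fitted = (U * s) @ V.T
    return np.linalg.norm((A - fitted)[observed]) / np.linalg.norm(A[observed])


if __name__ == "__main__":
    A, observed = simulate_citations()
    U, s, V = complete_low_rank(A, observed, rank=3)
    print("outgoing embedding shape:", U.shape)
    print("incoming embedding shape:", V.shape)
    print("relative error on observed entries:",
          observed_relative_error(A, observed, U, s, V))
```

In this sketch the rows of `U` and `V` play the role of the two co-factor spaces described above, one for sending citations and one for receiving them. Interpreting them as subfields would require further steps, such as a rotation of the factors, which are outside the scope of this illustration.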
