TL;DR
This paper analyzes personalized PageRank (PPR) as a way to sample a community around a seed node in a large graph. It studies a degree-based adjustment that improves localization and corrects PPR's bias toward high-degree nodes, supports it with theoretical guarantees, and applies it to real-world Twitter data.
Contribution
It provides statistical theory for PPR in massive graphs under the degree-corrected stochastic block model, examines a degree-based bias correction, and demonstrates practical effectiveness on Twitter's friendship graph.
Findings
PPR can effectively identify communities starting from a seed node.
Degree adjustment improves localization and reduces bias.
The method is validated on Twitter's massive friendship graph.
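To make the degree adjustment concrete, here is a minimal sketch in plain Python. It assumes an undirected graph stored as adjacency lists, computes PPR by power iteration, and then divides each score by the node's degree. The teleportation constant alpha = 0.15, the function names, and the toy graph are all illustrative assumptions, and dividing by degree is one natural reading of "degree adjustment" rather than a confirmed detail of the paper.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Power iteration for p = alpha * e_seed + (1 - alpha) * p P, with P = D^-1 A.

    adj maps each node to its list of neighbors (undirected, no isolated nodes).
    alpha is the teleportation probability (an assumed value, not from the paper).
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # teleport back to the seed
        for u, mass in p.items():
            share = (1.0 - alpha) * mass / len(adj[u])
            for v in adj[u]:  # spread the remaining mass evenly over neighbors
                nxt[v] += share
        p = nxt
    return p


def degree_adjusted(ppr, adj):
    """Divide each PPR score by the node's degree (assumed form of the adjustment)."""
    return {v: ppr[v] / len(adj[v]) for v in ppr}


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ppr = personalized_pagerank(toy_graph, seed=0)
adjusted = degree_adjusted(ppr, toy_graph)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
```

Ranking by the adjusted scores instead of the raw PPR scores is what discounts nodes that collect probability mass only because they have many edges.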
Abstract
The paper provides statistical theory and intuition for personalized PageRank (called "PPR"): a popular technique that samples a small community from a massive network. We study a setting where the entire network is expensive to obtain in full or to maintain, but we can start from a seed node of interest and "crawl" the network to find other nodes through their connections. By crawling the graph in a designed way, the PPR vector can be approximated without querying the entire massive graph, making it an alternative to snowball sampling. Using the degree-corrected stochastic block model, we study whether the PPR vector can select nodes that belong to the same block as the seed node. We provide a simple and interpretable form for the PPR vector, highlighting its biases towards high-degree nodes outside the target block. We examine a simple adjustment based on node degrees and establish…
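The crawling idea in the abstract can be illustrated with a local "push" approximation in the style of Andersen, Chung, and Lang. This is a sketch of one well-known way to approximate PPR from neighbor queries alone, not necessarily the exact procedure the paper analyzes. The callback get_neighbors stands in for a hypothetical crawler call, and alpha, eps, and the toy graph are assumed values.

```python
from collections import deque


def approximate_ppr(get_neighbors, seed, alpha=0.15, eps=1e-4):
    """Approximate PPR by pushing residual mass, querying only nodes it reaches.

    get_neighbors(u) is a hypothetical crawler callback returning u's neighbors.
    Returns (p, r, queried): estimates, leftover residuals, and crawled nodes.
    """
    p = {}              # PPR estimates, nonzero only for pushed nodes
    r = {seed: 1.0}     # residual mass still waiting to be pushed
    cache = {}          # neighbor lists fetched so far (one query per node)
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if u not in cache:
            cache[u] = list(get_neighbors(u))
        nbrs = cache[u]
        mass = r.get(u, 0.0)
        if not nbrs or mass < eps * len(nbrs):
            continue    # residual too small relative to degree: no push
        p[u] = p.get(u, 0.0) + alpha * mass   # keep an alpha fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / len(nbrs)
        for v in nbrs:  # pass the rest on to the neighbors
            r[v] = r.get(v, 0.0) + share
            queue.append(v)
    return p, r, set(cache)


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
crawl_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p_hat, residual, queried = approximate_ppr(crawl_graph.__getitem__, seed=0, eps=1e-3)
```

In this simplified version a node's neighbor list is fetched whenever the node is inspected, even if no push follows. A real crawler that can read a node's degree without fetching its full neighbor list would make fewer queries. A larger eps stops earlier and touches fewer nodes at the cost of a coarser estimate.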
