Targeted sampling from massive block model graphs with personalized PageRank

Fan Chen, Yini Zhang, and Karl Rohe

arXiv:1910.12937·cs.SI·July 2, 2020

TL;DR

This paper analyzes personalized PageRank (PPR) as a tool for sampling a small community from a large graph, starting from a seed node. It proposes a degree-based adjustment that corrects PPR's bias toward high-degree nodes, supports it with theoretical guarantees, and applies it to Twitter data.

Contribution

The paper gives a statistical framework for PPR on massive graphs under the degree-corrected stochastic block model, introduces a degree-based bias correction, and demonstrates the approach on a large social network.

Findings

01

PPR can effectively identify communities starting from a seed node.

02

Degree adjustment improves localization and reduces bias.

03

The method is validated on Twitter's massive friendship graph.
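
For reference, the quantities behind findings 01 and 02 can be written in standard notation. This is a sketch, not a quotation from the paper: the symbols (teleportation constant \alpha, seed indicator \pi, transition matrix P, degree d_v) are assumptions, and the graph is taken to be undirected for simplicity.

```latex
% PPR vector p for a seed node v_0: the stationary distribution of a random
% walk that returns to the seed with probability \alpha at every step.
% Here P = D^{-1} A is the random-walk transition matrix and \pi is the
% indicator vector of v_0.
p^{\top} = \alpha\, \pi^{\top} + (1 - \alpha)\, p^{\top} P

% Degree adjustment: divide each entry by the node's degree d_v, so that
% high-degree nodes outside the seed's block are no longer favored.
p^{*}_{v} = \frac{p_{v}}{d_{v}}
```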

Abstract

This paper provides statistical theory and intuition for personalized PageRank (PPR), a popular technique that samples a small community from a massive network. We study a setting where the entire network is expensive to obtain in full or to maintain, but we can start from a seed node of interest and "crawl" the network to find other nodes through their connections. By crawling the graph in a designed way, the PPR vector can be approximated without querying the entire massive graph, making it an alternative to snowball sampling. Using the degree-corrected stochastic block model, we study whether the PPR vector can select nodes that belong to the same block as the seed node. We provide a simple and interpretable form for the PPR vector, highlighting its biases towards high degree nodes outside the target block. We examine a simple adjustment based on node degrees and establish…
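
The crawling idea in the abstract can be illustrated with a local "push" approximation in the style of Andersen, Chung, and Lang, followed by a division by degree. This is a minimal sketch, not the authors' implementation: the function name `approximate_ppr`, the `neighbors` callback, the parameters `alpha` and `eps`, and the two-clique toy graph are all hypothetical, and the graph is assumed to be undirected.

```python
from collections import defaultdict, deque


def approximate_ppr(neighbors, seed, alpha=0.15, eps=1e-4):
    """Approximate the PPR vector of `seed` by local pushes.

    `neighbors(u)` is a hypothetical callback returning the neighbors of u;
    it is called only for nodes the crawl actually reaches.
    """
    cache = {}

    def nbrs_of(u):
        # Query each node's neighbor list at most once.
        if u not in cache:
            cache[u] = list(neighbors(u))
        return cache[u]

    p = defaultdict(float)  # approximate PPR mass
    r = defaultdict(float)  # residual mass not yet pushed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        nbrs = nbrs_of(u)
        # Push only while the residual is large relative to the degree.
        if not nbrs or r[u] < eps * len(nbrs):
            continue
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass                      # keep a fraction alpha
        share = (1 - alpha) * mass / len(nbrs)    # spread the rest evenly
        for v in nbrs:
            r[v] += share
            if r[v] >= eps * len(nbrs_of(v)):
                queue.append(v)
    degrees = {u: len(nbrs_of(u)) for u in p}
    return dict(p), degrees


# Toy graph (an assumption for illustration): two 5-node cliques,
# nodes 0-4 and 5-9, joined by the single edge 4-5.
graph = {u: set() for u in range(10)}
for block in (range(0, 5), range(5, 10)):
    for u in block:
        for v in block:
            if u != v:
                graph[u].add(v)
graph[4].add(5)
graph[5].add(4)

ppr, degrees = approximate_ppr(lambda u: graph[u], seed=0)

# Degree adjustment: divide each PPR entry by the node's degree.
adjusted = {u: ppr[u] / degrees[u] for u in ppr}
top5 = sorted(adjusted, key=adjusted.get, reverse=True)[:5]
print(top5)
```

The procedure touches a node only when enough residual mass reaches it, so the number of neighbor queries depends on `alpha` and `eps` rather than on the size of the full graph. Ranking by the adjusted values instead of the raw ones is what counters the pull toward high-degree nodes.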

Code & Models

Repositories

RoheLab/aPPR (official)