Leveraging History for Faster Sampling of Online Social Networks
Zhuojie Zhou, Nan Zhang, Gautam Das

TL;DR
This paper introduces algorithms that use a random walk's own history to speed up sampling of online social networks. They reduce the number of queries spent on burn-in while keeping the same stationary distribution as the original walk.
Contribution
It proposes two new algorithms, CNRW and GNRW, that improve sampling efficiency by using history to enhance random walks without altering their stationary distribution.
Findings
CNRW and GNRW outperform baseline random walks in efficiency.
The algorithms maintain the same stationary distribution as traditional methods.
Experimental results on real and synthetic networks validate their effectiveness.
Abstract
How to enable efficient analytics over online social network data has been an increasingly important research problem. Given the sheer size of such social networks, many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. Almost all of them use the exact same underlying technique of random walk - a Markov Chain Monte Carlo based method which iteratively transitions from one node to a random neighbor. Random walk fits naturally with this problem because, for most online social networks, the only query we can issue through the interface is to retrieve the neighbors of a given node (i.e., no access to the full graph topology). A problem with random walks, however, is the "burn-in" period, which requires a large number of transitions/queries before the sampling distribution converges to a stationary distribution that enables…
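The baseline the abstract describes, a simple random walk that sees the graph only through a "retrieve the neighbors of a node" query, can be sketched as follows. This is a minimal illustration and not the paper's CNRW or GNRW. The toy graph `GRAPH`, the `get_neighbors` function standing in for the web/API call, and the `burn_in` parameter are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical toy graph standing in for a social network (assumption).
# It is connected and contains a triangle, so the walk has a unique
# stationary distribution proportional to node degree.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def get_neighbors(node):
    """Stand-in for the only query the interface allows: list a node's neighbors."""
    return GRAPH[node]

def simple_random_walk(start, steps, neighbors=get_neighbors, rng=None):
    """Return the visited path; each step issues one neighbor query."""
    rng = rng or random.Random()
    path = [start]
    current = start
    for _ in range(steps):
        # Move to a uniformly random neighbor of the current node.
        current = rng.choice(neighbors(current))
        path.append(current)
    return path

def sample_after_burn_in(start, burn_in, num_samples,
                         neighbors=get_neighbors, rng=None):
    """Discard the start node and the first `burn_in` transitions, keep the rest."""
    path = simple_random_walk(start, burn_in + num_samples, neighbors, rng)
    return path[burn_in + 1:]

if __name__ == "__main__":
    samples = sample_after_burn_in("d", burn_in=100, num_samples=20000,
                                   rng=random.Random(0))
    counts = Counter(samples)
    for node in sorted(GRAPH):
        # Empirical visit frequency next to degree / (sum of degrees).
        print(node, counts[node] / len(samples), len(GRAPH[node]) / 8)
```

Every transition costs one query, and the `burn_in` transitions are discarded, which is the overhead the paper aims to reduce. On this toy graph the visit frequencies approach degree divided by total degree (for example 3/8 for node "b"), the stationary distribution of a simple random walk on a connected, non-bipartite undirected graph.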
Taxonomy
Topics: Complex Network Analysis Techniques · Human Mobility and Location-Based Analysis · Caching and Content Delivery
