Space-Efficient Sampling from Social Activity Streams
Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella

TL;DR
This paper introduces a space-efficient streaming algorithm for sampling representative subgraphs from large, dynamic network data streams, outperforming existing methods in preserving original graph properties.
Contribution
It presents a novel streaming graph sampling algorithm that maintains a representative subgraph in a reservoir setting for large-scale, dynamic networks.
Findings
The sampled subgraphs better preserve the original graph's distributions.
Effective on various real-world datasets.
Outperforms existing sampling methods.
Abstract
In order to efficiently study the characteristics of network domains and support development of network systems (e.g., algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sampling algorithm can access the entire graph in order to decide which nodes/edges to select. Many large-scale network datasets, however, are too large and/or dynamic to be processed using main memory (e.g., email, tweets, wall posts). In this work, we formulate the problem of sampling from large graph streams. We propose a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir-based setting. We evaluate the efficacy of our proposed methods…
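To make the reservoir-based setting mentioned in the abstract concrete, here is a minimal sketch of classic uniform reservoir sampling (Vitter's Algorithm R) applied to an edge stream. This is not the authors' algorithm, whose details are not given on this page; it is only the generic baseline that reservoir-style methods build on. The names `reservoir_sample_edges`, `sampled_nodes`, `capacity`, and `seed` are hypothetical and chosen for illustration.

```python
import random
from typing import Hashable, Iterable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def reservoir_sample_edges(
    stream: Iterable[Edge], capacity: int, seed: Optional[int] = None
) -> List[Edge]:
    """Keep a uniform random sample of at most `capacity` edges from a stream.

    Generic Algorithm R, shown as an assumption-laden baseline; it is NOT
    the sampling method proposed in the paper.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    rng = random.Random(seed)
    reservoir: List[Edge] = []
    for t, edge in enumerate(stream):
        if t < capacity:
            # Fill phase: the first `capacity` edges are always kept.
            reservoir.append(edge)
        else:
            # Replacement phase: edge number t+1 is kept with probability
            # capacity / (t + 1), evicting a uniformly chosen resident edge.
            j = rng.randint(0, t)  # inclusive on both ends
            if j < capacity:
                reservoir[j] = edge
    return reservoir


def sampled_nodes(edges: Iterable[Edge]) -> Set[Hashable]:
    """Return the node set of the subgraph formed by the sampled edges."""
    return {node for edge in edges for node in edge}


if __name__ == "__main__":
    # Hypothetical activity stream: a path graph delivered edge by edge.
    stream = ((i, i + 1) for i in range(10_000))
    sample = reservoir_sample_edges(stream, capacity=100, seed=42)
    print(len(sample), len(sampled_nodes(sample)))
```

The sketch makes a single pass and stores only `capacity` edges, which is the memory constraint the abstract describes. Uniform edge sampling alone gives no guarantee that the sample preserves the original graph's properties.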
Taxonomy
Topics: Complex Network Analysis Techniques · Data Stream Mining Techniques · Peer-to-Peer Network Technologies
