Sampling for Approximate Bipartite Network Projection
Nesreen K. Ahmed, Nick Duffield, Liangzhen Xia

TL;DR
This paper introduces a novel sampling algorithm for efficiently estimating the similarity matrix in bipartite network projections, enabling scalable analysis of large streaming data with high accuracy.
Contribution
The paper proposes a fixed-size, unbiased sampling method that preferentially samples high-similarity node pairs in bipartite streams, improving estimation efficiency.
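For readers unfamiliar with fixed-size, unbiased weighted sampling, the sketch below shows one well-known generic scheme with both properties, priority sampling. It is not the paper's algorithm: the weights here are fixed inputs chosen for illustration, and the names `priority_sample`, `weighted_items`, and `k` are hypothetical. Each item gets priority w/u with u uniform in (0, 1], the k largest priorities are kept, and each kept weight is estimated as max(w, threshold), where the threshold is the (k+1)-th largest priority.

```python
import heapq
import random


def priority_sample(weighted_items, k, rng):
    """Generic priority sampling (illustrative, not the paper's method).

    weighted_items: iterable of (item, weight) pairs with weight > 0.
    Returns {item: estimated_weight} with at most k entries.
    """
    heap = []  # min-heap of (priority, item, weight), at most k + 1 entries
    for item, weight in weighted_items:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
        heapq.heappush(heap, (weight / u, item, weight))
        if len(heap) > k + 1:
            heapq.heappop(heap)  # discard the smallest priority
    if len(heap) <= k:
        # Fewer than k + 1 items were seen: everything is kept exactly.
        return {item: weight for _, item, weight in heap}
    threshold = heapq.heappop(heap)[0]  # the (k + 1)-th largest priority
    return {item: max(weight, threshold) for _, item, weight in heap}


# Hypothetical pair weights: heavier pairs are more likely to be retained.
pairs = [("ab", 9.0), ("ac", 1.0), ("ad", 1.0), ("bc", 5.0), ("bd", 1.0)]
sample = priority_sample(pairs, k=3, rng=random.Random(0))
```

Items with large weights are retained with high probability, while inflating the retained small weights to the threshold compensates for the items that were dropped.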
Findings
Achieves about 1% weighted relative error with only 10% sampling.
Effectively estimates high-similarity edges in real-world bipartite graphs.
Demonstrates scalability and accuracy in streaming bipartite network analysis.
Abstract
Bipartite networks manifest as a stream of edges that represent transactions, e.g., purchases by retail customers. Many machine learning applications employ neighborhood-based measures to characterize the similarity among the nodes, such as the pairwise number of common neighbors (CN) and related metrics. While the number of node pairs that share neighbors is potentially enormous, only a relatively small proportion of them have many common neighbors. This motivates finding a weighted sampling approach to preferentially sample these node pairs. This paper presents a new sampling algorithm that provides a fixed-size unbiased estimate of the similarity matrix resulting from a bipartite graph stream projection. The algorithm has two components. First, it maintains a reservoir of sampled bipartite edges with sampling weights that favor selection of high-similarity nodes. Second, arriving…
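To make the estimated quantity concrete, here is a minimal sketch of the exact computation that sampling is meant to approximate: the common-neighbor count for every pair of nodes on one side of a bipartite edge stream. The function name `common_neighbor_counts` and the (user, item) edge layout are assumptions for illustration. The sketch stores every edge, which is what a fixed-size sample avoids.

```python
from collections import defaultdict


def common_neighbor_counts(edge_stream):
    """Exact common-neighbor counts for user pairs, updated per arriving edge.

    edge_stream: iterable of (user, item) pairs (hypothetical layout).
    Returns {(user_a, user_b): count} with user_a < user_b.
    """
    users_of_item = defaultdict(set)  # item -> users already linked to it
    counts = defaultdict(int)
    for user, item in edge_stream:
        if user in users_of_item[item]:
            continue  # ignore a repeated edge
        # The new edge adds one shared neighbor with every earlier user of item.
        for other in users_of_item[item]:
            pair = (min(user, other), max(user, other))
            counts[pair] += 1
        users_of_item[item].add(user)
    return dict(counts)


# Toy stream: customers a, b, c buying products x and y.
edges = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "y")]
cn = common_neighbor_counts(edges)
```

In the toy stream, customers a and b share both products, while c shares only product y with each of them.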
Taxonomy
Topics: Complex Network Analysis Techniques · Caching and Content Delivery · Recommender Systems and Techniques
