Panther: Fast Top-k Similarity Search in Large Networks
Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li

TL;DR
This paper introduces a fast, sampling-based algorithm for estimating vertex similarity in large networks. It reports top-k similarity search approximately 300x faster than state-of-the-art methods while maintaining accuracy.
Contribution
The paper presents a novel random path sampling technique for scalable and accurate similarity estimation in massive networks, with theoretical error bounds and enhanced structural similarity measures.
Findings
Finds the top-k similar vertices for a query vertex approximately 300x faster than state-of-the-art methods.
Provides theoretical guarantees on estimation error and confidence.
Demonstrates superior accuracy in applications like identity resolution and structural hole spanner detection.
Abstract
Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention. However, both categories of methods are difficult to scale up to handle large networks (with billions of nodes). In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. The algorithm is based on a novel idea of random path, and an extended method is also presented, to enhance the structural similarity when two vertices are completely disconnected. We provide theoretical proofs for the error-bound and confidence of the proposed algorithm. We perform extensive empirical study and show that our algorithm can obtain top-k similar vertices for any vertex in a network approximately 300x faster…
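The random-path idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sample_path_similarity`, `top_k`), the unweighted adjacency-list input, the uniform choice of start vertex and next neighbor, and the parameters `num_paths` and `path_len` are all assumptions made for illustration. The sketch scores a pair of vertices by the fraction of sampled paths on which both appear, so vertices that never share a path (for example, in different components) score zero. That limitation is the case the abstract's extended method is meant to address.

```python
import random
from collections import defaultdict


def sample_path_similarity(adj, num_paths, path_len, seed=None):
    """Estimate pairwise similarity as the fraction of sampled random
    paths that contain both vertices (illustrative sketch only).

    adj: dict mapping each vertex to a list of its neighbors
         (unweighted graph; vertex ids must be orderable).
    num_paths, path_len: hypothetical parameter names for the number
         of sampled paths and the number of steps per path.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    sim = defaultdict(float)
    for _ in range(num_paths):
        # Assumption: start vertex is drawn uniformly at random.
        path = [rng.choice(nodes)]
        for _ in range(path_len):
            neighbors = adj[path[-1]]
            if not neighbors:
                break  # dead end: stop this path early
            # Assumption: next vertex is a uniformly random neighbor.
            path.append(rng.choice(neighbors))
        # Count each pair at most once per path.
        visited = sorted(set(path))
        for i, a in enumerate(visited):
            for b in visited[i + 1:]:
                sim[(a, b)] += 1.0 / num_paths
    return sim


def top_k(sim, v, k):
    """Return the k vertices with the highest estimated similarity to v."""
    scores = {}
    for (a, b), s in sim.items():
        if a == v:
            scores[b] = s
        elif b == v:
            scores[a] = s
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]


if __name__ == "__main__":
    # Toy graph: a triangle {0, 1, 2} and a separate edge {3, 4}.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
    estimates = sample_path_similarity(graph, num_paths=2000, path_len=3, seed=7)
    print(top_k(estimates, 0, 2))
```

More sampled paths give a tighter estimate at higher cost. The abstract states that the paper proves an error bound and confidence level for its algorithm, but this sketch makes no such guarantee.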
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Advanced Graph Neural Networks
