Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Ye Yuan, Guoren Wang, Lei Chen, Haixun Wang

TL;DR
This paper addresses the challenge of subgraph similarity search in large probabilistic graphs with correlated edges, proposing a filter-and-verify framework that improves efficiency through probabilistic bounds and sampling, validated by experiments.
Contribution
The paper introduces a filter-and-verify framework for subgraph similarity search on probabilistic graphs with correlated edges. The filtering step uses probabilistic bounds to prune candidate graphs, and the verification step uses sampling algorithms, since the underlying problem is #P-complete.
Findings
The proposed method significantly reduces search time.
Probabilistic bounds effectively prune candidate graphs.
The sampling algorithm accurately verifies subgraph similarity.
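The filter-and-verify idea summarized above can be illustrated with a deliberately simplified Python sketch. It is not the paper's algorithm. It assumes independent edges (the paper handles correlated edges), vertices with unique identifiers so that matching reduces to comparing edge sets, and a similarity notion of "at most delta query edges missing". The names `upper_bound`, `sample_estimate`, and `filter_and_verify` are hypothetical, and the Markov-inequality bound stands in for the paper's own probabilistic bounds.

```python
import random

# Hypothetical representation: a probabilistic graph is a dict that maps an
# edge (a tuple of two vertex ids) to its existence probability.
# ASSUMPTION: edges are independent, unlike the correlated model in the paper.

def upper_bound(graph, query, delta):
    """Cheap upper bound on Pr[at least len(query) - delta query edges exist].

    Uses Markov's inequality: Pr[X >= k] <= E[X] / k, where X is the number
    of query edges present in a possible world.
    """
    need = len(query) - delta
    if need <= 0:
        return 1.0
    expected = sum(graph.get(edge, 0.0) for edge in query)
    return min(1.0, expected / need)

def sample_estimate(graph, query, delta, n_samples=2000, rng=None):
    """Monte Carlo estimate of the same probability by sampling possible worlds."""
    rng = rng or random.Random(0)
    need = len(query) - delta
    hits = 0
    for _ in range(n_samples):
        # Draw one possible world, restricted to the query's edges.
        present = sum(1 for edge in query if rng.random() < graph.get(edge, 0.0))
        if present >= need:
            hits += 1
    return hits / n_samples

def filter_and_verify(database, query, delta, epsilon, n_samples=2000, seed=0):
    """Return names of graphs whose estimated match probability is >= epsilon."""
    rng = random.Random(seed)
    answers = []
    for name, graph in database.items():
        # Filter: if even the upper bound is below the threshold, prune.
        if upper_bound(graph, query, delta) < epsilon:
            continue
        # Verify: estimate the probability by sampling.
        if sample_estimate(graph, query, delta, n_samples, rng) >= epsilon:
            answers.append(name)
    return answers

query = [("a", "b"), ("b", "c"), ("a", "c")]
database = {
    "g1": {("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.9},
    "g2": {("a", "b"): 0.05, ("b", "c"): 0.05, ("a", "c"): 0.05},
}
result = filter_and_verify(database, query, delta=1, epsilon=0.5)
```

In this toy database, `g2` is discarded by the bound alone and never sampled, which is the point of the filtering step: the expensive verification runs only on graphs the bound cannot rule out.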
Abstract
Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. In reality, however, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs in which edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is…
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Graph Neural Networks · Data Quality and Management
