MSQ-Index: A Succinct Index for Fast Graph Similarity Search

Xiaoyang Chen; Hongwei Huo; Jun Huan; Jeffrey Scott Vitter

arXiv:1612.09155·cs.DB·December 30, 2016·2 cites

TL;DR

This paper introduces MSQ-Index, a space-efficient in-memory index for graph similarity search that substantially reduces memory usage and speeds up queries on large graph databases, with applications such as bioinformatics.

Contribution

The paper presents a novel succinct index structure based on q-gram trees with hybrid encoding, enabling scalable, fast graph similarity search on massive datasets.
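To illustrate what "succinct" means in practice, here is a minimal sketch of a bit vector with rank support, a common building block of succinct data structures. It is not the paper's index layout: the class name `RankBitVector`, the block size of 64, and the use of a plain Python list are all assumptions made for readability, so the sketch shows the query pattern rather than the real space savings.

```python
class RankBitVector:
    """Bit vector with precomputed per-block counts for fast rank queries.

    Hypothetical illustration only; a real succinct structure would pack
    bits into machine words instead of storing one Python int per bit.
    """

    BLOCK = 64  # assumed block size

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ranks[k] = number of 1 bits in the first k full blocks
        self.block_ranks = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in bits[0:i]."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # Stored count for whole blocks, plus a short scan inside one block
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bv = RankBitVector([1, 0, 1, 1, 0] * 30)  # 150 bits, 3 ones per 5 bits
ones_before_64 = bv.rank1(64)
```

The design point is that a rank query touches one stored counter and at most one block, so the structure can answer navigation queries over a compact bit sequence without decompressing it.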

Findings

01 Uses only 5%-15% of the previous state-of-the-art index size on the tested data

02 Achieves 2-3 times faster query time on small data sets

03 Scales to 25 million graphs in PubChem

Abstract

Graph similarity search has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition, and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with millions or billions of graphs. In this paper, we study the problem of graph similarity search under the graph edit distance constraint. We present a space-efficient index structure based upon the q-gram tree that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5%-15% of the previous state-of-the-art indexing size on the tested data while at the same time achieving 2-3 times acceleration in query time with small data sets. We also boost the…
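Similarity search under a graph edit distance (GED) constraint is usually done in a filter-and-verify style: a cheap lower bound on GED discards most graphs before any exact computation. The sketch below shows one simple, well-known bound based on label multisets, assuming unit-cost edit operations. It is a stand-in for illustration and not the q-gram tree filter from the paper; the function names and the `(vertex_labels, edge_labels)` graph representation are hypothetical.

```python
from collections import Counter


def label_lower_bound(g, h):
    """Lower bound on unit-cost GED from vertex and edge label multisets.

    g and h are (vertex_labels, edge_labels) pairs of lists. Each edit
    operation changes one label multiset by at most one element, so the
    number of unmatched labels cannot exceed the edit distance.
    """
    def unmatched(a, b):
        common = sum((Counter(a) & Counter(b)).values())
        return max(len(a), len(b)) - common

    return unmatched(g[0], h[0]) + unmatched(g[1], h[1])


def filter_candidates(db, query, tau):
    """Return indices of graphs whose lower bound does not exceed tau."""
    return [i for i, g in enumerate(db) if label_lower_bound(g, query) <= tau]


query = (["C", "N", "O"], ["single", "single"])
db = [
    (["C", "C", "O"], ["single", "double"]),  # one vertex, one edge label differ
    (["C", "N", "O"], ["single", "single"]),  # same label multisets as query
    (["S", "S", "S", "S"], ["double"] * 4),   # no labels in common
]
candidates = filter_candidates(db, query, tau=2)
```

Graphs that survive the filter still need an exact GED check, since a lower bound can only rule candidates out, never confirm them.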

Taxonomy

Topics: Graph Theory and Algorithms · Algorithms and Data Compression · Data Management and Algorithms