Regular Expression Matching on billion-nodes Graphs
Hongzhi Wang, Jiabao Han, Bin Shao, Jianzhong Li

TL;DR
This paper introduces a scalable method for processing regular expression (RE) queries on billion-node graphs such as social networks. It combines an index whose size is sublinear in the graph size, parallel algorithms for distributed graphs, and cost-based query optimization, and it reports strong scalability and performance.
Contribution
It presents a novel RE query processing framework with sublinear index size, parallel algorithms for distributed graphs, and cost-based optimization strategies for large-scale graph querying.
Findings
The approach scales well to billion-node graphs.
The index size is sublinear relative to graph size.
Experimental results confirm high efficiency and scalability.
Abstract
In many applications, it is necessary to retrieve pairs of vertices such that the path between them satisfies certain constraints. Because regular expressions are a powerful tool for describing patterns in a sequence, in this paper we define the regular expression (RE) query on graphs, which uses a regular expression to represent the constraints between vertices. To process RE queries on large graphs such as social networks, we propose an RE query processing method whose index size is sublinear in the graph size. Considering that large graphs may be randomly distributed across multiple machines, we present parallel RE processing algorithms that make no assumption about graph distribution. To achieve high efficiency for complex RE query processing, we develop cost-based query optimization strategies that need only a small amount of statistical information, which is suitable for querying large…
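To make the idea of an RE query concrete, here is a minimal baseline sketch in Python. It is not the paper's indexed or parallel method. It assumes labels sit on edges (the paper's own definition may differ), and it takes the regular expression as a hand-written DFA instead of parsing a pattern string. The names `re_query`, `EDGES`, and `DFA` are hypothetical and exist only for this illustration. The function runs a breadth-first search over (vertex, DFA state) pairs from every source vertex and reports each pair whose connecting path spells an accepted label sequence.

```python
from collections import deque


def re_query(edges, dfa, start_state, accept_states):
    """Return all (u, v) joined by a path whose edge labels the DFA accepts.

    edges: iterable of (source, label, target) triples (assumed edge-labeled).
    dfa:   dict mapping (state, label) -> next state; missing keys mean "reject".
    """
    adjacency = {}
    vertices = set()
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))
        vertices.update((u, v))

    result = set()
    for source in vertices:
        # Search the product of the graph and the DFA, starting at (source, start).
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            vertex, state = queue.popleft()
            if state in accept_states:
                result.add((source, vertex))
            for label, neighbor in adjacency.get(vertex, []):
                next_state = dfa.get((state, label))
                if next_state is None:
                    continue  # the DFA has no transition for this label
                if (neighbor, next_state) not in seen:
                    seen.add((neighbor, next_state))
                    queue.append((neighbor, next_state))
    return result


# Toy social graph (hypothetical data).
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "dave"),
    ("dave", "knows", "alice"),
]

# DFA for the pattern: knows follows*
DFA = {(0, "knows"): 1, (1, "follows"): 1}

pairs = re_query(EDGES, DFA, start_state=0, accept_states={1})
print(sorted(pairs))
```

This baseline restarts a search from every vertex and visits up to one product node per (vertex, state) pair each time, which is exactly the kind of cost that becomes prohibitive at billion-node scale and that indexing, parallelism, and query optimization aim to reduce.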
Taxonomy
Topics: Graph Theory and Algorithms · Algorithms and Data Compression · Network Packet Processing and Optimization
