PG-HIVE: Hybrid Incremental Schema Discovery for Property Graphs
Sofia Sideri, Georgia Troullinou, Elisjana Ymeralli, Vasilis Efthymiou, Dimitris Plexousakis, Haridimos Kondylakis

TL;DR
PG-HIVE is a framework that automatically discovers schemas in property graphs. It identifies node and edge types, their properties, and constraints, and it does so efficiently as the data evolves, which supports graph management and analysis.
Contribution
PG-HIVE introduces an incremental schema discovery method that combines Locality-Sensitive Hashing with property- and label-based clustering, and it outperforms existing solutions in both accuracy and efficiency.
Findings
Outperforms state-of-the-art approaches in schema accuracy by up to 65%.
Achieves up to 1.95x faster execution times.
Effectively handles schema discovery in evolving property graphs.
Abstract
Property graphs have rapidly become the de facto standard for representing and managing complex, interconnected data, powering applications across domains from knowledge graphs to social networks. Despite the advantages, their schema-free nature poses major challenges for integration, exploration, visualization, and efficient querying. To bridge this gap, we present PG-HIVE, a novel framework for automatic schema discovery in property graphs. PG-HIVE goes beyond existing approaches by uncovering latent node and edge types, inferring property datatypes, constraints, and cardinalities, and doing so even in the absence of explicit labeling information. Leveraging a unique combination of Locality-Sensitive Hashing with property- and label-based clustering, PG-HIVE identifies structural similarities at scale. Moreover, it introduces incremental schema discovery, eliminating costly…
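To make the hashing-plus-clustering idea concrete, here is a minimal sketch, not the authors' implementation. It assumes MinHash with banding as the LSH family (the abstract does not say which one PG-HIVE uses) and treats each node as the set of its property keys. The names `minhash_signature` and `lsh_candidate_types` are hypothetical, as are the parameters `num_hashes` and `bands`. Nodes that share at least one LSH bucket are merged into the same candidate type.

```python
import hashlib
from collections import defaultdict


def _h(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: for each seed, the minimum hash over the token set."""
    return tuple(min(_h(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_candidate_types(nodes, num_hashes=32, bands=8):
    """Group node ids whose property-key sets collide in at least one LSH band.

    `nodes` maps a node id to a set of property keys (an assumed input shape).
    Returns a list of sets of node ids, one set per candidate type.
    """
    rows = num_hashes // bands
    parent = {nid: nid for nid in nodes}  # union-find forest over node ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    buckets = {}  # (band index, band slice of signature) -> first node seen
    for nid, props in nodes.items():
        # Nodes without properties get a placeholder token so min() is defined.
        sig = minhash_signature(props or {"<no-properties>"}, num_hashes)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            if key in buckets:
                parent[find(nid)] = find(buckets[key])  # same bucket: merge
            else:
                buckets[key] = nid

    groups = defaultdict(set)
    for nid in nodes:
        groups[find(nid)].add(nid)
    return list(groups.values())
```

With more bands (fewer rows per band), less similar property sets start to collide, so `bands` acts as a rough similarity threshold. A real pipeline would also need label information, edge types, datatype and cardinality inference, and incremental updates, none of which this sketch attempts.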
Taxonomy
Topics: Machine Learning in Healthcare · Advanced Graph Neural Networks · Bioinformatics and Genomic Networks
