An Efficient Approach for Discovering Graph Entity Dependencies (GEDs)
Dehua Liu, Selasi Kwashie, Yidi Zhang, Guangtong Zhou, Michael Bewong, Xiaoying Wu, Xi Guo, Keqing He, Zaiwen Feng

TL;DR
This paper formalizes the problem of discovering graph entity dependencies (GEDs), which are constraints on property graphs that are useful for data quality and management tasks. It then proposes an efficient, scalable method for solving the problem.
Contribution
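It formalizes the GED discovery problem and proposes an approach that uses graph partitioning to discover GED scopes quickly and pruning to shrink the candidate space. It also introduces an interestingness measure, based on the minimum description length (MDL) principle, for ranking the discovered GEDs.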
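The paper's exact MDL formulation is not reproduced here. As a rough illustration of the principle only, the sketch below scores a dependency by its compression gain, L(D) minus [L(M) + L(D|M)]. The gain is the bits saved because covered right-hand-side values can be derived rather than stored, minus the bits needed to encode the dependency itself. The cost model (uniform codes over a literal vocabulary and an attribute domain) and the names code_length, mdl_gain and rank are hypothetical assumptions, not the authors' measure.

```python
import math


def code_length(n_options):
    """Bits to name one of n equally likely options (uniform code)."""
    return math.log2(n_options) if n_options > 1 else 0.0


def mdl_gain(n_literals, literal_vocab, n_covered, rhs_domain):
    """Hypothetical compression gain L(D) - [L(M) + L(D|M)] of one dependency.

    L(M): each literal is coded as a choice from a vocabulary of literal_vocab literals.
    L(D|M): the RHS value of each of the n_covered matches can be derived from the
    dependency instead of stored, saving code_length(rhs_domain) bits per match.
    """
    model_bits = n_literals * code_length(literal_vocab)
    saved_bits = n_covered * code_length(rhs_domain)
    return saved_bits - model_bits


def rank(candidates):
    """Sort (name, n_literals, literal_vocab, n_covered, rhs_domain) tuples, best first."""
    return sorted(candidates, key=lambda c: mdl_gain(*c[1:]), reverse=True)


# Hypothetical candidate dependencies (not data from the paper).
candidates = [
    ("specific", 5, 64, 10, 16),   # 10*4 - 5*6 = 10 bits gained
    ("general", 3, 64, 100, 16),   # 100*4 - 3*6 = 382 bits gained
    ("overfit", 4, 64, 2, 16),     # 2*4 - 4*6 = -16 bits: costs more than it saves
]
print([name for name, *_ in rank(candidates)])  # ['general', 'specific', 'overfit']
```

A dependency that covers many matches with few literals ranks highest. One whose encoding costs more than it saves gets a negative gain, which is how an MDL-style score penalises overly specific rules.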
Findings
The approach scales to real-world graph datasets.
The discovered GEDs improve data quality tasks.
The method outperforms existing techniques in efficiency.
Abstract
Graph entity dependencies (GEDs) are novel graph constraints, unifying keys and functional dependencies, for property graphs. They have been found useful in many real-world data quality and data management tasks, including fact checking on social media networks and entity resolution. In this paper, we study the discovery problem of GEDs: finding a minimal cover of valid GEDs in given graph data. We formalise the problem, and propose an effective and efficient approach to overcome major bottlenecks in GED discovery. In particular, we leverage existing graph partitioning algorithms to enable fast GED-scope discovery, and employ effective pruning strategies over the prohibitively large space of candidate dependencies. Furthermore, we define an interestingness measure for GEDs based on the minimum description length principle, to score and rank the mined cover set of GEDs. Finally, we…
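To make "valid GED" concrete, the sketch below checks a GED Q[x̄](X → Y) on a toy property graph using the standard GED semantics. The graph satisfies the GED if every match of the pattern Q that satisfies all literals in X also satisfies all literals in Y. Literals are constant (x.A = c), variable (x.A = y.B) or id (x.id = y.id) comparisons. The sketch also checks left-reducedness (no literal of X can be dropped), which is one ingredient of a minimal cover. The toy graph, the brute-force matcher and all function names are hypothetical. The sketch does not implement the paper's partitioning or pruning. It enumerates every candidate match, so it only suits tiny graphs.

```python
from itertools import product

# Hypothetical toy property graph: node id -> (label, attributes).
NODES = {
    1: ("person", {"name": "Ann", "city": "Perth"}),
    2: ("person", {"name": "Bob", "city": "Sydney"}),
    3: ("company", {"name": "Acme", "city": "Perth"}),
    4: ("company", {"name": "Acme", "city": "Perth"}),
}
# Directed labelled edges: (source id, edge label, target id).
EDGES = {(1, "works_at", 3), (2, "works_at", 4)}


def matches(pattern):
    """Yield every homomorphic match (variable -> node id) of a pattern.

    A pattern is (var_labels, pattern_edges); this brute-forces all
    label-compatible assignments, so it is only usable on tiny graphs.
    """
    var_labels, pattern_edges = pattern
    variables = list(var_labels)
    candidates = [[n for n, (label, _) in NODES.items() if label == var_labels[v]]
                  for v in variables]
    for combo in product(*candidates):
        h = dict(zip(variables, combo))
        if all((h[s], label, h[t]) in EDGES for s, label, t in pattern_edges):
            yield h


def holds(literal, h):
    """Evaluate a single literal under match h."""
    kind = literal[0]
    if kind == "const":   # x.A = c
        _, x, attr, value = literal
        return NODES[h[x]][1].get(attr) == value
    if kind == "var":     # x.A = y.B (both attributes must exist)
        _, x, a, y, b = literal
        left, right = NODES[h[x]][1].get(a), NODES[h[y]][1].get(b)
        return left is not None and left == right
    if kind == "id":      # x.id = y.id: both variables map to the same node
        _, x, y = literal
        return h[x] == h[y]
    raise ValueError(f"unknown literal kind: {kind}")


def satisfies(pattern, lhs, rhs):
    """True if every match satisfying all lhs literals also satisfies all rhs literals."""
    for h in matches(pattern):
        if all(holds(lit, h) for lit in lhs) and not all(holds(lit, h) for lit in rhs):
            return False
    return True


def is_left_reduced(pattern, lhs, rhs):
    """True if the GED is valid and dropping any single lhs literal breaks validity."""
    if not satisfies(pattern, lhs, rhs):
        return False
    return all(not satisfies(pattern, lhs[:i] + lhs[i + 1:], rhs)
               for i in range(len(lhs)))


# Hypothetical example GEDs.
COMPANY_PAIR = ({"c1": "company", "c2": "company"}, [])
COMPANY_KEY = ([("var", "c1", "name", "c2", "name"), ("var", "c1", "city", "c2", "city")],
               [("id", "c1", "c2")])
WORKS_AT = ({"p": "person", "c": "company"}, [("p", "works_at", "c")])
SAME_CITY = [("var", "p", "city", "c", "city")]

print(satisfies(COMPANY_PAIR, *COMPANY_KEY))                                  # False
print(satisfies(WORKS_AT, [("const", "p", "name", "Ann")], SAME_CITY))        # True
print(is_left_reduced(WORKS_AT, [("const", "p", "name", "Ann")], SAME_CITY))  # True
```

In the toy graph the company key is violated because nodes 3 and 4 agree on name and city but are distinct nodes. Entity resolution uses exactly this kind of violation to flag likely duplicates.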
Taxonomy
Topics: Data Quality and Management · Advanced Graph Neural Networks · Semantic Web and Ontologies
