Relational Database Data Lineage Ontology
Jakub Dutkiewicz, Paweł Misiorek, Robert Wrembel

TL;DR
This paper introduces an advanced ontology for relational database data lineage that enhances semantic representation and improves lineage link prediction using knowledge graphs and graph neural networks.
Contribution
The paper extends previous KG-based lineage models with new concepts for structural, semantic, and transformation details, enabling more accurate lineage discovery.
Findings
The enriched ontology improves lineage link prediction performance.
Graph neural network models perform better when trained on knowledge graphs built with the new ontology.
Experimental results show higher AUC and Hits@10 scores with the new ontology.
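For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how AUC and Hits@10 are commonly computed for link prediction. It is not the authors' evaluation code: the function names (`rank_of_true`, `hits_at_k`, `auc`) and all scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def rank_of_true(true_score: float, candidate_scores: Sequence[float]) -> int:
    """1-based rank of the true link among corrupted candidates (higher score = better)."""
    return 1 + sum(1 for s in candidate_scores if s > true_score)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of true links whose rank is at most k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random true link outscores a random false one (ties count half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores, purely illustrative (not results from the paper).
example_ranks = [1, 3, 12, 50]
example_hits = hits_at_k(example_ranks, k=10)
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

The pairwise AUC above is quadratic in the number of scores, which keeps the definition visible; for large evaluation sets a sort-based or library implementation would be the more practical choice.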
Abstract
Modeling data lineage in relational databases remains a challenging problem, particularly in scenarios involving incomplete or missing dependencies between database objects. In this paper, we propose a novel ontology for relational database data lineage, designed to provide a richer and more expressive semantic representation that supports the discovery of lineage links by means of knowledge graphs (KGs). Building upon our previous work on KG-based lineage discovery, the proposed ontology extends the earlier model with additional concepts capturing structural, semantic, and transformation-level characteristics of relational data. These extensions enable more precise encoding of lineage evidence. To evaluate the impact of the proposed ontology, we conduct a comparative study using a KG-based inductive link prediction framework. Specifically, we assess the performance of a graph neural…
