Efficient Discovery of Ontology Functional Dependencies
Sridevi Baskaran, Alexander Keller, Fei Chiang, Lukasz Golab, Jaroslaw Szlichta

TL;DR
This paper introduces Ontology Functional Dependencies (OFDs), extending traditional data constraints with ontology-based relationships, and provides theoretical foundations, discovery algorithms, and experimental validation for improved data quality management.
Contribution
The paper defines and formalizes OFDs, develops a sound and complete set of axioms for them, and presents scalable algorithms for their discovery, enhancing data cleaning with ontology-aware constraints.
Findings
Algorithms efficiently discover OFDs in large datasets.
OFDs improve data cleaning accuracy by capturing broader relationships.
Experimental results demonstrate scalability and effectiveness of the proposed methods.
Abstract
Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because the relevant terminology and relationships are defined by domain expertise. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step…
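The contrast the abstract draws between syntactic equivalence and ontology-aware equivalence can be illustrated with a small sketch. This is not the paper's discovery algorithm or its formal definition of an OFD; it is a toy check under stated assumptions. The names `ONTOLOGY`, `ROWS`, `concepts_of`, `fd_holds`, and `ontology_fd_holds` are hypothetical, and the ontology is a made-up two-concept example built around the 'Advil' / 'ibuprofen' case.

```python
from collections import defaultdict

# Hypothetical toy ontology (an assumption for illustration only):
# each concept maps to the set of names treated as referring to it.
ONTOLOGY = {
    "ibuprofen": {"ibuprofen", "Advil"},
    "acetaminophen": {"acetaminophen", "Tylenol"},
}

# Hypothetical sample relation.
ROWS = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
]


def concepts_of(value):
    """Return the concepts whose name set contains value.

    Values unknown to the ontology stand for themselves.
    """
    found = {c for c, names in ONTOLOGY.items() if value in names}
    return found or {value}


def fd_holds(rows, lhs, rhs):
    """Classical FD check: rows equal on lhs must be string-equal on rhs."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return all(len(values) == 1 for values in groups.values())


def ontology_fd_holds(rows, lhs, rhs):
    """Ontology-aware check: rows equal on lhs must share at least one
    concept on rhs according to the toy ontology."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(concepts_of(row[rhs]))
    return all(set.intersection(*sets) for sets in groups.values())


print(fd_holds(ROWS, ["symptom"], "drug"))           # False
print(ontology_fd_holds(ROWS, ["symptom"], "drug"))  # True
```

On this sample, the classical check rejects symptom → drug because 'Advil' and 'ibuprofen' differ as strings, while the ontology-aware check accepts it because both names map to the same concept. Adding a row such as fever / 'Advil' would make the ontology-aware check fail as well, since 'Tylenol' and 'Advil' share no concept in the toy ontology.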
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Data Mining Algorithms and Applications
