
TL;DR
This paper introduces an algebraic framework for data integration that combines category theory, functional programming, and database theory. It provides a for/where/return query language, a pushout-based integration pattern, and an implementation in a tool called CQL.
Contribution
It develops a novel algebraic formalism for data integration using category theory, including a query language and a pushout-based design pattern.
Findings
Formalizes schemas and instances as algebraic theories and categories
Shows that a schema morphism F : S -> T induces three adjoint data migration functors: Sigma_F, its right adjoint Delta_F, and Delta_F's right adjoint Pi_F
Implements the formalism in a tool called CQL
Abstract
In this paper we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema S form a category, S-Inst, and a morphism of schemas F : S -> T induces three adjoint data migration functors: Sigma_F : S-Inst -> T-Inst, defined by substitution along F, which has a right adjoint Delta_F : T-Inst -> S-Inst, which in turn has a right adjoint Pi_F : S-Inst -> T-Inst. We present a query language based on for/where/return syntax where each query denotes a sequence of data migration functors; a pushout-based design pattern for performing data integration using our formalism; and describe the…
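As a rough illustration of the data migration functors named in the abstract, the sketch below computes Delta_F on a toy example. It assumes the usual reading of Delta_F as precomposition with F: each S-entity takes the rows of its image in T, and each S-foreign key is computed by following the T-path it is mapped to. The names Schema, Mapping, Instance, and delta are hypothetical and are not CQL's API. Attributes, path equations, and the Sigma_F and Pi_F functors are left out.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    entities: set   # entity names
    fks: dict       # foreign key name -> (source entity, target entity)


@dataclass
class Mapping:
    on_entities: dict   # entity of S -> entity of T
    on_fks: dict        # foreign key of S -> path in T (list of T foreign key names)


@dataclass
class Instance:
    rows: dict   # entity -> set of row ids
    fk: dict     # foreign key name -> {source row id: target row id}


def delta(S: Schema, F: Mapping, I: Instance) -> Instance:
    """Pull a T-instance I back along F : S -> T (hypothetical helper)."""
    # Each S-entity gets a copy of the rows of the T-entity it maps to.
    rows = {e: set(I.rows[F.on_entities[e]]) for e in S.entities}
    fk = {}
    for name, (src, _tgt) in S.fks.items():
        table = {}
        for r in rows[src]:
            x = r
            # Follow the T-path assigned to this S-foreign key, one step at a time.
            for step in F.on_fks[name]:
                x = I.fk[step][x]
            table[r] = x
        fk[name] = table
    return Instance(rows, fk)


# Target schema T: employees with a department and a manager.
T = Schema({"Emp", "Dept"}, {"worksIn": ("Emp", "Dept"), "manager": ("Emp", "Emp")})
# Source schema S: one foreign key f, read as "department of my manager".
S = Schema({"E", "D"}, {"f": ("E", "D")})
F = Mapping({"E": "Emp", "D": "Dept"}, {"f": ["manager", "worksIn"]})

I = Instance(
    rows={"Emp": {"e1", "e2", "e3"}, "Dept": {"d1", "d2"}},
    fk={
        "worksIn": {"e1": "d1", "e2": "d1", "e3": "d2"},
        "manager": {"e1": "e2", "e2": "e3", "e3": "e3"},
    },
)

J = delta(S, F, I)
print(J.fk["f"])  # each employee mapped to the department of their manager
```

In this toy setting Delta_F behaves like a projection or view: no new rows are created, and the S-instance is determined entirely by composing foreign keys that already exist in the T-instance.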
