Persistent Obstruction Theory for a Model Category of Measures with Applications to Data Merging
Abraham D. Smith, Paul Bendich, John Harer

TL;DR
This paper develops a homotopical obstruction theory for measures on compact metric spaces, providing a mathematical framework to understand and detect topological obstructions in data merging and database join operations.
Contribution
It introduces a model category of measures with a homotopy and homology theory that captures obstructions to constructing measures on larger product spaces, with applications to data science.
Findings
Obstructions to data merging can be detected by cocycles.
The theory quantifies the difficulty of database joins.
The obstruction theory is compatible with a filtration built from the Wasserstein distance on measures, which yields persistent obstructions.
Abstract
Collections of measures on compact metric spaces form a model category ("data complexes"), whose morphisms are marginalization integrals. The fibrant objects in this category represent collections of measures in which there is a measure on a product space that marginalizes to any measures on pairs of its factors. The homotopy and homology for this category allow measurement of obstructions to finding measures on larger and larger product spaces. The obstruction theory is compatible with a fibrant filtration built from the Wasserstein distance on measures. Despite the abstract tools, this is motivated by a widespread problem in data science. Data complexes provide a mathematical foundation for semi-automated data-alignment tools that are common in commercial database software. Practically speaking, the theory shows that database JOIN operations are subject to genuine topological…
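The marginalization condition in the abstract can be illustrated with a toy case. The sketch below is not the paper's construction: the three pairwise measures on binary variables X, Y, Z are hypothetical, and the names `marginal`, `feasible_support`, and `joint_is_obstructed` are introduced here only for illustration. It shows pairwise data that agree on every single factor yet admit no measure on the triple product, since any such measure could only place mass on triples whose three pairwise projections all carry positive mass. An empty feasible support is a sufficient witness of an obstruction, not a general test, and it does not compute the cocycles the paper refers to.

```python
from itertools import product

# Hypothetical pairwise measures on binary variables X, Y, Z (illustrative only).
p_xy = {(0, 0): 0.5, (1, 1): 0.5}  # X and Y always agree
p_yz = {(0, 0): 0.5, (1, 1): 0.5}  # Y and Z always agree
p_xz = {(0, 1): 0.5, (1, 0): 0.5}  # X and Z always disagree


def marginal(pair_measure, axis):
    """Marginalize a measure on a pair to one coordinate by summing out the other."""
    out = {}
    for point, mass in pair_measure.items():
        out[point[axis]] = out.get(point[axis], 0.0) + mass
    return out


# The pairwise measures agree on every shared single factor.
consistent_on_factors = (
    marginal(p_xy, 0) == marginal(p_xz, 0)      # marginal on X
    and marginal(p_xy, 1) == marginal(p_yz, 0)  # marginal on Y
    and marginal(p_yz, 1) == marginal(p_xz, 1)  # marginal on Z
)

# A joint measure with these pairwise marginals can only charge triples
# (x, y, z) whose three pairwise projections all have positive mass.
feasible_support = [
    (x, y, z)
    for x, y, z in product((0, 1), repeat=3)
    if p_xy.get((x, y), 0) > 0
    and p_yz.get((y, z), 0) > 0
    and p_xz.get((x, z), 0) > 0
]

# Empty feasible support means no joint measure on X x Y x Z exists.
joint_is_obstructed = len(feasible_support) == 0

print(consistent_on_factors, feasible_support, joint_is_obstructed)
```

In database terms, under the same assumptions, this corresponds to three two-column tables whose shared columns have matching value distributions but whose three-way JOIN returns no rows.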
