Data, geometry and homology
Jens Agerberg, Wojciech Chacholski, Ryan Ramanujam

TL;DR
This paper explores how homology-based invariants can characterize dataset geometry and how subsampling affects this geometry, enabling classification of points from different data distributions.
Contribution
It introduces a framework for analyzing how dataset geometry changes under various subsampling methods using homology-based invariants.
Findings
Subsampling alters dataset geometry in measurable ways.
Homology invariants can distinguish different data distributions.
The method enables classification based on geometric properties.
Abstract
Homology-based invariants can be used to characterize the geometry of datasets and thereby gain some understanding of the processes generating those datasets. In this work we investigate how the geometry of a dataset changes when it is subsampled in various ways. In our framework the dataset serves as a reference object; we then consider different points in the ambient space and endow them with a geometry defined in relation to the reference object, for instance by subsampling the dataset proportionally to the distance between its elements and the point under consideration. We illustrate how this process can be used to extract rich geometrical information, making it possible, for example, to classify points coming from different data distributions.
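The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the exponential weighting, and the `scale` parameter are hypothetical choices made for illustration, and degree-0 persistent homology of a Vietoris-Rips filtration stands in for whichever homology-based invariant the paper actually uses. The sketch draws a subsample of the reference dataset with weights that depend on the distance to a query point, then computes the death times of the connected components, which coincide with the edge lengths of a minimum spanning tree.

```python
import math
import random
from itertools import combinations


def distance_weighted_subsample(data, query, k, rng, scale=1.0):
    """Draw up to k distinct points from data, weighted by distance to query.

    Assumption: weights decay exponentially with distance, so points close
    to the query are more likely to be drawn. Other weightings are possible.
    """
    remaining = list(data)
    weights = [math.exp(-math.dist(x, query) / scale) for x in remaining]
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Pick one position proportionally to the remaining weights.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


def h0_death_times(points):
    """Death times of H0 classes in the Vietoris-Rips filtration.

    Every component is born at 0; a component dies when an edge merges it
    into another one, so the deaths are the minimum spanning tree edge
    lengths (Kruskal's algorithm with a union-find structure).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy reference dataset: two Gaussian clusters in the plane.
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
    data += [(rng.gauss(3, 1.0), rng.gauss(3, 1.0)) for _ in range(50)]
    for query in [(0.0, 0.0), (3.0, 3.0)]:
        sample = distance_weighted_subsample(data, query, k=20, rng=rng)
        print(query, sorted(h0_death_times(sample))[-3:])
```

The list of death times obtained for each query point acts as a geometric signature of that point relative to the reference dataset. Under the assumptions above, such signatures could be passed to any standard classifier as features.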
Taxonomy
Topics: Topological and Geometric Data Analysis · Data Visualization and Analytics · Image Retrieval and Classification Techniques
