Data, geometry and homology

Jens Agerberg; Wojciech Chacholski; Ryan Ramanujam

arXiv:2203.08306 · math.AT · March 17, 2022 · 1 citation

Open Access

TL;DR

This paper explores how homology-based invariants can characterize dataset geometry and how subsampling affects this geometry, enabling classification of points from different data distributions.

Contribution

It introduces a framework for analyzing how dataset geometry changes under various subsampling methods using homology-based invariants.

Findings

01. Subsampling alters dataset geometry in measurable ways.

02. Homology invariants can distinguish different data distributions.

03. The method enables classification based on geometric properties.

Abstract

Homology-based invariants can be used to characterize the geometry of datasets and thereby gain some understanding of the processes generating those datasets. In this work we investigate how the geometry of a dataset changes when it is subsampled in various ways. In our framework the dataset serves as a reference object; we then consider different points in the ambient space and endow them with a geometry defined in relation to the reference object, for instance by subsampling the dataset proportionally to the distance between its elements and the point under consideration. We illustrate how this process can be used to extract rich geometrical information, making it possible, for example, to classify points coming from different data distributions.
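
The subsampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `subsample_by_distance` and `h0_death_times` are hypothetical, and sampling with probability directly proportional to Euclidean distance is only one literal reading of "proportionally to the distance". As a stand-in for the paper's homology-based invariants, the sketch computes the degree-0 persistence death times of a Vietoris-Rips filtration, which coincide with the edge lengths of a minimum spanning tree.

```python
import numpy as np

def subsample_by_distance(data, point, size, rng):
    """Draw `size` rows of `data` without replacement, with probability
    proportional to each row's Euclidean distance to `point`.
    (Assumed weighting; other schemes are equally plausible.)"""
    dist = np.linalg.norm(data - point, axis=1)
    prob = dist / dist.sum()
    idx = rng.choice(len(data), size=size, replace=False, p=prob)
    return data[idx]

def h0_death_times(points):
    """Death times of H0 classes in the Vietoris-Rips filtration,
    computed as minimum spanning tree edge lengths (Kruskal)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dmat = np.linalg.norm(diff, axis=2)
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(dmat[iu, ju])  # edges sorted by length
    parent = list(range(n))           # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    for e in order:
        a, b = find(int(iu[e])), find(int(ju[e]))
        if a != b:                    # edge merges two components
            parent[a] = b
            deaths.append(dmat[iu[e], ju[e]])
            if len(deaths) == n - 1:
                break
    return np.array(deaths)

# Usage: a synthetic reference dataset and one point in the ambient space.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
point = np.array([3.0, 0.0])
sample = subsample_by_distance(data, point, 50, rng)
deaths = h0_death_times(sample)
```

Repeating the subsampling for different points gives each point its own vector of death times. Such vectors could then be summarized and passed to a classifier, in the spirit of the classification application the abstract mentions.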

Taxonomy

Topics: Topological and Geometric Data Analysis · Data Visualization and Analytics · Image Retrieval and Classification Techniques