Topology-Preserving Scaling in Data Augmentation

Vu-Anh Le; Mehmet Dik

arXiv:2411.19512 · math.AT · June 23, 2025

Open Access

TL;DR

This paper introduces a mathematically grounded method for dataset normalization in data augmentation that preserves topological features under non-uniform scaling, ensuring the stability of persistent homology during transformations.

Contribution

It presents a theoretical framework and an algorithmic approach to minimize topological distortion in scaled datasets, extending to higher-dimensional features and alternative metrics.

Findings

01. Bound on bottleneck distance proportional to scaling variability

02. Algorithm for optimal scaling under topological constraints

03. Extension to Wasserstein distance and higher-dimensional features

Abstract

We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to

\[ S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n). \]

Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagrams \( D \) of \( X \) and \( D_S \) of \( S(X) \) satisfies:

\[ d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X), \]

where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is…
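To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It applies the coordinate-wise scaling \( S \) to a small point cloud and evaluates the right-hand side of the stated bound, \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \). The function names `scale_points`, `diameter`, and `bottleneck_bound` and the toy data are assumptions introduced for illustration only. The sketch does not compute persistence diagrams or the bottleneck distance itself.

```python
import itertools
import math


def scale_points(points, factors):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every point (hypothetical helper)."""
    if any(s <= 0 for s in factors):
        raise ValueError("scaling factors must be positive")
    return [tuple(s * x for s, x in zip(factors, p)) for p in points]


def diameter(points):
    """Largest pairwise Euclidean distance in a finite point set."""
    return max(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        default=0.0,
    )


def bottleneck_bound(points, factors):
    """Right-hand side of the stated bound: (s_max - s_min) * diam(X)."""
    return (max(factors) - min(factors)) * diameter(points)


# Toy data (an assumption): corners of the unit square, stretched along axis 1.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
s = (2.0, 1.0)

SX = scale_points(X, s)         # scaled point cloud S(X)
bound = bottleneck_bound(X, s)  # (2 - 1) * diam(X), with diam(X) = sqrt(2)
print(SX, bound)
```

In this sketch the expression depends only on the spread \( s_{\max} - s_{\min} \) of the scaling factors and on the diameter of the original data, so it evaluates to zero whenever all factors are equal.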

Taxonomy

Topics: Image Retrieval and Classification Techniques · Topological and Geometric Data Analysis