Magnitude Distance: A Geometric Measure of Dataset Similarity

Sahel Torkamani; Henry Gouk; Rik Sarkar

arXiv:2602.08859 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces magnitude distance, a new geometric metric for dataset similarity that adapts to different scales and remains effective in high-dimensional spaces, with applications in training generative models.

Contribution

The paper proposes magnitude distance, a novel dataset distance metric based on the magnitude of metric spaces, with theoretical properties and practical use in generative model training.

Findings

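01. Magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned.

02. It has provable theoretical properties, including its limiting behavior across scales and conditions under which it satisfies key metric properties.

03. It performs comparably to established distances in generative tasks.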

Abstract

Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance metric defined on finite datasets using the notion of the magnitude of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support…
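
To make the role of the scale parameter concrete, here is a minimal sketch of the magnitude of a finite point set, the quantity the abstract builds on. It uses the standard definition from the magnitude literature: form the similarity matrix with entries exp(-t d(x_i, x_j)) and sum the entries of its inverse. The abstract does not state how magnitude distance itself is defined, so this sketch does not attempt it. The function name `magnitude`, the choice of Euclidean distance, and the example points are assumptions made for illustration.

```python
import numpy as np


def magnitude(points, t):
    """Magnitude of a finite point set at scale t (illustrative helper).

    Assumes Euclidean distance between points; this is not the paper's
    magnitude distance, only the underlying magnitude of one dataset.
    """
    X = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    # Similarity matrix Z_ij = exp(-t * d(x_i, x_j)).
    Z = np.exp(-t * D)
    # Weight vector w solves Z w = 1; the magnitude is the sum of the weights.
    w = np.linalg.solve(Z, np.ones(len(X)))
    return float(w.sum())


# Four corners of the unit square (hypothetical example data).
square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
small_scale = magnitude(square, 1e-3)  # close to 1: points blur into one
large_scale = magnitude(square, 50.0)  # close to 4: every point is resolved
```

At small $t$ the four points are effectively indistinguishable and the magnitude approaches 1, while at large $t$ it approaches the number of points. This is one way to read the abstract's description of small $t$ capturing global structure and large $t$ capturing finer details.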

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques