Lightspeed Geometric Dataset Distance via Sliced Optimal Transport
Khai Nguyen, Hai Nguyen, Tuan Pham, Nhat Ho

TL;DR
This paper introduces s-OTDD, a fast, model-agnostic and embedding-agnostic dataset distance based on sliced optimal transport. It compares datasets without any training, even when their numbers of classes differ or their label sets are disjoint.
Contribution
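The paper proposes s-OTDD, built on Moment Transform Projection (MTP), which maps each label, represented as a distribution over features, to a real number. This enables efficient dataset comparison without training and without requiring the datasets to share label sets.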
Findings
s-OTDD correlates with the optimal transport dataset distance (OTDD)
Its computational complexity is (near-)linear in the number of data points and feature dimensions, and independent of the number of classes
It effectively predicts transfer learning and classification performance gaps
Abstract
We introduce sliced optimal transport dataset distance (s-OTDD), a model-agnostic, embedding-agnostic approach for dataset comparison that requires no training, is robust to variations in the number of classes, and can handle disjoint label sets. The core innovation is Moment Transform Projection (MTP), which maps a label, represented as a distribution over features, to a real number. Using MTP, we derive a data point projection that transforms datasets into one-dimensional distributions. The s-OTDD is defined as the expected Wasserstein distance between the projected distributions, with respect to random projection parameters. Leveraging the closed form solution of one-dimensional optimal transport, s-OTDD achieves (near-)linear computational complexity in the number of data points and feature dimensions and is independent of the number of classes. With its geometrically meaningful…
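The role of the closed-form one-dimensional solution can be illustrated with a minimal sketch of a plain sliced Wasserstein distance. This is not the s-OTDD itself: it projects features only with random unit directions, ignores labels entirely, and does not implement Moment Transform Projection. The function names `wasserstein_1d` and `sliced_wasserstein`, the equal-sample-size assumption, and the choice to average the p-th powers before taking the root are all assumptions made for illustration.

```python
import numpy as np


def wasserstein_1d(u, v, p=2):
    """p-Wasserstein distance between two 1D empirical samples of equal size.

    In one dimension the optimal coupling matches sorted samples, so the
    cost is dominated by the O(n log n) sort.
    """
    u_sorted = np.sort(np.asarray(u, dtype=float))
    v_sorted = np.sort(np.asarray(v, dtype=float))
    if u_sorted.shape != v_sorted.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.mean(np.abs(u_sorted - v_sorted) ** p) ** (1.0 / p)


def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of a sliced Wasserstein distance on features only.

    X, Y: arrays of shape (n, d). Labels are NOT used here (assumption of
    this sketch; the paper's projection also accounts for labels).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Draw random directions and normalize them onto the unit sphere.
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Average the p-th power of the 1D distances over all projections.
    total = 0.0
    for theta in thetas:
        total += wasserstein_1d(X @ theta, Y @ theta, p) ** p
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    Y = rng.normal(loc=1.0, size=(200, 5))
    print(sliced_wasserstein(X, Y))
```

Each projection costs one matrix-vector product plus a sort, which is where the near-linear scaling in the number of points and feature dimensions comes from in this simplified setting.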
Taxonomy
Topics: Image and Object Detection Techniques · Neural Networks and Applications · Video Surveillance and Tracking Methods
