Kd-tree Based Wasserstein Distance Approximation for High-Dimensional Data
Kanata Teshigawara, Keisho Oh, Ken Kobayashi, Kazuhide Nakata

TL;DR
This paper introduces kd-Flowtree, a kd-tree-based method for approximating the Wasserstein distance on high-dimensional data. In large-scale retrieval tasks it achieves higher accuracy and faster computation than existing methods.
Contribution
The paper presents kd-Flowtree, a novel kd-tree-based approximation method that maintains high accuracy while reducing preprocessing time for Wasserstein distance computations on high-dimensional data.
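For reference, the quantity being approximated can be written as follows. This sketch assumes the 1-Wasserstein distance between discrete distributions with a Euclidean ground metric, as in the original Flowtree. The summary does not fix the order p or the metric, so treat both as assumptions. The notation (a_i, b_j, F, and the tree-optimal flow F^T on a tree T) is introduced here for illustration. The last line is the Flowtree-style estimate: compute a flow that is optimal when distances are measured along the tree, then price it with the ground metric.

```latex
\mu = \sum_{i=1}^{n} a_i \,\delta_{x_i}, \qquad \nu = \sum_{j=1}^{m} b_j \,\delta_{y_j}

W_1(\mu, \nu) = \min_{F \in \mathbb{R}_{\ge 0}^{n \times m}} \sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} \,\lVert x_i - y_j \rVert_2
\quad \text{s.t.} \quad \sum_{j=1}^{m} F_{ij} = a_i, \qquad \sum_{i=1}^{n} F_{ij} = b_j

\widehat{W}_T(\mu, \nu) = \sum_{i=1}^{n} \sum_{j=1}^{m} F^{T}_{ij} \,\lVert x_i - y_j \rVert_2
```

Because F^T satisfies the same marginal constraints, it is a feasible transport plan. The estimate is therefore never smaller than the exact W_1. Solving the linear program exactly is the expensive step that tree-based methods avoid.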
Findings
kd-Flowtree outperforms existing methods in retrieval accuracy.
It provides a dataset-size-independent probabilistic accuracy bound.
kd-Flowtree reduces preprocessing and computation time compared with existing methods.
Abstract
The Wasserstein distance is a discrepancy measure between probability distributions, defined by an optimal transport problem. It has been used for various tasks, such as retrieving similar items from high-dimensional image or text data. In retrieval applications, however, the Wasserstein distance must be calculated repeatedly, and its cubic time complexity with respect to input size makes it unsuitable for large-scale datasets. Recently, tree-based approximation methods have been proposed to address this bottleneck. For example, the Flowtree algorithm computes transport on a quadtree and evaluates the cost using the ground metric, and clustering-tree approaches have been reported to achieve high accuracy. However, these existing trees often require significant construction time during preprocessing, and crucially, standard quadtrees cannot grow deep enough in high-dimensional spaces, resulting in…
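The abstract describes Flowtree as computing transport on a tree and then evaluating the cost with the ground metric. Below is a minimal Python sketch of that idea on a kd-tree. It rests on several assumptions. Distributions are discrete lists of (point, mass) pairs. The kd-tree splits at the median of the coordinate with the widest spread. The ground metric is Euclidean. The names `build_kdtree` and `kd_flowtree_distance` and the `leaf_size` parameter are hypothetical, and this is not the authors' implementation or their splitting rule. At every node, leftover source and sink mass arriving from the children is greedily paired. Each pairing is priced with the true Euclidean distance, and any unmatched mass is passed up to the parent.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Measure = List[Tuple[Point, float]]  # (support point, probability mass)


def build_kdtree(points: Sequence[Point], idx: List[int], leaf_size: int = 1):
    """Hypothetical kd-tree: split at the median of the widest coordinate."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    dim = len(points[idx[0]])
    spreads = [
        max(points[i][d] for i in idx) - min(points[i][d] for i in idx)
        for d in range(dim)
    ]
    d = max(range(dim), key=lambda k: spreads[k])
    if spreads[d] == 0.0:  # all points coincide, so stop splitting
        return {"leaf": idx}
    order = sorted(idx, key=lambda i: points[i][d])
    mid = len(order) // 2  # both halves are non-empty because len(order) >= 2
    return {"children": [build_kdtree(points, order[:mid], leaf_size),
                         build_kdtree(points, order[mid:], leaf_size)]}


def kd_flowtree_distance(mu: Measure, nu: Measure, leaf_size: int = 1) -> float:
    """Sketch of a Flowtree-style estimate on a kd-tree (assumed design)."""
    if not math.isclose(sum(w for _, w in mu), sum(w for _, w in nu)):
        raise ValueError("mu and nu must carry the same total mass")
    points = [p for p, _ in mu] + [q for q, _ in nu]
    n = len(mu)  # indices < n are sources (mu), indices >= n are sinks (nu)
    mass = [w for _, w in mu] + [w for _, w in nu]
    tree = build_kdtree(points, list(range(len(points))), leaf_size)
    cost = 0.0
    eps = 1e-12

    def match(src, tgt):
        """Greedily pair leftover mass inside one node; add ground-metric cost."""
        nonlocal cost
        while src and tgt:
            f = min(src[-1][1], tgt[-1][1])
            cost += f * math.dist(points[src[-1][0]], points[tgt[-1][0]])
            src[-1][1] -= f
            tgt[-1][1] -= f
            if src[-1][1] <= eps:
                src.pop()
            if tgt[-1][1] <= eps:
                tgt.pop()
        return src, tgt  # at most one of these is non-empty

    def flow_up(node):
        """Return the unmatched (source, sink) mass leaving this subtree."""
        if "leaf" in node:
            src = [[i, mass[i]] for i in node["leaf"] if i < n]
            tgt = [[i, mass[i]] for i in node["leaf"] if i >= n]
        else:
            src, tgt = [], []
            for child in node["children"]:
                s, t = flow_up(child)
                src += s
                tgt += t
        return match(src, tgt)

    flow_up(tree)  # with equal total mass, everything is matched by the root
    return cost


mu = [((0.0,), 0.5), ((10.0,), 0.5)]
nu = [((1.0,), 0.5), ((11.0,), 0.5)]
print(kd_flowtree_distance(mu, nu))
```

Mass is matched in the smallest subtree that contains both endpoints, so the resulting flow is optimal for the tree metric. Only its pricing uses Euclidean distances, which is the Flowtree recipe the abstract summarizes. In this toy 1-D example, the tree separates {0, 1} from {10, 11}, and the estimate coincides with the exact distance.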
