Fast unsupervised ground metric learning with tree-Wasserstein distance

Kira M. Düsterwald; Samo Hromadka; Makoto Yamada

arXiv:2411.07432 · cs.LG · January 13, 2025

TL;DR

This paper introduces a fast, scalable method for unsupervised ground metric learning using tree-Wasserstein distance, improving computational efficiency while maintaining approximation quality, demonstrated on genomics data for cell clustering.

Contribution

Proposes a novel tree-embedded Wasserstein singular vector method that reduces computational complexity and retains approximation quality for unsupervised ground metric learning.

Findings

01

The algorithm converges to a better approximation of the standard WSV method than existing alternatives.

02

Achieves $O(n^3 + m^3 + mn)$ complexity for $n$ samples and $m$ features, faster than previous approaches.

03

Demonstrates scalability and utility on single-cell RNA sequencing datasets.
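To get a feel for the gap between the two complexity bounds quoted in the findings and in the abstract, the sketch below simply evaluates both expressions for a few problem sizes. It is back-of-the-envelope arithmetic only: constants and iteration counts are ignored, the natural logarithm is assumed, and the sizes are arbitrary, hypothetical examples, so the ratios indicate orders of magnitude rather than measured speed-ups.

```python
import math


def wsv_cost(n: int, m: int) -> float:
    """Leading-order term of the original WSV bound: n^2 m^2 (n log n + m log m)."""
    return n**2 * m**2 * (n * math.log(n) + m * math.log(m))


def tree_wsv_cost(n: int, m: int) -> float:
    """Leading-order term of the tree-based bound: n^3 + m^3 + m n."""
    return n**3 + m**3 + m * n


if __name__ == "__main__":
    # Hypothetical (samples, features) sizes, chosen only for illustration.
    for n, m in [(100, 50), (1000, 100), (10000, 2000)]:
        ratio = wsv_cost(n, m) / tree_wsv_cost(n, m)
        print(f"n={n:>6}, m={m:>5}: WSV / tree bound ratio ~ {ratio:.1e}")
```

For $n = 1000$ samples and $m = 100$ features the ratio of the two bounds is roughly $7 \times 10^4$, and it grows with problem size because the original bound is roughly quintic in $n$ and $m$ while the tree-based bound is cubic.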

Abstract

The performance of unsupervised methods such as clustering depends on the choice of distance metric between features, or ground metric. Commonly, ground metrics are decided with heuristics or learned via supervised algorithms. However, since many interesting datasets are unlabelled, unsupervised ground metric learning approaches have been introduced. One promising option employs Wasserstein singular vectors (WSVs), which emerge when computing optimal transport distances between features and samples simultaneously. WSVs are effective, but can be prohibitively computationally expensive in some applications: $O(n^{2} m^{2} (n \log(n) + m \log(m)))$ for $n$ samples and $m$ features. In this work, we propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD). We demonstrate theoretically and empirically that the…
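The speed-up described in the abstract rests on the fact that the tree-Wasserstein distance has a closed form: for a rooted tree in which $w_v$ is the weight of the edge from node $v$ to its parent, $\mathrm{TW}(\mu, \nu) = \sum_{v \neq \mathrm{root}} w_v \, \lvert \mu(\Gamma(v)) - \nu(\Gamma(v)) \rvert$, where $\Gamma(v)$ is the set of nodes in the subtree rooted at $v$. The Python sketch below evaluates this standard formula in one bottom-up pass, so each distance costs time linear in the number of tree nodes instead of an optimal transport solve. It is a minimal illustration under stated assumptions, not the authors' implementation: the parent-array tree encoding, the function name `tree_wasserstein` and the toy trees are hypothetical, and the sketch neither builds the sample and feature trees nor runs the singular-vector iteration.

```python
from typing import List, Sequence


def tree_wasserstein(parent: Sequence[int], weight: Sequence[float],
                     mu: Sequence[float], nu: Sequence[float]) -> float:
    """Closed-form tree-Wasserstein distance (hypothetical helper).

    parent[v] is the parent of node v (-1 for the root); weight[v] is the
    length of the edge (parent[v], v) and is ignored for the root.
    mu and nu are probability masses placed on the tree nodes.
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)
    if root < 0:
        raise ValueError("tree has no root (no parent entry equal to -1)")

    # Breadth-first order: every node appears before all of its descendants.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    # diff[v] starts as mu(v) - nu(v); walking the BFS order backwards pushes
    # each subtree's net mass to its parent, so diff[v] becomes
    # mu(subtree(v)) - nu(subtree(v)) by the time v is visited.
    diff = [a - b for a, b in zip(mu, nu)]
    total = 0.0
    for v in reversed(order):
        if v == root:
            continue
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


if __name__ == "__main__":
    # Toy star tree: root 0 with leaves 1, 2, 3 at edge lengths 1, 2, 3.
    parent = [-1, 0, 0, 0]
    weight = [0.0, 1.0, 2.0, 3.0]
    # Move a unit mass from leaf 1 to leaf 3.
    print(tree_wasserstein(parent, weight, [0, 1, 0, 0], [0, 0, 0, 1]))
```

On the star tree the printed value is 4.0, the length of the path from leaf 1 through the root to leaf 3 (1 + 3), which is what moving a unit mass between two nodes should cost under the tree metric.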

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The proposed method demonstrates improved efficiency compared with the WSV and SSV methods presented in Huizing et al. (2022).

Weaknesses

- The authors missed important related works [1, 2, 3, 4, 5] that consider how the relationships between the samples are informed by the relationships between the columns, and vice versa, for the Wasserstein distance and specifically in tree-related settings [2, 5]. In particular, the setup of randomly permuted dataset rows and columns in the toy datasets was one of the important tasks in these works.
- The explanation of how the Wasserstein distance can serve as a tree distance in Proposition 2.1 is uncl

Reviewer 02 · Rating 8 · Confidence 3

Strengths

The idea is sound, and there is a significant element of originality, especially in the development of the algorithm in Appendix C.

Weaknesses

The paper seems rushed overall: there are a large number of typos, and a number of results that should have been presented as theorems are merely stated informally.
1. l. 238–241: these statements require a proof, especially the convergence, since it was already somewhat delicate in Huizing et al. (2022).
2. l. 263: “Wasserstein” -> “Tree Wasserstein”, or else this requires a proof.
3. I find Theorem 2.2 hard to interpret. Can the authors rephrase the interpretation in the next paragraph (l. 256–l. 260)

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper offers rigorous theoretical support for the TWD method, with proofs of the existence and uniqueness of solutions within specific tree configurations.
2. The empirical results are presented clearly, with comparative metrics that directly illustrate the computational runtime savings and clustering performance.
3. The paper provides a solid background review of optimal transport theory and the tree-Wasserstein distance.

Weaknesses

1. Although the ClusterTree algorithm plays a significant role in the tree structure initialization, there is limited background on how it operates, what assumptions it makes, or its typical applications. I reviewed both references, Le et al. (2019) and Indyk & Thaper (2003), but did not find any mention of a ClusterTree. Could the authors be referring to the ‘Partition_Tree_Metric’ described in Le et al. (2019)?
2. The algorithm section mentions differences in handling ‘large’ and ‘small’

