Disentangled Representation Learning with the Gromov-Monge Gap

Théo Uscidda; Luca Eyring; Karsten Roth; Fabian Theis; Zeynep Akata; Marco Cuturi

arXiv:2407.07829·cs.LG·August 20, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel method for disentangled representation learning using Gromov-Monge maps and quadratic optimal transport to preserve geometric features, improving disentanglement performance on standard benchmarks.

Contribution

The paper proposes the Gromov-Monge Gap (GMG), a regularizer built on quadratic optimal transport. It measures how far a map is from preserving geometric features with minimal distortion while it transports one distribution onto another.
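One way to write this regularizer down, consistent with Reviewer 03's description of it as the difference between a distortion and a Gromov-Wasserstein distance, is sketched below. The notation is an assumption made for illustration and may differ from the paper's: $p$ is the data distribution, $T$ the map being learned, $T_\sharp p$ its push-forward, and $c_X$, $c_Z$ are pairwise costs (for example squared distance or cosine similarity) on the data and latent spaces.

```latex
\begin{align*}
\mathrm{DST}_p(T) &= \iint \bigl(c_X(x,x') - c_Z(T(x),T(x'))\bigr)^2 \,\mathrm{d}p(x)\,\mathrm{d}p(x'), \\
\mathrm{GW}(p,q) &= \min_{\pi \in \Pi(p,q)} \iint\!\!\iint \bigl(c_X(x,x') - c_Z(z,z')\bigr)^2 \,\mathrm{d}\pi(x,z)\,\mathrm{d}\pi(x',z'), \\
\mathrm{GMG}_p(T) &= \mathrm{DST}_p(T) - \mathrm{GW}(p, T_\sharp p) \;\ge\; 0 .
\end{align*}
```

The gap is non-negative because the coupling $(\mathrm{Id}, T)_\sharp p$ is feasible for the Gromov-Wasserstein problem between $p$ and $T_\sharp p$ and has cost exactly $\mathrm{DST}_p(T)$. A gap of zero therefore means $T$ distorts the chosen geometric feature as little as any transport between the two distributions can.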

Findings

01 Outperforms existing geometric-based methods on four benchmarks

02 Effectively preserves geometric features like distances and angles

03 Demonstrates improved disentanglement quality

Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic…
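The tension the abstract describes, between matching a target distribution and preserving geometry, can be made concrete on a toy point cloud. The sketch below is an assumption-laden illustration, not the authors' implementation: it compares the distortion of a given pairing `x[i] -> z[i]` with the smallest distortion over all re-pairings of the same points, found by brute force over permutations. The reviews indicate that the paper relies on entropic Gromov-Wasserstein solvers instead; the permutation search is a stand-in that is only feasible for a handful of points. All function names here are hypothetical.

```python
import itertools

import numpy as np


def sq_dist(a):
    """Pairwise squared Euclidean distances between the rows of a."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def cosine(a):
    """Pairwise cosine similarities between the rows of a (rows assumed non-zero)."""
    unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    return unit @ unit.T


def distortion(cost_x, cost_z):
    """Mean squared mismatch between two pairwise cost matrices."""
    return float(np.mean((cost_x - cost_z) ** 2))


def gap_bruteforce(x, z, cost_fn):
    """Distortion of the pairing x[i] -> z[i] minus the best distortion
    achievable by re-pairing the same points (tiny n only: n! permutations)."""
    cx, cz = cost_fn(x), cost_fn(z)
    n = len(x)
    best = min(
        distortion(cx, cz[np.ix_(list(perm), list(perm))])
        for perm in itertools.permutations(range(n))
    )
    return distortion(cx, cz) - best


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))

# A rotation preserves squared distances, so this pairing is already optimal.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
z_iso = x @ rot.T

# Same target points, paired with the wrong sources: the gap becomes positive.
z_shuffled = z_iso[[1, 2, 3, 4, 0]]

print(gap_bruteforce(x, z_iso, sq_dist))
print(gap_bruteforce(x, z_shuffled, sq_dist))
```

Swapping `sq_dist` for `cosine` changes which geometric feature is compared (angles rather than distances) without touching the rest of the sketch. Because the identity permutation is always among the candidates, the returned gap is never negative.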

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The paper is well-organized and clearly written.
- Introducing the Gromov-Monge Gap as a regularizer provides a fresh approach to minimizing geometric feature distortion, presenting a promising advancement for unsupervised disentangled representation learning.
- The authors offer valuable insights, particularly in showing that preserving angles through cosine similarity can enhance disentanglement more effectively than preserving distances.
- The proof demonstrating the GMG as a weakly convex…

Weaknesses

- The paper provides limited discussion on the scalability of GMG for high-dimensional real-world datasets.
- There is an absence of ablation studies on key hyperparameters, such as entropic regularization.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. This paper provides a fresh perspective on combining geometric constraints with prior matching in disentanglement learning. Different from previous works that use direct penalties for geometric preservation, the authors propose using Gromov-Monge maps to find mappings that optimally preserve geometric features while aligning distributions. The theoretical analysis proves weak convexity of the GMG regularizer and provides precise characterization of convexity constants.
2. The authors have co…

Weaknesses

1. It remains unclear to me why the Gromov-Monge map framework is the optimal choice for combining geometric preservation with distribution alignment. While the authors show empirically that GMG outperforms direct distortion penalties, they don't fully explain why measuring suboptimality in the Gromov-Monge problem provides a better learning signal than other potential optimal transport formulations or geometric metrics.
2. My other concern is about the practicality of solving Gromov-Wasserst…

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. The intuition behind the idea is good: it minimizes the difference between the previously used distortion and the Gromov-Wasserstein distance, so that the mapping between source and target preserves the full features with minimal distortion.
2. They show that the GMG and its counterpart are weakly convex functions.
3. The experiments on the 3D Shapes dataset and the clustering illustration are persuasive.

Weaknesses

1. Computing the Gromov-Wasserstein distance is very expensive: even with an entropic regularizer, the time complexity is still around O(N^5). This makes the GMG impractical in large-scale data settings.
2. Limited experiments: the datasets in the experiments are all about 3D shapes. How about 2D shapes? I would like to see how it applies to MNIST, a simple 2D digit dataset.
3. I believe that robustness should also be an important part, but this wor…

Taxonomy

Topics: Medical Imaging and Analysis · Topological and Geometric Data Analysis · Image and Object Detection Techniques