On the Wasserstein Geodesic Principal Component Analysis of probability measures

Nina Vesseron; Elsa Cazelles; Alice Le Brigant; Thierry Klein

arXiv:2506.04480·stat.ML·June 6, 2025

Open Access · 3 Reviews

TL;DR

This paper develops a novel Wasserstein geometry-based PCA method for probability measures, using neural networks for geodesic parameterization, and compares it with classical tangent PCA on real datasets.

Contribution

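Introduces a Wasserstein geodesic PCA framework that first treats Gaussian distributions by lifting computations to the space of invertible linear maps, then extends to general absolutely continuous probability measures through a neural network parameterization of geodesics.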

Findings

01. Effective geodesic PCA on probability measures demonstrated

02. Neural network approach successfully parameterizes Wasserstein geodesics

03. Comparison shows advantages over classical tangent PCA

Abstract

This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations in the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.
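For the Gaussian case mentioned above, the Wasserstein distance and the geodesic between two Gaussians have well-known closed forms (the Bures-Wasserstein geometry). The sketch below illustrates those closed forms only; it is not the paper's GPCA algorithm, which works in the space of invertible linear maps. The function names (`sqrtm_psd`, `bures_wasserstein_sq`, `gaussian_geodesic`) are hypothetical, and the starting covariance is assumed to be invertible.

```python
import numpy as np

def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def bures_wasserstein_sq(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    r1 = sqrtm_psd(s1)
    cross = sqrtm_psd(r1 @ s2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))

def gaussian_geodesic(m1, s1, m2, s2, t):
    """Mean and covariance at time t in [0, 1] on the Wasserstein geodesic.

    Assumes s1 is invertible, so the optimal map x -> m2 + T (x - m1) exists
    with T symmetric positive semi-definite.
    """
    r1 = sqrtm_psd(s1)
    r1_inv = np.linalg.inv(r1)
    T = r1_inv @ sqrtm_psd(r1 @ s2 @ r1) @ r1_inv
    a = (1.0 - t) * np.eye(len(m1)) + t * T  # interpolated linear map
    return (1.0 - t) * m1 + t * m2, a @ s1 @ a.T

# Illustrative inputs (made up for this example).
m1, s1 = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]])
m2, s2 = np.array([1.0, -1.0]), np.array([[1.0, -0.3], [-0.3, 3.0]])
total = bures_wasserstein_sq(m1, s1, m2, s2)
m_half, s_half = gaussian_geodesic(m1, s1, m2, s2, 0.5)
```

Because the curve is a constant-speed geodesic, the squared distance from the start to the midpoint should be a quarter of `total`, which is a convenient sanity check for any implementation.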

Peer Reviews

Decision: ICLR 2026 Oral

Reviewer 01 · Rating 10 · Confidence 3

Strengths

The work shows a method to compute principal modes of variation in datasets of probability measures, specifically using the Wasserstein geometry. For Gaussian measures, the method leverages the Bures-Wasserstein geometry and lifts computations to the space of invertible matrices, providing exact geodesics as principal components. This is a significant contribution over earlier methods which have used linearized Wasserstein distances (Wang et al. (2013) and Boissard et al. (2015)), have approxim

Weaknesses

The block alternating algorithm for Gaussian GPCA is not guaranteed to always converge to a unique minimum due to non-uniqueness in the problem geometry (the authors acknowledge this). In the general case, one needs to verify the eigenvalues of the Hessian at each step during the Otto geodesic update. This may be computationally expensive. While the neural network implementation facilitates computational tractability, the construction of geodesics needs further tuning and learning from large

Reviewer 02 · Rating 6 · Confidence 4

Strengths

I liked the following:

* **Interesting problem.** Generalizing PCA to spaces of probability measures seems to be a generically useful tool, since comparing distributions is a central task throughout machine learning which recurs in many situations.
* **Technically sound - especially for Gaussian distributions.** The approach involves the Bures-Wasserstein geometry and relationships between certain matrix groups, and makes it easier to see
* **Easily provides use cases beyond what the authors hav

Weaknesses

I am worried about the following:

* **Use of regularization in neural network objectives.** In particular, using regularization to enforce geometric constraints is much weaker than incorporating them as a hard constraint via a clever parametrization. In practice, I suspect the different directions do not end up orthogonal, and it would be helpful to quantify how much this is a problem in practice, and how sensitive it is to hyperparameter tuning. I did not see an experiment directly addressing t

Reviewer 03 · Rating 8 · Confidence 4

Strengths

**Exposition:** I found the paper to be very well written. It has a common thread running through it that makes it easy to follow the story. Thus, I could read it in one go and understood all the core ideas. Furthermore, I think all the necessary information is included in the paper needed to reproduce the method and the experiments. The division of information between main text and appendix is also sensible.

**Novelty:** I think the introduced method is novel and advances the state-of-the-art

Weaknesses

**Scalability:** I am a bit worried about the scalability of the method. All examples are conducted on a small scale with at most two components. Thus, the paper leaves open what would happen for larger datasets and what kind of resources the method requires in such a scenario. It would be great if the authors could discuss this in the paper and also clarify whether it is, indeed, a problem.

**Practical applications:** This ties into the second weakness I see with the paper: a lack of practica

Taxonomy

Topics: Morphological variations and asymmetry

Methods: Principal Components Analysis