On the Wasserstein Geodesic Principal Component Analysis of probability measures
Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein

TL;DR
This paper develops a novel Wasserstein geometry-based PCA method for probability measures, using neural networks to parameterize geodesics, and compares it with classical tangent PCA on several examples, including real-world datasets.
Contribution
Introduces a Wasserstein geodesic PCA framework for general probability measures, with geodesics parameterized by neural networks, extending the analysis developed for the Gaussian case.
Findings
Effective geodesic PCA on probability measures demonstrated
Neural network approach successfully parameterizes Wasserstein geodesics
Comparison shows advantages over classical tangent PCA
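Tangent PCA, the baseline in this comparison, maps each measure to the tangent space at a reference point (typically the Wasserstein barycenter) with the log map and then runs ordinary PCA there. The sketch below is a minimal illustration restricted to distributions on the real line (an assumption made for brevity; the paper treats general measures). In 1D, the W2 distance equals the L2(0, 1) distance between quantile functions, so tangent PCA at the barycenter reduces to PCA on centered quantile functions. The function names and the grid size `n_grid` are hypothetical, not taken from the paper.

```python
import numpy as np

def quantile_functions(samples_list, n_grid=200):
    """Evaluate each empirical quantile function on a shared grid of levels in (0, 1)."""
    levels = (np.arange(n_grid) + 0.5) / n_grid  # midpoint rule, avoids levels 0 and 1
    return levels, np.stack([np.quantile(np.asarray(s), levels) for s in samples_list])

def tangent_pca_1d(samples_list, n_components=2, n_grid=200):
    """Tangent PCA at the W2 barycenter for probability measures on the real line."""
    _, Q = quantile_functions(samples_list, n_grid)
    Q_bar = Q.mean(axis=0)  # quantile function of the 1D Wasserstein barycenter
    V = Q - Q_bar           # log maps at the barycenter in quantile coordinates, one row per measure
    # Rows of V / sqrt(n_grid) have Euclidean norm ~ L2(0, 1) norm, so this SVD is PCA in L2(0, 1).
    _, S, Vt = np.linalg.svd(V / np.sqrt(n_grid), full_matrices=False)
    components = Vt[:n_components] * np.sqrt(n_grid)  # principal directions with unit L2(0, 1) norm
    scores = V @ components.T / n_grid                 # L2(0, 1) inner products with each direction
    explained_var = S[:n_components] ** 2 / V.shape[0]
    return Q_bar, components, explained_var, scores
```

For example, calling `tangent_pca_1d` on translated copies of one sample recovers a single constant principal direction, since pure translations form a one-dimensional family in W2. Note that `Q_bar + s * components[k]` need not be nondecreasing for large `|s|`, so moving far along a tangent direction can leave the set of valid quantile functions; this is one known limitation of the linearized approach.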
Abstract
This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations to the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare our approach with classical tangent PCA through various examples and provide illustrations on real-world datasets.
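For Gaussian measures, the Wasserstein quantities have closed forms. The sketch below (NumPy only; all helper names are hypothetical) computes the Bures-Wasserstein distance, the optimal linear map T between centered Gaussians, and a point on the geodesic between two Gaussians. It illustrates one standard way to lift computations to linear maps: writing Sigma = A A^T, the straight line between a factor A0 of Sigma0 and the aligned factor A1 = T A0 of Sigma1 maps down to the Wasserstein geodesic Sigma_t = A_t A_t^T. This shows the geometric idea only; it is not the paper's GPCA procedure, which searches for the geodesic that best fits the data.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_wasserstein(m0, S0, m1, S1):
    """W2 distance between N(m0, S0) and N(m1, S1)."""
    r0 = psd_sqrt(S0)
    cross = psd_sqrt(r0 @ S1 @ r0)
    d2 = np.sum((m0 - m1) ** 2) + np.trace(S0 + S1 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))  # clip tiny negative values caused by round-off

def optimal_map(S0, S1):
    """Symmetric positive definite T with T S0 T = S1 (optimal map between centered Gaussians)."""
    r0 = psd_sqrt(S0)
    r0_inv = np.linalg.inv(r0)
    return r0_inv @ psd_sqrt(r0 @ S1 @ r0) @ r0_inv

def gaussian_geodesic(m0, S0, m1, S1, t):
    """Point at time t on the W2 geodesic, obtained by interpolating linear maps A with S = A A^T."""
    A0 = np.linalg.cholesky(S0)        # any factor with A0 A0^T = S0 gives the same curve
    A1 = optimal_map(S0, S1) @ A0      # factor of S1 aligned with A0
    At = (1.0 - t) * A0 + t * A1       # straight line in the space of linear maps
    return (1.0 - t) * m0 + t * m1, At @ At.T
```

Because T is symmetric positive definite, it is the gradient of a convex quadratic and hence optimal, so Sigma_t = ((1 - t) I + t T) Sigma0 ((1 - t) I + t T) is the usual displacement interpolation; the distance from the start then grows linearly in t along the curve.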
Peer Reviews
Decision: ICLR 2026 Oral
The work presents a method to compute principal modes of variation in datasets of probability measures, specifically using the Wasserstein geometry. For Gaussian measures, the method leverages the Bures-Wasserstein geometry and lifts computations to the space of invertible matrices, providing exact geodesics as principal components. This is a significant contribution over earlier methods, which have used linearized Wasserstein distances (Wang et al., 2013; Boissard et al., 2015), have approxim
The block alternating algorithm for Gaussian GPCA is not guaranteed to converge to a unique minimum because of non-uniqueness in the problem geometry (the authors acknowledge this). In the general case, the eigenvalues of the Hessian must be checked at each step of the Otto geodesic update, which may be computationally expensive. While the neural network implementation makes the computation tractable, the construction of geodesics needs further tuning and learning from large
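The Hessian condition mentioned in this review can be made concrete in the simplest setting that can be checked exactly: linear perturbations of a measure mu with finite second moment (an assumption; the general case uses nonlinear potentials learned by neural networks). For a symmetric matrix S, the map x -> (I + t S) x is the gradient of the potential |x|^2/2 + t x^T S x / 2, whose Hessian is I + t S. While that Hessian is positive definite, the map is the gradient of a strictly convex function, so the curve t -> (I + t S)_# mu is a Wasserstein geodesic on any closed subinterval of the admissible range. The sketch computes that range from the eigenvalues of S; all function names are hypothetical.

```python
import numpy as np

def admissible_interval(S):
    """Open interval (t_min, t_max) on which the Hessian I + t S stays positive definite.

    The eigenvalues of I + t S are 1 + t * lam for the eigenvalues lam of S:
    positive lam bound t from below, negative lam bound t from above.
    """
    lam = np.linalg.eigvalsh(S)  # ascending order
    t_min = -1.0 / lam[-1] if lam[-1] > 0 else -np.inf
    t_max = -1.0 / lam[0] if lam[0] < 0 else np.inf
    return t_min, t_max

def hessian_is_pd(S, t):
    """Eigenvalue check of the Hessian I + t S of the potential |x|^2/2 + t x^T S x / 2."""
    return np.linalg.eigvalsh(np.eye(S.shape[0]) + t * S)[0] > 0.0

def transport_between(S, t0, t1):
    """Linear map sending (I + t0 S)_# mu to (I + t1 S)_# mu.

    Both factors are functions of S, so they commute and the product is symmetric;
    it is positive definite when I + t0 S and I + t1 S are, hence an optimal map.
    """
    I = np.eye(S.shape[0])
    return (I + t1 * S) @ np.linalg.inv(I + t0 * S)
```

Since I + t S is affine in t, checking positive definiteness at the two endpoints of an interval certifies it on the whole interval; with nonlinear potentials the Hessian varies in space, which is why the check has to be repeated and becomes more expensive.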
I liked the following:
* **Interesting problem.** Generalizing PCA to spaces of probability measures seems to be a broadly useful tool, since comparing distributions is a central task that recurs throughout machine learning.
* **Technically sound, especially for Gaussian distributions.** The approach involves the Bures-Wasserstein geometry and relationships between certain matrix groups, and makes it easier to see
* **Easily provides use cases beyond what the authors hav
I am worried about the following:
* **Use of regularization in neural network objectives.** In particular, using regularization to enforce geometric constraints is much weaker than building them in as hard constraints through a clever parametrization. In practice, I suspect the different directions do not end up orthogonal; it would be helpful to quantify how much of a problem this is and how sensitive it is to hyperparameter tuning. I did not see an experiment directly addressing t
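One way to quantify the orthogonality concern raised in this review (a hypothetical diagnostic, not an experiment from the paper): in the Otto-Wasserstein geometry, tangent vectors at a measure mu are velocity fields with inner product <u, v> = integral of u(x) . v(x) dmu(x), so the angle between two learned directions can be estimated by Monte Carlo from samples of mu. Here `v1` and `v2` stand in for the learned velocity fields; the function name and the toy fields used to exercise it are assumptions for illustration.

```python
import numpy as np

def l2_mu_cosine(v1, v2, samples):
    """Monte Carlo estimate of the cosine between two vector fields in L2(mu).

    v1, v2: callables mapping an (n, d) array of points to an (n, d) array of vectors.
    samples: (n, d) array of points drawn from the reference measure mu.
    """
    a, b = v1(samples), v2(samples)
    inner = np.mean(np.sum(a * b, axis=1))            # estimate of <v1, v2>_{L2(mu)}
    norm_a = np.sqrt(np.mean(np.sum(a * a, axis=1)))  # estimate of ||v1||_{L2(mu)}
    norm_b = np.sqrt(np.mean(np.sum(b * b, axis=1)))  # estimate of ||v2||_{L2(mu)}
    return inner / (norm_a * norm_b)
```

A value near 0 indicates nearly orthogonal directions; reporting it across several regularization weights would show how sensitive orthogonality is to that hyperparameter.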
**Exposition:** I found the paper very well written. A common thread runs through it that makes the story easy to follow, so I could read it in one go and understand all the core ideas. Furthermore, I think the paper includes all the information needed to reproduce the method and the experiments. The division of information between main text and appendix is also sensible.
**Novelty:** I think the introduced method is novel and advances the state-of-the-art
**Scalability:** I am a bit worried about the scalability of the method. All examples are small-scale, with at most two components. The paper therefore leaves open what would happen for larger datasets and what resources the method would require in that scenario. It would be great if the authors could discuss this in the paper and clarify whether it is indeed a problem.
**Practical applications:** This ties into the second weakness I see with the paper: a lack of practica
Taxonomy
Topics: Morphological variations and asymmetry
Methods: Principal Components Analysis
