Optimal Transport with Heterogeneously Missing Data
Linus Bleistein, Aurélien Bellet, Julie Josse

TL;DR
This paper addresses optimal transport problems with heterogeneously missing data, proposing debiasing techniques for Wasserstein distances, efficient estimation methods, and a novel hyperparameter selection strategy, validated through extensive experiments.
Contribution
It introduces debiasing of Wasserstein distances under heterogeneous missingness, efficient ISVT-based estimation of entropic regularized optimal transport, and a validation set-free hyperparameter selection method for ISVT.
Findings
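The Wasserstein distance between empirical Gaussian distributions, and linear Monge maps between arbitrary distributions, can be debiased under heterogeneous MCAR missingness without significantly affecting the sample complexity.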
ISVT provides consistent and efficient estimation of entropic regularized optimal transport.
The proposed hyperparameter selection strategy performs well without validation sets.
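To make the ISVT finding more concrete, here is a minimal sketch of a generic iterative singular value soft-thresholding loop for filling in a matrix with missing entries. It is a soft-impute-style illustration, not necessarily the variant analysed in the paper. The function names, the NaN encoding of missing values, the threshold `lam`, and the stopping rule are all assumptions made for this example.

```python
import numpy as np

def svd_soft_threshold(Z, lam):
    """Shrink every singular value of Z by lam, clipping at zero."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def isvt_impute(X, lam, n_iter=200, tol=1e-6):
    """Low-rank completion of X, where np.nan marks missing entries.

    Hypothetical helper: lam, n_iter and tol are illustrative defaults,
    not values taken from the paper.
    """
    mask = ~np.isnan(X)
    X_obs = np.where(mask, X, 0.0)
    Z = np.zeros_like(X_obs)
    for _ in range(n_iter):
        # Keep observed entries, plug the current estimate into the gaps.
        filled = np.where(mask, X_obs, Z)
        Z_new = svd_soft_threshold(filled, lam)
        # Stop once the relative change between iterates is small.
        if np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z)):
            Z = Z_new
            break
        Z = Z_new
    return Z
```

The threshold `lam` is the hyperparameter that would normally be tuned on held-out entries. The summary above says the paper selects it without a validation set, which this sketch does not attempt to reproduce.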
Abstract
We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions. As a first contribution, we show that the Wasserstein distance between empirical Gaussian distributions and linear Monge maps between arbitrary distributions can be debiased without significantly affecting the sample complexity. Secondly, we show that entropic regularized optimal transport can be estimated efficiently and consistently using iterative singular value thresholding (ISVT). We propose a validation set-free hyperparameter selection strategy for ISVT that leverages our estimator of the Bures-Wasserstein distance, which could be of independent interest in general matrix…
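As an illustration of the first contribution, the sketch below debiases the zero-imputed mean and covariance under MCAR with per-feature observation probabilities, then plugs them into the closed-form Bures-Wasserstein distance between Gaussians. It follows the standard inverse-probability correction and is not claimed to be the paper's exact estimator. The function names, the NaN encoding of missing values, the use of empirical observation rates, and the final projection onto positive semidefinite matrices are assumptions made for this example.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def debiased_moments(X):
    """Debiased mean and covariance from X of shape (n, d), np.nan = missing.

    Assumes MCAR with one observation probability per feature, and that
    every feature is observed at least once.
    """
    mask = ~np.isnan(X)
    p = mask.mean(axis=0)                  # empirical observation rates
    Y = np.where(mask, X, 0.0)             # zero-imputed data
    mu = Y.mean(axis=0) / p                # E[Y_j] = p_j * mu_j
    S = Y.T @ Y / X.shape[0]               # biased second moment
    M = S / np.outer(p, p)                 # off-diagonal: divide by p_j * p_k
    M[np.diag_indices_from(M)] = np.diag(S) / p   # diagonal: divide by p_j
    cov = M - np.outer(mu, mu)
    # The corrected matrix need not be PSD; clip negative eigenvalues
    # (an assumption of this sketch).
    w, V = np.linalg.eigh((cov + cov.T) / 2)
    return mu, (V * np.clip(w, 0.0, None)) @ V.T

def bures_wasserstein_sq(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    r = psd_sqrt(cov1)
    cross = psd_sqrt(r @ cov2 @ r)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1) + np.trace(cov2) - 2.0 * np.trace(cross))
```

Given two samples `X1` and `X2` with missing entries, one would call `debiased_moments` on each and pass the four outputs to `bures_wasserstein_sq`. Because the observation rates are estimated separately per feature and per sample, the two distributions may have different missingness patterns.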
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration · Mathematical functions and polynomials
