Parallel Streaming Wasserstein Barycenters
Matthew Staib, Sebastian Claici, Justin Solomon, Stefanie Jegelka

TL;DR
This paper introduces a scalable, parallel algorithm for computing Wasserstein barycenters that efficiently handles streaming, continuous, and nonstationary data, with theoretical bounds on barycenter quality under discretization and demonstrations on tracking moving distributions and large-scale Bayesian inference.
Contribution
It presents the first bounds on the quality of the approximate barycenter under discretization, and offers a method optimized for streaming, continuous, and distributed data.
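Computing a barycenter classically requires discretizing every input distribution as well as the barycenter itself. One standard way to avoid discretizing a continuous input is semi-discrete optimal transport: fix discrete support points y_j with target masses w_j, and run stochastic gradient ascent on the dual objective max_v sum_j w_j v_j + E_{x~mu}[min_j (c(x, y_j) - v_j)], touching the continuous measure mu only through streamed samples. The sketch below shows that generic technique for a single 1-D input with squared cost. It is an illustration under these assumptions, not the paper's parallel barycenter algorithm, and the names `semidiscrete_dual_ascent` and `assign` are hypothetical.

```python
import numpy as np

def semidiscrete_dual_ascent(sample, y, w, n_steps=20_000, step0=0.1, seed=0):
    """Stochastic ascent on the semi-discrete OT dual (1-D, squared cost).

    Hypothetical helper: sample(rng) draws one point from the continuous input
    measure; y holds fixed discrete support points, w their masses (sum to 1).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    v = np.zeros_like(y)  # dual potentials, one per support point
    for t in range(n_steps):
        x = sample(rng)  # one streamed sample; the input is never discretized
        # x is transported to the support point minimizing cost minus potential.
        j = np.argmin((x - y) ** 2 - v)
        # Stochastic gradient of the dual objective: w - one_hot(j).
        grad = w.copy()
        grad[j] -= 1.0
        v += step0 / np.sqrt(t + 1) * grad  # decreasing step size
    return v

def assign(x, y, v):
    """Index of the support point each sample in x is transported to."""
    return np.argmin((x[:, None] - y[None, :]) ** 2 - v[None, :], axis=1)

if __name__ == "__main__":
    y = np.array([0.25, 0.75])
    w = np.array([0.25, 0.75])
    v = semidiscrete_dual_ascent(lambda rng: rng.uniform(), y, w)
    x = np.random.default_rng(1).uniform(size=100_000)
    # Fraction of fresh samples sent to each support point; should approach w.
    print(np.bincount(assign(x, y, v), minlength=len(y)) / x.size)
```

With a uniform input on [0, 1] and these masses, the optimal split point is x = 0.25, so the learned potentials should satisfy v[1] - v[0] close to 0.25. A barycenter method built on this idea must additionally couple the potentials of several inputs and update the barycenter's support; that part is not sketched here.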
Findings
Effectively tracks moving distributions on a sphere
Scales to a large-scale Bayesian inference task
Robust to nonstationary input distributions
Abstract
Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself. Improving on this situation, we present a…
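For intuition about how a barycenter "respects geometry," consider distributions on the real line with the squared-distance (2-Wasserstein) cost. There, a standard result is that the barycenter's quantile function is the weighted average of the input quantile functions, so for empirical samples of equal size it reduces to averaging sorted samples. The sketch below illustrates the definition only; it is not the paper's algorithm, which targets general continuous inputs, and the function name `barycenter_1d` is hypothetical.

```python
import numpy as np

def barycenter_1d(samples, weights=None):
    """W2 barycenter of 1-D empirical measures (hypothetical helper).

    Each input must contain the same number m of equally weighted points;
    np.stack raises ValueError otherwise.
    """
    # Sorting each sample gives its empirical quantile function at m levels.
    sorted_samples = np.stack([np.sort(np.asarray(s, dtype=float)) for s in samples])
    k = sorted_samples.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=float)
    # Barycenter quantile function = weighted average of input quantile functions.
    return weights @ sorted_samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=100_000)
    b = rng.normal(4.0, 1.0, size=100_000)
    bary = barycenter_1d([a, b])
    mix = np.concatenate([a, b])
    # The barycenter stays unimodal (mean near 2, std near 1), whereas the
    # mixture of the two inputs is bimodal with std near sqrt(5).
    print(bary.mean(), bary.std())
    print(mix.mean(), mix.std())
```

The contrast with the mixture is the point: naively pooling the two sources gives a bimodal distribution, while the barycenter is a single bump halfway between them with the inputs' common shape.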
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Anomaly Detection Techniques and Applications · Gaussian Processes and Bayesian Inference
