Sublinear Time Algorithms for Earth Mover's Distance
Khanh Do Ba, Huy L Nguyen, Huy N Nguyen, Ronitt Rubinfeld

TL;DR
This paper introduces sublinear time algorithms for estimating Earth Mover's Distance (EMD) between probability distributions using samples, with complexities independent of domain size and applicable to continuous and high-dimensional data.
Contribution
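It presents new sample-efficient algorithms for EMD estimation, including closeness testers and additive-error estimators, together with lower bounds showing that the dependence on parameters such as the domain diameter is essentially optimal. It also gives specialized algorithms for highly clusterable data and for tree metrics.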
Findings
Sample complexities independent of domain size.
Optimal algorithms for EMD over tree metrics.
Efficient testing for highly clusterable data.
Abstract
We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, Δ]^d, with sample complexities independent of domain size - permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing the dependencies on these parameters to be essentially optimal. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual L1…
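To make the sample-access setting concrete, here is a minimal sketch of a plug-in estimate on the real line. It is not the paper's algorithm: it simply draws m samples from each distribution and computes the EMD between the two empirical distributions, which in one dimension with equal sample counts equals the mean absolute difference of the sorted samples. The names emd_1d_equal_samples, estimate_emd, sample_p, sample_q, and m are hypothetical and chosen for illustration only.

```python
import random


def emd_1d_equal_samples(xs, ys):
    """EMD between two empirical distributions on the line.

    Assumes both sample lists have the same length, so the optimal
    transport plan matches the i-th smallest x to the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    if not xs:
        raise ValueError("need at least one sample")
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


def estimate_emd(sample_p, sample_q, m, seed=0):
    """Plug-in estimate: draw m samples from each sampler and compare them.

    sample_p and sample_q are hypothetical callables that take a
    random.Random instance and return one real-valued sample.
    """
    rng = random.Random(seed)
    xs = [sample_p(rng) for _ in range(m)]
    ys = [sample_q(rng) for _ in range(m)]
    return emd_1d_equal_samples(xs, ys)


if __name__ == "__main__":
    # Uniform on [0, 1] versus uniform on [0.5, 1.5]; the true EMD is 0.5.
    est = estimate_emd(lambda r: r.random(), lambda r: 0.5 + r.random(), m=2000)
    print(f"estimated EMD: {est:.3f}")
```

Note that the number of samples m is chosen by the caller and does not depend on the size of the domain, which is the kind of dependence the abstract describes; how large m must be for a given error is what the paper's bounds address and is not derived here.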
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Machine Learning and Data Classification
