Indexing the Earth Mover's Distance Using Normal Distributions
Brian E. Ruttenberg, Ambuj K. Singh

TL;DR
This paper introduces a novel indexing method using normal distribution approximations and Hough space transformations to accelerate Earth Mover's Distance based K-NN queries on uncertain data sets.
Contribution
The paper proposes a new lower bound on the EMD and an index structure that together significantly improve EMD query performance on uncertain databases.
Findings
Reduces K-NN query time on uncertain data
Scales well with large and complex datasets
Effective for heterogeneous uncertain data sets
Abstract
Querying uncertain data sets (represented as probability distributions) presents many challenges due to the large amount of data involved and the difficulty of comparing uncertainty between distributions. The Earth Mover's Distance (EMD) has increasingly been employed to compare uncertain data due to its ability to effectively capture the differences between two distributions. Computing the EMD entails finding a solution to the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure to significantly improve the performance of EMD based K-nearest neighbor (K-NN) queries on uncertain databases. The new lower bound approximates the EMD on a projection vector: each distribution is projected onto a vector and approximated by a normal distribution, together with an accompanying error term.…
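The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' bound (it omits the normal approximation, the error term, and the index); it only shows the general fact that, under a Euclidean ground distance, projecting two distributions onto a unit vector and computing the 1-D EMD there never exceeds the true EMD, because projection cannot increase distances. The function names `project`, `emd_1d`, and `projected_lower_bound` are hypothetical, and both distributions are assumed to be discrete with equal total mass.

```python
import math


def project(points, weights, direction):
    """Project weighted d-dimensional points onto the unit vector along `direction`.

    Returns a list of (position, weight) pairs on the projection line.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    return [
        (sum(p_i * u_i for p_i, u_i in zip(p, unit)), w)
        for p, w in zip(points, weights)
    ]


def emd_1d(a, b):
    """EMD between two 1-D discrete distributions of equal total mass.

    In one dimension the EMD equals the integral of |CDF_a - CDF_b|,
    so a single sweep over the sorted positions suffices.
    """
    # Mass from `a` counts positively, mass from `b` negatively.
    events = sorted(list(a) + [(x, -w) for x, w in b])
    total, cdf_gap, prev_x = 0.0, 0.0, events[0][0]
    for x, signed_w in events:
        total += abs(cdf_gap) * (x - prev_x)
        cdf_gap += signed_w
        prev_x = x
    return total


def projected_lower_bound(p_pts, p_w, q_pts, q_w, direction):
    """Lower bound on the Euclidean-ground-distance EMD via one projection."""
    return emd_1d(project(p_pts, p_w, direction), project(q_pts, q_w, direction))
```

For point masses at (0, 0) and (3, 4), whose true EMD is 5, projecting onto (1, 0) gives a bound of 3, while projecting onto (3, 4) gives approximately 5. The choice of projection vector therefore controls how tight the bound is, which is presumably why the paper pairs the bound with an index over projections.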
Taxonomy
Topics: Data Management and Algorithms · Transportation Planning and Optimization · Advanced Database Systems and Queries
