Matching Observations to Distributions: Efficient Estimation via Sparsified Hungarian Algorithm

Sinho Chewi; Forest Yang; Avishek Ghosh; Abhay Parekh; Kannan Ramchandran

arXiv:1806.06766·cs.DS·October 1, 2019

TL;DR

This paper presents a fast, randomized algorithm for matching observations to known distributions using a sparsified Hungarian method, with theoretical guarantees and statistical analysis showing the MLE's effectiveness in high-dimensional settings.

Contribution

It introduces a novel sparsified Hungarian algorithm that reduces runtime for maximum likelihood estimation in distribution matching problems, with proven statistical bounds.

Findings

01

The new algorithm achieves $\tilde{O}(n^2)$ runtime, improving over the $O(n^3)$ of the standard Hungarian algorithm.

02

A separation of $\log k$ between Gaussian means suffices for perfect matching.

03

The expected mismatch rate decays as $O((\log k)^2/\text{distance}^2)$.

Abstract

Suppose we are given observations, where each observation is drawn independently from one of $k$ known distributions. The goal is to match each observation to the distribution from which it was drawn. We observe that the maximum likelihood estimator (MLE) for this problem can be computed using weighted bipartite matching, even when $n$, the number of observations per distribution, exceeds one. This is achieved by instantiating $n$ duplicates of each distribution node. However, in the regime where the number of observations per distribution is much larger than the number of distributions, the Hungarian matching algorithm for computing the weighted bipartite matching requires $O(n^{3})$ time. We introduce a novel randomized matching algorithm that reduces the runtime to $\tilde{O}(n^{2})$ by sparsifying the original graph, returning the exact MLE with high probability. Next,…
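To make the reduction in the abstract concrete, here is a minimal sketch of the dense baseline: each distribution node is duplicated $n$ times, edge costs are negative log-likelihoods, and a minimum-cost perfect matching gives the MLE. This is not the paper's sparsified algorithm. It is the plain cubic-time Hungarian method, and several details are assumptions made only for illustration: the distributions are taken to be one-dimensional unit-variance Gaussians with known means, and the names `hungarian` and `match_observations` are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost perfect matching on a square cost matrix.

    Standard O(N^3) potentials-based Hungarian method (dense baseline).
    Returns a list where entry i is the column assigned to row i.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip matched edges along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def match_observations(xs, means, n_per_dist):
    """MLE labels for observations xs, with n_per_dist draws per distribution.

    Assumption for illustration: distribution d is N(means[d], 1), so the
    negative log-likelihood is 0.5 * (x - mean)^2 up to an additive constant.
    Column j of the cost matrix is duplicate number (j % n_per_dist) of
    distribution (j // n_per_dist).
    """
    total = len(means) * n_per_dist
    assert len(xs) == total, "need exactly n_per_dist observations per distribution"
    cost = [[0.5 * (x - means[j // n_per_dist]) ** 2 for j in range(total)]
            for x in xs]
    return [col // n_per_dist for col in hungarian(cost)]


labels = match_observations([0.3, 4.6, 5.4, 2.6], [0.0, 5.0], 2)
print(labels)  # [0, 1, 1, 0]
```

In this toy input the observation 2.6 lies slightly closer to the mean 5.0 than to 0.0, so labeling each point by its nearest mean would place three observations on the second distribution. The matching constraint forces exactly two observations per distribution, which is why 2.6 is assigned to the first one.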
