Matching Observations to Distributions: Efficient Estimation via Sparsified Hungarian Algorithm
Sinho Chewi, Forest Yang, Avishek Ghosh, Abhay Parekh, Kannan Ramchandran

TL;DR
This paper presents a fast, randomized algorithm for matching observations to known distributions using a sparsified Hungarian method, with theoretical guarantees and statistical analysis showing the MLE's effectiveness in high-dimensional settings.
Contribution
It introduces a novel sparsified Hungarian algorithm that reduces runtime for maximum likelihood estimation in distribution matching problems, with proven statistical bounds.
Findings
The new algorithm achieves $\tilde{O}(n^2)$ runtime, improving over the traditional $O(n^3)$.
Separation of $\log k$ between Gaussian means suffices for perfect matching.
Expected mismatch rate decreases at rate $O((\log k)^2/\text{distance}^2)$.
Abstract
Suppose we are given observations, where each observation is drawn independently from one of $k$ known distributions. The goal is to match each observation to the distribution from which it was drawn. We observe that the maximum likelihood estimator (MLE) for this problem can be computed using weighted bipartite matching, even when $n$, the number of observations per distribution, exceeds one. This is achieved by instantiating $n$ duplicates of each distribution node. However, in the regime where the number of observations per distribution is much larger than the number of distributions, the Hungarian matching algorithm for computing the weighted bipartite matching requires $O(n^3)$ time. We introduce a novel randomized matching algorithm that reduces the runtime to $\tilde{O}(n^2)$ by sparsifying the original graph, returning the exact MLE with high probability. Next,…
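The reduction described in the abstract (duplicate each distribution node once per observation it should receive, then solve a min-cost bipartite matching on negative log-likelihoods) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's sparsified algorithm: it uses the plain cubic-time Hungarian method on the full graph, assumes one-dimensional unit-variance Gaussians, and the names `hungarian` and `mle_labels` are hypothetical.

```python
def hungarian(cost):
    """Min-cost assignment of rows to columns (rows <= columns).

    Standard potentials-based Hungarian method, cubic time.
    Returns a list where entry i is the column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (m + 1)          # column potentials (1-indexed)
    p = [0] * (m + 1)            # p[j] = row currently matched to column j
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def mle_labels(observations, means, n):
    """Label each observation with a distribution index via matching.

    Assumption: distribution j is a 1-D Gaussian with mean means[j] and
    unit variance, and exactly n observations come from each distribution.
    """
    k = len(means)
    if len(observations) != k * n:
        raise ValueError("expected exactly n observations per distribution")
    # Column c is the (c % n)-th duplicate of distribution c // n.
    # Cost is the negative log-likelihood up to a constant shared by all entries.
    cost = [[0.5 * (x - means[c // n]) ** 2 for c in range(k * n)]
            for x in observations]
    return [c // n for c in hungarian(cost)]


if __name__ == "__main__":
    print(mle_labels([0.1, 0.2, 4.0, 10.0], means=[0.0, 10.0], n=2))
```

In the usage line, the observation 4.0 is closer to the mean 0.0 than to 10.0, yet the matching constraint (two observations per distribution) forces it onto the second distribution. This is the sense in which the MLE differs from labelling each observation by its nearest mean.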
