Asymptotically Optimal Matching of Multiple Sequences to Source Distributions and Training Sequences
Jayakrishnan Unnikrishnan

TL;DR
This paper develops asymptotically optimal tests for matching multiple observed sequences to their source distributions, including the case where the source laws are unknown and only training sequences are available. The tests exploit the constraint that the sequences come from distinct sources, and the probability of an incorrect match decays exponentially.
Contribution
The paper introduces a sequence of tests, built on minimum-weight matching algorithms, whose error probabilities decay exponentially. By enforcing the constraint that the sequences come from distinct sources, the tests perform better than methods that ignore it.
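As a rough illustration of what a minimum-weight matching test can look like when the source laws are known, the sketch below weights each (sequence, source) pair by the KL divergence between the sequence's empirical distribution and the source law, then searches over assignments to distinct sources. The function names, the brute-force search, and the single-threshold "no-match" rule are assumptions made for this example, not the paper's exact test; it also assumes there are no more sequences than sources.

```python
from collections import Counter
from itertools import permutations
from math import inf, log


def empirical(seq, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}


def kl(p, q):
    """KL divergence D(p || q) in nats; infinite if q misses mass that p has."""
    total = 0.0
    for a, pa in p.items():
        if pa == 0:
            continue
        if q[a] == 0:
            return inf
        total += pa * log(pa / q[a])
    return total


def match_sequences(seqs, sources, alphabet, threshold):
    """Assign each sequence a distinct source index by minimum total weight.

    Returns a tuple of source indices (one per sequence), or None as a
    "no-match" decision. The threshold rule is a hypothetical simplification.
    """
    emps = [empirical(s, alphabet) for s in seqs]
    best, best_cost = None, inf
    # Brute force over injective assignments; fine for a handful of sources.
    for perm in permutations(range(len(sources)), len(seqs)):
        cost = sum(kl(emps[i], sources[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    if best is None or best_cost > threshold:
        return None
    return best


sources = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
seqs = ["bbbbbbbbba", "aaaaaaaaab"]
result = match_sequences(seqs, sources, "ab", threshold=0.5)
```

For larger problems the brute-force loop would normally be replaced by a polynomial-time assignment solver; it is kept here only so the example has no dependencies.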
Findings
Tests achieve exponential decay of incorrect matching probabilities.
Incorporating source constraints improves error and rejection exponents.
The methods apply both when the source distributions are known and when they are unknown but one training sequence from each source is available.
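When the source laws are unknown, one plausible way to weight a (test sequence, training sequence) pair is a generalized-likelihood-ratio style statistic that compares each sequence's empirical distribution with their pooled empirical distribution. The sketch below uses that weight inside the same kind of minimum-weight matching; the choice of statistic, the normalization by the test sequence length, and all function names are assumptions for illustration and may differ from the weights used in the paper.

```python
from collections import Counter
from itertools import permutations
from math import inf, log


def glr_weight(x, t):
    """Assumed weight: [n*D(p_x || m) + N*D(p_t || m)] / n,
    where m is the pooled empirical distribution of x and t."""
    cx, ct = Counter(x), Counter(t)
    n, big_n = len(x), len(t)
    w = 0.0
    for a in set(cx) | set(ct):
        pooled = (cx[a] + ct[a]) / (n + big_n)
        if cx[a]:
            w += cx[a] * log((cx[a] / n) / pooled)
        if ct[a]:
            w += ct[a] * log((ct[a] / big_n) / pooled)
    return w / n


def match_to_training(seqs, training):
    """Match each test sequence to a distinct training sequence.

    Returns (assignment, total_weight); assumes len(seqs) <= len(training).
    """
    best, best_cost = None, inf
    for perm in permutations(range(len(training)), len(seqs)):
        cost = sum(glr_weight(seqs[i], training[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


training = ["aaaaaaab", "bbbbbbba"]
seqs = ["babbbbbb", "aaabaaaa"]
assignment, total = match_to_training(seqs, training)
```

A "no-match" option could be added by comparing the total weight, or the gap to the second-best matching, against a threshold; that choice is left open here.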
Abstract
Consider a finite set of sources, each producing i.i.d. observations that follow a unique probability distribution on a finite alphabet. We study the problem of matching a finite set of observed sequences to the set of sources under the constraint that the observed sequences are produced by distinct sources. In general, the number of sequences may be different from the number of sources, and only some of the observed sequences may be produced by a source from the set of sources of interest. We consider two versions of the problem: one in which the probability laws of the sources are known, and another in which the probability laws of the sources are unspecified but one training sequence from each of the sources is available. We show that both these problems can be solved using a sequence of tests that are allowed to produce "no-match" decisions. The tests…
