Large Deviations for Sequential Tests of Statistical Sequence Matching
Lin Zhou, Qianyun Wang, Yun Wei, Jingjing Wang

TL;DR
This paper develops theoretical guarantees for sequential tests in statistical sequence matching, demonstrates their advantages over fixed-length tests, and extends the results to the case where the number of matches is unknown, where multiple error exponents must be traded off.
Contribution
It derives the exact exponential decay rate of the mismatch probability for optimal sequential tests, generalizes the analysis to an unknown number of matches, and shows the benefit of sequential tests over fixed-length ones.
Findings
Sequential tests have larger mismatch exponents than fixed-length tests.
The proposed sequential test achieves a bounded expected stopping time under certain conditions.
The tradeoff among error decay rates is characterized for the case where the number of matches is unknown.
Abstract
We revisit the problem of statistical sequence matching initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for sequential tests that have bounded expected stopping times. Specifically, in this problem, one is given two databases of sequences and the task is to identify all matched pairs of sequences. In each database, each sequence is generated i.i.d. from a distinct distribution, and a pair of sequences is said to be matched if they are generated from the same distribution. The generating distribution of each sequence is unknown. We first consider the case where the number of matches is known and derive the exact exponential decay rate of the mismatch (error) probability, a.k.a. the mismatch exponent, under each hypothesis for optimal sequential tests. Our results reveal the benefit of sequentiality by showing that optimal sequential tests have larger…
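To make the setup concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' test. Sequences are over a finite alphabet, the number of matches K is known, and each hypothesis is a set of K disjoint (i, j) pairs. Pairs are scored with the generalized Jensen-Shannon (GJS) divergence for equal sample sizes, D(P||M) + D(Q||M) with M = (P + Q)/2, which (times n) is the log generalized likelihood ratio of "different distributions" versus "same distribution" for two length-n sequences. The stopping rule (stop once n times the gap between the best and second-best hypothesis scores reaches a threshold, or at a cap n_max) and all names (`gjs`, `hypotheses`, `score`, `sequential_match`) are hypothetical illustrations.

```python
import itertools
import math
import random


def kl(p, q):
    # KL divergence D(p || q) in nats; terms with p_i = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gjs(p, q):
    # GJS divergence for equal sample sizes: D(p || m) + D(q || m), m = (p + q) / 2.
    # n * gjs(p, q) is the log generalized likelihood ratio of "p and q differ"
    # versus "p and q share one distribution" for two length-n sequences.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)


def hypotheses(M, N, K):
    # Every set of K disjoint pairs (i, j): i indexes database 1, j database 2
    for rows in itertools.combinations(range(M), K):
        for cols in itertools.permutations(range(N), K):
            yield tuple(zip(rows, cols))


def score(h, P1, P2):
    # Sum of pairwise GJS over the pairs that hypothesis h declares matched
    return sum(gjs(P1[i], P2[j]) for i, j in h)


def sequential_match(dists1, dists2, K, threshold, n_max=10_000, seed=0):
    """Hypothetical sequential test (needs at least two hypotheses): draw one
    symbol per sequence per step; stop once n * (second - best) >= threshold."""
    rng = random.Random(seed)
    A = len(dists1[0])
    counts1 = [[0] * A for _ in dists1]
    counts2 = [[0] * A for _ in dists2]
    hyps = list(hypotheses(len(dists1), len(dists2), K))
    for n in range(1, n_max + 1):
        # Draw one new i.i.d. symbol for every sequence in both databases
        for counts, dists in ((counts1, dists1), (counts2, dists2)):
            for cnt, d in zip(counts, dists):
                cnt[rng.choices(range(A), weights=d)[0]] += 1
        # Empirical distributions (types) of all sequences after n samples
        P1 = [[c / n for c in cnt] for cnt in counts1]
        P2 = [[c / n for c in cnt] for cnt in counts2]
        ranked = sorted((score(h, P1, P2), h) for h in hyps)
        (best, best_h), (second, _) = ranked[0], ranked[1]
        if n * (second - best) >= threshold:
            return best_h, n
    return ranked[0][1], n_max


# Toy example with hypothetical distributions over a 4-letter alphabet
Pa, Pb, Pc, Pd = ([0.85 if k == s else 0.05 for k in range(4)] for s in range(4))
db1 = [Pa, Pb, Pc]  # database 1
db2 = [Pb, Pd, Pa]  # database 2: true matches are (0, 2) and (1, 0)
h, n = sequential_match(db1, db2, K=2, threshold=20.0)
print(h, n)
```

Brute-force enumeration of hypotheses grows combinatorially in M, N and K, so this only suits toy sizes. The threshold trades a longer expected stopping time for a smaller mismatch probability.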
Taxonomy
Topics: Time Series Analysis and Forecasting · Advanced Statistical Process Monitoring · Data Quality and Management
