On the optimality of universal classifiers for finite-length individual test sequences
Jacob Ziv

TL;DR
This paper analyzes the limits of universal classifiers for finite-length sequences from stationary sources, showing that variable-length classifiers outperform fixed-length ones beyond a certain sequence length.
Contribution
It introduces and compares fixed-length and variable-length universal classifiers, demonstrating the superiority of variable-length classifiers for finite sequences.
Findings
Universal classifiers fail with high probability when the sequence length is below a certain threshold.
For large sequence lengths, the classification error tends to zero for both classifiers.
Variable-length classifiers have a uniformly smaller classification error than fixed-length classifiers for every finite sequence length.
Abstract
We consider pairs of finite-length individual sequences that are realizations of unknown, finite-alphabet, stationary sources in a class M of sources with vanishing memory (e.g., stationary Markov sources). The task of a universal classifier is to decide whether the two sequences emerge from the same source or from two distinct sources in M, and it has to carry out this task without any prior knowledge of the two underlying probability measures. Given a fidelity function and a fidelity criterion, the probability of classification error for a given universal classifier is defined. Two universal classifiers are defined for pairs of N-sequences: a "classical" fixed-length (FL) universal classifier and an alternative variable-length (VL) universal classifier. Following Wyner and Ziv (1996), it is demonstrated that if the length of the individual sequences N is smaller…
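To make the classification task concrete, here is a toy sketch. It is not the FL or VL classifier defined in the paper. It assumes a simple plug-in statistic: compare the empirical distributions of overlapping k-blocks in the two sequences and declare "same source" when their L1 distance is at most a threshold. The function names, the block length `k`, and `threshold` are hypothetical illustrative choices, not quantities taken from the paper.

```python
from collections import Counter


def kgram_distribution(seq: str, k: int) -> dict[str, float]:
    """Empirical distribution of the overlapping k-blocks of seq (hypothetical helper)."""
    if k < 1 or len(seq) < k:
        raise ValueError("need 1 <= k <= len(seq)")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {block: c / total for block, c in counts.items()}


def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance between two empirical distributions over k-blocks."""
    support = set(p) | set(q)
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in support)


def same_source(x: str, y: str, k: int, threshold: float) -> bool:
    """Toy decision rule (an assumption, not the paper's classifier):
    return True ("same source") when the k-block statistics of x and y
    are within `threshold` of each other in L1 distance."""
    return l1_distance(kgram_distribution(x, k), kgram_distribution(y, k)) <= threshold


if __name__ == "__main__":
    x = "01" * 50   # alternating sequence of length 100
    y = "0" * 100   # constant sequence of length 100
    print(same_source(x, x, k=2, threshold=0.5))  # True
    print(same_source(x, y, k=2, threshold=0.5))  # False: disjoint 2-block supports
```

The sketch also hints at why sequence length matters: with too few symbols, the empirical k-block frequencies are too noisy for any fixed threshold to separate the two hypotheses reliably.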
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
