TL;DR
This paper evaluates variable order Markov models for predicting discrete sequences, comparing six algorithms on real-world data from three domains: proteins, English text, and music.
Contribution
It provides a comparative analysis of six prominent prediction algorithms, including CTW, PPM, and PSTs, with prediction quality measured by average log-loss. It highlights the strong performance of a "decomposed" CTW variant and PPM in sequence prediction, and of a modified Lempel-Ziv algorithm in protein classification.
Findings
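A "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform the other algorithms in sequence prediction, as measured by average log-loss.
A modified Lempel-Ziv algorithm significantly outperforms the other algorithms in protein classification.
The best algorithm depends on the task: the algorithms that lead in prediction are not the one that leads in protein classification.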
Abstract
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other…
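The abstract measures prediction quality by average log-loss. As a rough illustration of that metric only, the sketch below scores a fixed-order Markov predictor with add-one smoothing. This predictor is a simplified stand-in chosen for brevity, not one of the six algorithms the paper compares, and the function names (`train_counts`, `predict`, `average_log_loss`) are hypothetical.

```python
import math
from collections import defaultdict


def train_counts(sequence, order):
    """Count how often each symbol follows each context of up to `order` symbols."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence)):
        context = sequence[max(0, i - order):i]
        counts[context][sequence[i]] += 1
    return counts


def predict(counts, context, symbol, alphabet):
    """Add-one smoothed estimate of P(symbol | context) (an assumed smoothing choice)."""
    seen = counts.get(context, {})
    total = sum(seen.values())
    return (seen.get(symbol, 0) + 1) / (total + len(alphabet))


def average_log_loss(counts, test, order, alphabet):
    """Mean of -log2 P(x_i | preceding context) over `test`, in bits per symbol."""
    loss = 0.0
    for i in range(len(test)):
        context = test[max(0, i - order):i]
        loss -= math.log2(predict(counts, context, test[i], alphabet))
    return loss / len(test)


if __name__ == "__main__":
    alphabet = "ab"
    counts = train_counts("abababab", order=1)
    # A test sequence that follows the training pattern should score lower
    # (better) than one that breaks it.
    print(average_log_loss(counts, "abab", 1, alphabet))
    print(average_log_loss(counts, "aabb", 1, alphabet))
```

Lower values mean the predictor assigned higher probability to the symbols that actually occurred. A predictor that knows nothing and spreads probability uniformly over an alphabet of size k scores log2(k) bits per symbol, which gives a natural baseline.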
