A Family of LZ78-based Universal Sequential Probability Assignments

Naomi Sagan; Tsachy Weissman

arXiv:2410.06589 · cs.IT · December 15, 2025

TL;DR

This paper introduces a family of universal sequential probability assignments built on the incremental parsing of LZ78, shows that their normalized log loss converges to the normalized LZ78 codelength uniformly over all individual sequences, and reports experiments on compression, generation, and classification.

Contribution

The paper develops a new family of universal sequential probability assignments derived from LZ78, with proven convergence and broad applicability.

Findings

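01. Under any model in the family, the normalized log loss converges to the normalized LZ78 codelength, uniformly over all individual sequences.

02. Experiments suggest a potential benefit of using the family, both as models and as sources, for compression, generation, and classification.

03. Theoretical and computational properties of the models, viewed as probabilistic sources, are analyzed.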
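
The first finding can be written out as a formula. The notation here is assumed for illustration and is not taken from the paper: q denotes any model in the family, x^n a sequence of length n over a finite alphabet A, and L_LZ78(x^n) the LZ78 codelength of x^n in bits. One reading of "uniformly over all individual sequences" is that the worst-case gap vanishes:

```latex
\[
\lim_{n \to \infty} \; \max_{x^n \in \mathcal{A}^n}
\left| \frac{1}{n} \log \frac{1}{q(x^n)} \;-\; \frac{1}{n} L_{\mathrm{LZ78}}(x^n) \right| = 0
\]
```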

Abstract

We propose and study a family of universal sequential probability assignments on individual sequences, based on the incremental parsing procedure of the Lempel-Ziv (LZ78) compression algorithm. We show that the normalized log loss under any of these models converges to the normalized LZ78 codelength, uniformly over all individual sequences. To establish the universality of these models, we consolidate a set of results from the literature relating finite-state compressibility to optimal log-loss under Markovian and finite-state models. We also consider some theoretical and computational properties of these models when viewed as probabilistic sources. Finally, we present experimental results showcasing the potential benefit of using this family -- as models and as sources -- for compression, generation, and classification.
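
To make the abstract's two central objects concrete, here is a minimal Python sketch of LZ78 incremental parsing and of one possible sequential probability assignment built on the resulting tree. The function names, the dictionary-based tree, and the additive-smoothing rule with parameter `gamma` are assumptions chosen for illustration; the summary above does not spell out how the paper's family is parameterized, so this should not be read as the authors' implementation.

```python
import math


def lz78_parse(seq):
    """Split seq into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not already appeared as an earlier phrase.
    """
    phrases, seen, cur = [], set(), ""
    for s in seq:
        cur += s
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:
        # The input ended in the middle of a phrase; keep the leftover.
        phrases.append(cur)
    return phrases


def lz78_log_loss(seq, alphabet, gamma=0.5):
    """Total log loss (bits) of a tree-based sequential predictor.

    Assumption: each tree node keeps symbol counts and predicts with
    additive smoothing (hypothetical parameter gamma).
    """
    root = {"counts": {}, "children": {}}
    node = root
    loss = 0.0
    for s in seq:
        # Predict the next symbol from the counts stored at the current node.
        total = sum(node["counts"].values())
        p = (node["counts"].get(s, 0) + gamma) / (total + gamma * len(alphabet))
        loss -= math.log2(p)
        # Update the counts, then walk down the tree.
        node["counts"][s] = node["counts"].get(s, 0) + 1
        child = node["children"].get(s)
        if child is None:
            # A new phrase ends here: add a leaf and restart at the root.
            node["children"][s] = {"counts": {}, "children": {}}
            node = root
        else:
            node = child
    return loss
```

Dividing the returned loss by `len(seq)` gives the normalized log loss, which is the quantity the abstract compares with the normalized LZ78 codelength.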

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Management and Algorithms