Automata and Graph Compression
Mehryar Mohri, Michael Riley, Ananda Theertha Suresh

TL;DR
This paper introduces a theoretical framework for compressing automata and graphs and proposes LZA, a universal compression scheme that outperforms gzip and the UNIX compress command on several synthetic and real data sets.
Contribution
It formulates a probabilistic model of automata and graph generation, similar in spirit to stationary ergodic processes, and introduces LZA, a universal compression scheme for that model.
Findings
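LZA significantly outperforms gzip and the UNIX compress command on several synthetic and real data sets
The probabilistic generation model is designed to capture real-world phenomena in automata and graph data
The framework is developed for automata and extends to graph compression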
Abstract
We present a theoretical framework for the compression of automata, which are widely used in speech processing and other natural language processing tasks. The framework extends to graph compression. Similar to stationary ergodic processes, we formulate a probabilistic process of graph and automata generation that captures real world phenomena and provide a universal compression scheme LZA for this probabilistic model. Further, we show that LZA significantly outperforms other compression techniques such as gzip and the UNIX compress command for several synthetic and real data sets.
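To make the gzip baseline mentioned in the abstract concrete, here is a minimal sketch of how one might measure it on a toy automaton. It does not implement LZA or the paper's generative model. The uniform random generator, the "src label dst" text serialization, and the function names random_automaton, serialize, and gzip_ratio are all assumptions made for illustration.

```python
import gzip
import random


def random_automaton(num_states, alphabet, out_degree, seed=0):
    """Return a list of (source, label, destination) transitions.

    Hypothetical generator: destinations are drawn uniformly at random,
    which is NOT the probabilistic process defined in the paper.
    """
    rng = random.Random(seed)
    symbols = list(alphabet)
    transitions = []
    for state in range(num_states):
        # Sampling labels without replacement keeps the automaton deterministic.
        for label in rng.sample(symbols, out_degree):
            transitions.append((state, label, rng.randrange(num_states)))
    return transitions


def serialize(transitions):
    """Encode one transition per line as 'src label dst' (assumed text format)."""
    lines = [f"{src} {label} {dst}" for src, label, dst in transitions]
    return "\n".join(lines).encode("utf-8")


def gzip_ratio(data):
    """Compressed size divided by raw size; lower means better compression."""
    return len(gzip.compress(data)) / len(data)


automaton = random_automaton(num_states=1000, alphabet="abcdefgh", out_degree=3)
raw = serialize(automaton)
print(f"{len(automaton)} transitions, {len(raw)} raw bytes, "
      f"gzip ratio {gzip_ratio(raw):.3f}")
```

A general-purpose compressor such as gzip sees only the byte stream produced by serialize, so its ratio depends on the chosen text format as well as on the automaton. That dependence on serialization is one reason a baseline like this is only a rough point of comparison for a scheme designed for automata structure.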
