Large Alphabet Source Coding using Independent Component Analysis
Amichai Painsky, Saharon Rosset, Meir Feder

TL;DR
This paper introduces a large alphabet source coding method that decomposes a source into components that are as statistically independent as possible, so that each component can be entropy coded separately over a much smaller alphabet. The authors report that it outperforms commonly used methods in several coding setups.
Contribution
It proposes a new framework using Independent Component Analysis for large alphabet source coding, improving efficiency and simplicity over existing methods.
Findings
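Decomposing the source into nearly independent components lets each component be entropy coded separately over a reduced alphabet, which lowers coding complexity.
In many cases the sum of the components' marginal entropies is only slightly greater than the entropy of the source.
The framework applies to lossless compression, universal compression, and vector quantization, and is reported to outperform traditional coding techniques in these setups.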
Abstract
Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications such as compression of natural language text, speech and images. The classic perception of most commonly used methods is that a source is best described over an alphabet which is at least as large as the observed alphabet. In this work we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into "as statistically independent as possible" components. This decomposition allows us to apply entropy encoding to each component separately, while benefiting from their reduced alphabet size. We show that in many cases, such decomposition results in a sum of marginal entropies which is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of the Binary Independent Component Analysis,…
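The abstract's central claim, that a good decomposition yields a sum of marginal entropies only slightly above the source entropy, can be illustrated on a toy source. The sketch below is not the authors' algorithm: it assumes a hypothetical 4-symbol distribution, represents each symbol as 2 bits, and exhaustively searches all symbol-to-codeword permutations for the one that minimizes the sum of the per-bit entropies. The function names and the distribution are made up for illustration, and brute force is feasible only for very small alphabets.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sum_marginal_bit_entropies(probs, codes, n_bits):
    """Sum of the binary entropies of each bit position.

    codes[i] is the n_bits-bit codeword (as an int) assigned to symbol i.
    """
    total = 0.0
    for b in range(n_bits):
        # Probability that bit b of the codeword equals 1.
        p_one = sum(p for p, c in zip(probs, codes) if (c >> b) & 1)
        total += entropy([p_one, max(0.0, 1.0 - p_one)])
    return total


def best_permutation(probs, n_bits):
    """Brute-force search over all symbol-to-codeword assignments."""
    codewords = range(2 ** n_bits)
    return min(
        itertools.permutations(codewords),
        key=lambda codes: sum_marginal_bit_entropies(probs, codes, n_bits),
    )


# Hypothetical source distribution (an assumption, not from the paper).
probs = [0.4, 0.1, 0.2, 0.3]
n_bits = 2

joint = entropy(probs)
naive = sum_marginal_bit_entropies(probs, tuple(range(4)), n_bits)
best_codes = best_permutation(probs, n_bits)
best_sum = sum_marginal_bit_entropies(probs, best_codes, n_bits)

print(f"source entropy:              {joint:.4f} bits")
print(f"natural binary labeling:     {naive:.4f} bits")
print(f"best permutation {best_codes}: {best_sum:.4f} bits")
```

Because any permutation is invertible, the sum of marginal bit entropies can never fall below the source entropy; the search only shrinks the gap between the two. Coding each bit separately with a binary entropy coder then costs roughly that sum per symbol.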
