Data efficiency, dimensionality reduction, and the generalized symmetric information bottleneck

K. Michael Martini; Ilya Nemenman

arXiv:2309.05649·cs.IT·February 6, 2024

Open Access

TL;DR

This paper introduces the Generalized Symmetric Information Bottleneck (GSIB), a dimensionality reduction method that compresses two random variables simultaneously. In typical situations it needs less data than compressing each variable separately, a claim supported by derived bounds and root-mean-squared estimates.

Contribution

The paper proposes GSIB, extending the Symmetric Information Bottleneck, and analyzes its data efficiency and statistical properties for simultaneous variable compression.

Findings

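01 In typical situations, GSIB requires qualitatively less data to reach the same error than compressing the variables one at a time.

02 The authors suggest this reflects a more general principle: simultaneous compression is more data-efficient than independent compression of each variable.

03 Bounds and root-mean-squared estimates of the statistical fluctuations of the loss functions support these claims.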

Abstract

The Symmetric Information Bottleneck (SIB), an extension of the more familiar Information Bottleneck, is a dimensionality reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the dataset size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that, in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data…
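The dependence of information estimates on dataset size can be illustrated with a minimal sketch. This is not the paper's derivation or its loss function: it only demonstrates the standard fact that a plug-in (maximum-likelihood) mutual information estimate fluctuates and is biased upward at finite sample size, and that the bias grows with the number of states of the variables involved. The function names `plugin_mutual_information` and `mean_estimate_for_independent`, the uniform independent test distribution, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def plugin_mutual_information(counts):
    """Plug-in estimate of I(A;B) in nats from a table of joint counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()               # empirical joint distribution
    pa = p.sum(axis=1, keepdims=True)       # empirical marginal of A, shape (k, 1)
    pb = p.sum(axis=0, keepdims=True)       # empirical marginal of B, shape (1, k)
    mask = p > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p[mask] * np.log(p[mask] / (pa @ pb)[mask])))


def mean_estimate_for_independent(k, n_samples, n_trials=200, seed=0):
    """Average plug-in estimate for two independent uniform k-state variables.

    The true mutual information is zero, so the returned mean is pure
    finite-sample bias. (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_trials):
        a = rng.integers(0, k, size=n_samples)
        b = rng.integers(0, k, size=n_samples)
        counts = np.zeros((k, k))
        np.add.at(counts, (a, b), 1)        # accumulate joint counts
        estimates.append(plugin_mutual_information(counts))
    return float(np.mean(estimates))


if __name__ == "__main__":
    for k in (2, 5, 20):
        print(k, mean_estimate_for_independent(k, n_samples=200))
```

At a fixed number of samples, the spurious information reported for the 20-state variables is much larger than for the 2-state variables. This gives a rough intuition for why the sample-size requirements of information-based losses deserve the careful analysis the abstract describes.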

Taxonomy

Topics: Error Correcting Code Techniques · Stochastic Gradient Optimization Techniques · Advanced Data Compression Techniques