Bayesian Attention Networks for Data Compression
Michael Tetelman

TL;DR
This paper introduces Bayesian Attention Networks for lossless data compression. An attention factor ties each prediction sample to the training samples correlated with it, and a latent space makes the resulting context-dependent solutions practical to learn.
Contribution
It proposes a novel Bayesian Attention Network framework with a latent space for practical, context-aware data compression based on sample correlations.
Findings
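The attention factor is fully determined by a correlation function of the training and prediction samples with respect to the model weights.
A latent space maps each prediction sample to its context, so solutions are learned as a function of the latent representation.
Compression is context-dependent: the solution for a prediction sample is driven mostly by the few training samples correlated with it.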
Abstract
A lossless data compression algorithm based on Bayesian Attention Networks is derived from first principles. Bayesian Attention Networks are defined by introducing an attention factor for each training sample's loss as a function of two sample inputs, one from the training sample and one from the prediction sample. Using a sharpened Jensen's inequality, we show that the attention factor is completely defined by a correlation function of the two samples w.r.t. the model weights. Because of the attention factor, the solution for a prediction sample is mostly defined by the few training samples that are correlated with it. Finding a specific solution per prediction sample couples training and prediction together. To make the approach practical, we introduce a latent space, map each prediction sample to it, and learn all possible solutions as a function of the latent space along…
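The abstract states that the attention factor is defined by a correlation function of the training and prediction samples with respect to the model weights, but this summary does not give its exact form. The following is only a minimal sketch of that idea under stated assumptions, not the paper's derivation: it takes the Pearson correlation of per-sample losses across a set of sampled weights, clips negative values, and normalizes. The helper name `attention_from_loss_correlation`, the quadratic toy loss, the Gaussian weight samples, and the clipping and normalization steps are all hypothetical choices made for illustration.

```python
import numpy as np


def attention_from_loss_correlation(train_losses, pred_losses, eps=1e-12):
    """Hypothetical helper: attention weights from loss correlations.

    train_losses: array (K, N), loss of each of N training samples under
                  each of K sampled weight vectors.
    pred_losses:  array (K,), loss of the prediction sample under the
                  same K weight vectors.
    Returns an array (N,) of nonnegative weights that sum to 1.
    """
    train_centered = train_losses - train_losses.mean(axis=0, keepdims=True)
    pred_centered = pred_losses - pred_losses.mean()
    # Covariance over the weight samples, one value per training sample.
    cov = train_centered.T @ pred_centered / train_losses.shape[0]
    # Pearson correlation; eps guards against zero variance.
    corr = cov / (train_losses.std(axis=0) * pred_losses.std() + eps)
    # Assumption: keep only positively correlated samples, then normalize.
    scores = np.clip(corr, 0.0, None)
    total = scores.sum()
    if total <= eps:
        # No positively correlated sample: fall back to uniform attention.
        return np.full(scores.shape, 1.0 / scores.size)
    return scores / total


# Toy setup (assumed): scalar weight w, loss 0.5 * (x - w)^2.
rng = np.random.default_rng(0)
weights = rng.standard_normal(2000)          # K weight samples
x_train = np.array([-2.0, -1.9, 2.0, 2.1])   # N training samples
x_pred = 2.05                                # one prediction sample

train_losses = 0.5 * (x_train[None, :] - weights[:, None]) ** 2  # (K, N)
pred_losses = 0.5 * (x_pred - weights) ** 2                      # (K,)

attn = attention_from_loss_correlation(train_losses, pred_losses)
print(np.round(attn, 3))
```

In this toy setup the attention mass falls on the two training samples near 2, whose losses rise and fall with that of the prediction sample as the weight varies. The two samples near -2 are negatively correlated with it and receive zero weight. This mirrors the abstract's claim that a prediction is mostly defined by a few correlated training samples.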
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference · Neural Networks and Applications
