Generalized massive optimal data compression
Justin Alsing, Benjamin Wandelt

TL;DR
This paper introduces a general method for optimally compressing large datasets into a minimal set of informative summaries that retain all Fisher information, applicable to both Gaussian and non-Gaussian data.
Contribution
It generalizes existing linear compression techniques to a unified framework that produces optimal non-linear summaries for diverse data distributions.
Findings
Compression to the score function preserves Fisher information.
The method recovers linear and quadratic compression as special cases.
The paper explicitly derives the optimal summaries for Gaussian data whose mean and covariance both depend on the parameters.
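To make the Gaussian finding concrete, the score compression can be written out using standard results for a Gaussian likelihood. This is a sketch: the symbols (data vector $d$, mean $\mu(\theta)$, covariance $C(\theta)$, fiducial point $\theta_*$, compressed statistics $t$) are notation assumed here, not taken from the summary above, and all derivatives are evaluated at $\theta_*$.

```latex
% Gaussian log-likelihood (up to an additive constant)
\mathcal{L}(\theta) = -\tfrac{1}{2}\,(d-\mu)^{T} C^{-1} (d-\mu) - \tfrac{1}{2}\ln|C|

% Score compression: gradient of the log-likelihood at the fiducial point
t_i = \nabla_i \mathcal{L}\,\big|_{\theta_*}
    = \nabla_i\mu^{T} C^{-1} (d-\mu)
    + \tfrac{1}{2}\,(d-\mu)^{T} C^{-1} (\nabla_i C)\, C^{-1} (d-\mu)
    - \tfrac{1}{2}\,\mathrm{tr}\!\left(C^{-1}\nabla_i C\right)

% Fisher information of the full data set
F_{ij} = \nabla_i\mu^{T} C^{-1} \nabla_j\mu
       + \tfrac{1}{2}\,\mathrm{tr}\!\left(C^{-1}\nabla_i C\; C^{-1}\nabla_j C\right)
```

When only the mean depends on the parameters, the second and third terms of $t_i$ vanish and the compression is linear in the data; when only the covariance depends on the parameters, the first term vanishes and the compression is quadratic. These are the two special cases named in the findings.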
Abstract
Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a general procedure for optimally compressing N data down to n summary statistics, where n is equal to the number of parameters of interest. We show that compression to the score function -- the gradient of the log-likelihood with respect to the parameters -- yields compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general…
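The claim that score compression preserves Fisher information can be checked numerically in the simplest setting. The sketch below assumes a toy linear Gaussian model (a straight-line fit with known, parameter-independent noise); the design matrix `A`, noise level `sigma`, fiducial point `theta_fid`, and the helper `score_compress` are all hypothetical names chosen for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption for illustration): d = A @ theta + Gaussian noise
N, n = 200, 2                          # N data points, n parameters
x = np.linspace(0.0, 1.0, N)
A = np.column_stack([np.ones(N), x])   # d(mean)/d(theta), shape (N, n)
sigma = 0.5 * np.ones(N)               # known noise standard deviations
C = np.diag(sigma**2)                  # data covariance (parameter-independent)
C_inv = np.diag(1.0 / sigma**2)

theta_fid = np.array([1.0, 2.0])       # fiducial expansion point


def score_compress(d):
    """Gradient of the Gaussian log-likelihood at theta_fid: N numbers -> n numbers."""
    return A.T @ C_inv @ (d - A @ theta_fid)


# Fisher information of the full data set
F_data = A.T @ C_inv @ A

# Fisher information carried by the compressed statistics t:
# the mean of t changes with theta at rate A^T C^-1 A, and cov(t) = A^T C^-1 C C^-1 A
dt_dtheta = A.T @ C_inv @ A
cov_t = A.T @ C_inv @ C @ C_inv @ A
F_compressed = dt_dtheta.T @ np.linalg.solve(cov_t, dt_dtheta)

# Simulate one data set and compress it
theta_true = np.array([1.1, 1.8])
d = A @ theta_true + sigma * rng.standard_normal(N)
t = score_compress(d)

# For this linear model, one Fisher-scoring step from theta_fid gives the
# maximum-likelihood estimate using only the n compressed numbers
theta_hat = theta_fid + np.linalg.solve(F_data, t)

print("compressed statistics:", t)
print("Fisher matrices agree:", np.allclose(F_compressed, F_data))
print("estimate from compressed data:", theta_hat)
```

In this linear case the compressed statistics have covariance equal to the Fisher matrix itself, which is why `F_compressed` reduces to `F_data`. For models where the mean or covariance is non-linear in the parameters, the same construction applies around the fiducial point, but the equality should be read as holding locally.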
