Generalized massive optimal data compression

Justin Alsing; Benjamin Wandelt

arXiv:1712.00012·astro-ph.CO·April 4, 2018

TL;DR

This paper introduces a general method for optimally compressing large datasets into a minimal set of informative summaries that retain all Fisher information, applicable to both Gaussian and non-Gaussian data.

Contribution

It generalizes existing linear compression techniques to a unified framework that produces optimal non-linear summaries for diverse data distributions.

Findings

01. Compression to the score function preserves the Fisher information of the data.

02. The method recovers lossless linear compression and quadratic estimation as special cases.

03. The paper explicitly derives the optimal summaries for Gaussian data whose mean and covariance both depend on the parameters.
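As a reference point for the third finding, the score of a Gaussian likelihood can be written out directly. The expression below is the standard gradient of a Gaussian log-likelihood, not a quotation of the paper's equations; the symbols $d$ (data vector), $\mu$ (mean), $C$ (covariance), $\theta$ (parameters) and $\theta_*$ (the fiducial point at which the gradient is evaluated) are notation assumed here for illustration.

```latex
% Gaussian log-likelihood (up to an additive constant), assumed notation:
%   \ln L = -\tfrac{1}{2}(d-\mu)^{T} C^{-1} (d-\mu) - \tfrac{1}{2}\ln\det C
% Score: gradient with respect to \theta, evaluated at the fiducial point \theta_*
\begin{align}
t \equiv \nabla_\theta \ln L \,\big|_{\theta_*}
  &= \nabla_\theta \mu^{T} C^{-1} (d-\mu)
   + \tfrac{1}{2}\,(d-\mu)^{T} C^{-1} \left(\nabla_\theta C\right) C^{-1} (d-\mu)
   - \tfrac{1}{2}\,\mathrm{tr}\!\left(C^{-1}\nabla_\theta C\right), \\
F_{\alpha\beta}
  &= \partial_\alpha \mu^{T} C^{-1} \partial_\beta \mu
   + \tfrac{1}{2}\,\mathrm{tr}\!\left(C^{-1}\,\partial_\alpha C\; C^{-1}\,\partial_\beta C\right).
\end{align}
```

The first term of $t$ is linear in the data and is all that remains when the covariance is parameter-independent; the second term is quadratic in the data and is all that remains when the mean is parameter-independent. The trace term does not depend on the data, so it shifts the summaries by a constant without changing their information content.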

Abstract

Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a general procedure for optimally compressing $N$ data down to $n$ summary statistics, where $n$ is equal to the number of parameters of interest. We show that compression to the score function -- the gradient of the log-likelihood with respect to the parameters -- yields $n$ compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general…
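A minimal sketch of score compression in the simplest setting may help make the idea concrete. It assumes Gaussian data with a known, parameter-independent covariance and a mean that is linear in the parameters, so the score reduces to the linear term alone. The straight-line model, the noise levels, the fiducial point, and the helper names `gaussian_score` and `fisher_matrix` are all hypothetical choices made for this illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_score(d, mu, dmu, Cinv):
    """Score of a Gaussian likelihood with fixed covariance.

    d    : (N,)   data vector
    mu   : (N,)   model mean at the fiducial parameters
    dmu  : (n, N) gradient of the mean with respect to the n parameters
    Cinv : (N, N) inverse data covariance
    Returns the n compressed summaries t = dmu C^{-1} (d - mu).
    """
    return dmu @ Cinv @ (d - mu)


def fisher_matrix(dmu, Cinv):
    """Fisher matrix F = dmu C^{-1} dmu^T for a fixed covariance."""
    return dmu @ Cinv @ dmu.T


# Hypothetical toy problem: N noisy samples of a straight line, n = 2 parameters.
rng = np.random.default_rng(0)
N = 500
x = np.linspace(0.0, 1.0, N)
A = np.stack([np.ones(N), x], axis=1)   # design matrix, mean = A @ theta
variances = 0.1 + 0.05 * x              # assumed heteroscedastic noise variances
Cinv = np.diag(1.0 / variances)

theta_true = np.array([1.0, -0.5])
theta_fid = np.array([0.8, -0.3])       # fiducial point for the expansion
d = A @ theta_true + rng.normal(size=N) * np.sqrt(variances)

# Compress N = 500 data points down to n = 2 summaries.
t = gaussian_score(d, A @ theta_fid, A.T, Cinv)
F = fisher_matrix(A.T, Cinv)

# Estimate built from the summaries alone, compared with the
# generalized least-squares estimate computed from the full data.
theta_hat = theta_fid + np.linalg.solve(F, t)
theta_gls = np.linalg.solve(A.T @ Cinv @ A, A.T @ Cinv @ d)
print(theta_hat, theta_gls)
```

For this linear model the estimate formed from the two summaries coincides with the generalized least-squares estimate from all 500 data points, whatever fiducial point is chosen, which is one way to see that nothing relevant to the parameters was lost in the compression.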
