TL;DR
This paper establishes fundamental sample-size conditions for sparse recovery from mixed-quality data, revealing how data heterogeneity affects information-theoretic and algorithmic thresholds differently.
Contribution
It introduces the first conditions for sparse recovery with mixed-quality data and analyzes the robustness of LASSO in heterogeneous noise settings.
Findings
A linear sample-size trade-off, the Price of Quality, quantifies how many low-quality samples are needed to replace one high-quality sample.
In the agnostic setting, one high-quality sample is worth at most two low-quality samples.
The LASSO recovery threshold depends only on the average noise level, showing robustness to heterogeneity.
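To make the setting concrete, here is a minimal simulation sketch of mixed-quality sparse regression. It is an illustration under stated assumptions, not the paper's procedure or analysis: the Gaussian design, the unit-magnitude signal, the sample counts, the noise levels, and the helper names (`simulate_mixed_quality`, `average_noise_variance`, `lasso_ista`) are all hypothetical, and the LASSO is solved with a plain ISTA loop chosen only for self-containment.

```python
import numpy as np


def simulate_mixed_quality(n_high, n_low, p, k, sigma_high, sigma_low, seed=0):
    """Draw y = X @ beta + noise, where the first n_high rows have noise
    standard deviation sigma_high and the remaining n_low rows sigma_low.
    All modelling choices here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = n_high + n_low
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = 1.0
    sigma = np.concatenate([np.full(n_high, sigma_high),
                            np.full(n_low, sigma_low)])
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta, sigma


def average_noise_variance(sigma):
    """Mean of the per-sample noise variances."""
    return float(np.mean(sigma ** 2))


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by proximal
    gradient descent (ISTA) with step size 1/L."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return b


if __name__ == "__main__":
    X, y, beta, sigma = simulate_mixed_quality(
        n_high=50, n_low=150, p=100, k=5, sigma_high=0.1, sigma_low=1.0)
    b_hat = lasso_ista(X, y, lam=0.2)
    print("average noise variance:", average_noise_variance(sigma))
    print("true support:     ", np.flatnonzero(beta))
    print("estimated support:", np.flatnonzero(np.abs(b_hat) > 0.5))
```

One way to probe the robustness finding empirically would be to rerun the script with different `(sigma_high, sigma_low)` pairs that share the same average noise variance and compare the estimated supports; the regularisation level `lam=0.2` and the 0.5 support cutoff are arbitrary choices for this sketch.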
Abstract
We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for the numbers of high-quality and low-quality samples to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder has no knowledge of the quality of each sample, the Price of Quality is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample…