Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data
Thomas T.C.K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni

TL;DR
This paper introduces De-bias & Feature-Whiten (DFW), a method for sample-efficient linear representation learning from non-i.i.d. and non-isotropic data. DFW achieves near-optimal sample complexity and avoids the biases that existing approaches incur.
Contribution
The paper proposes DFW, an adaptation of alternating minimization, with proven linear convergence and improved noise scaling, unifying and extending prior theoretical results.
Findings
DFW achieves linear convergence to the optimal representation.
Its noise terms scale down with the total source data size, rather than with the data available for a single task.
Vanilla methods fail catastrophically on mildly non-isotropic data.
Abstract
A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators M from noisy vector measurements y = Mx + w, where the covariates x may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic representation learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning…
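The measurement setting described in the abstract can be illustrated with a small simulation. The sketch below is not DFW and does not learn a shared representation. It only assumes a single task with the model y = Mx + w, draws non-isotropic Gaussian covariates (i.i.d. here, for simplicity), and recovers the operator by ordinary least squares. The function name `simulate_and_recover`, the dimensions, the diagonal covariance, and the noise level are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate_and_recover(d_x=5, d_y=3, n=2000, noise_std=0.1, seed=0):
    """Simulate y = M x + w with non-isotropic covariates and fit M by least squares.

    All sizes and the covariance below are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)

    # Ground-truth linear operator M (d_y x d_x).
    M = rng.standard_normal((d_y, d_x))

    # Non-isotropic covariance: unequal variances along each coordinate.
    cov = np.diag(np.linspace(0.5, 3.0, d_x))

    # Covariates x_i ~ N(0, cov), stacked as rows of X (n x d_x).
    X = rng.multivariate_normal(np.zeros(d_x), cov, size=n)

    # Additive Gaussian measurement noise w_i, stacked as rows of W (n x d_y).
    W = noise_std * rng.standard_normal((n, d_y))

    # Noisy vector measurements y_i = M x_i + w_i, stacked as rows of Y.
    Y = X @ M.T + W

    # Least squares solves X B ~= Y, so B estimates M^T.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M, B.T

M, M_hat = simulate_and_recover()
rel_err = np.linalg.norm(M - M_hat) / np.linalg.norm(M)
print(f"relative Frobenius error: {rel_err:.4f}")
```

With a single task and plenty of samples, least squares handles non-isotropic covariates without difficulty. The difficulties the abstract describes arise when a common representation must be learned across many tasks, which this sketch does not attempt.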
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Speech Recognition and Synthesis
