Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

Thomas T.C.K. Zhang; Leonardo F. Toso; James Anderson; Nikolai Matni

arXiv:2308.04428 · stat.ML · October 15, 2024 · 2 cites

TL;DR

This paper introduces a novel method called DFW for efficient linear representation learning from non-i.i.d. and non-isotropic data, achieving near-optimal sample complexity and overcoming biases of existing approaches.

Contribution

The paper proposes DFW, an adaptation of alternating minimization, with proven linear convergence and improved noise scaling, unifying and extending prior theoretical results.
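For orientation, the following is a minimal sketch of a generic alternating least-squares scheme for a shared linear representation. It is not DFW: this summary does not describe DFW's specific modifications, so none are reproduced here. The factorization `M_t = F_t @ Phi`, the function names, and the problem sizes are illustrative assumptions, and the synthetic covariates are i.i.d. and isotropic purely for simplicity.

```python
import numpy as np

def fit_task_heads(Phi, Xs, Ys):
    """Given a representation Phi (r x d_x), fit each task head F_t by least squares."""
    heads = []
    for X, Y in zip(Xs, Ys):
        Z = X @ Phi.T                                  # projected covariates, (n, r)
        F_T, *_ = np.linalg.lstsq(Z, Y, rcond=None)    # solves Z @ F^T ~= Y
        heads.append(F_T.T)                            # (d_y, r)
    return heads

def fit_representation(heads, Xs, Ys, r):
    """Given task heads, fit Phi jointly over all tasks, then orthonormalize its rows."""
    d_x = Xs[0].shape[1]
    # With P = Phi^T and column-major vec: vec(X P F^T) = (F kron X) vec(P).
    A = np.vstack([np.kron(F, X) for F, X in zip(heads, Xs)])
    b = np.concatenate([Y.reshape(-1, order="F") for Y in Ys])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    P = p.reshape((d_x, r), order="F")
    Q, _ = np.linalg.qr(P)                             # orthonormal columns, (d_x, r)
    return Q.T

def subspace_distance(Phi_a, Phi_b):
    """Spectral norm of the part of Phi_b's row space lying outside Phi_a's row space."""
    d_x = Phi_a.shape[1]
    proj_perp = np.eye(d_x) - Phi_a.T @ Phi_a
    return np.linalg.norm(proj_perp @ Phi_b.T, 2)

# Synthetic multi-task data (all sizes are hypothetical).
rng = np.random.default_rng(0)
d_x, d_y, r, T, n = 6, 3, 2, 8, 100
Phi_true = np.linalg.qr(rng.standard_normal((d_x, r)))[0].T
heads_true = [rng.standard_normal((d_y, r)) for _ in range(T)]
Xs = [rng.standard_normal((n, d_x)) for _ in range(T)]
Ys = [X @ Phi_true.T @ F.T + 0.01 * rng.standard_normal((n, d_y))
      for X, F in zip(Xs, heads_true)]

# Alternate between the two least-squares steps from a random initialization.
Phi_hat = np.linalg.qr(rng.standard_normal((d_x, r)))[0].T
for _ in range(10):
    heads = fit_task_heads(Phi_hat, Xs, Ys)
    Phi_hat = fit_representation(heads, Xs, Ys, r)

dist = subspace_distance(Phi_true, Phi_hat)
print(f"subspace distance after 10 rounds: {dist:.4f}")
```

The sketch measures recovery by a subspace distance because `Phi` is only identifiable up to an invertible transformation of its rows. It makes no attempt to reproduce the failure of vanilla methods on non-isotropic data that the paper reports.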

Findings

01. DFW converges linearly to the optimal representation.

02. Its noise terms shrink with the total amount of source data across tasks.

03. Vanilla methods fail catastrophically on mildly non-isotropic data.

Abstract

A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = M x + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic representation learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning…
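To make the measurement model $y = M x + w$ concrete, here is a small single-task sketch that draws non-isotropic covariates and recovers $M$ by ordinary least squares. The dimensions, the diagonal covariance, the noise level, and the function names are assumptions chosen for illustration. The samples are drawn i.i.d. for simplicity, whereas the paper's setting also allows non-i.i.d. covariates.

```python
import numpy as np

def simulate_measurements(M, cov, n, noise_std, rng):
    """Draw n covariates x ~ N(0, cov) and measurements y = M x + w."""
    d_y, d_x = M.shape
    L = np.linalg.cholesky(cov)                   # cov = L @ L.T
    X = rng.standard_normal((n, d_x)) @ L.T       # rows are covariates x_i
    W = noise_std * rng.standard_normal((n, d_y)) # rows are noise vectors w_i
    Y = X @ M.T + W                               # rows are measurements y_i
    return X, Y

def least_squares_operator(X, Y):
    """Estimate M by minimizing the Frobenius norm of Y - X @ M.T."""
    M_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M_T.T

rng = np.random.default_rng(0)
d_x, d_y = 5, 3
M_true = rng.standard_normal((d_y, d_x))
cov = np.diag(np.linspace(0.5, 2.0, d_x))         # non-isotropic: unequal variances

X, Y = simulate_measurements(M_true, cov, n=5000, noise_std=0.1, rng=rng)
M_hat = least_squares_operator(X, Y)
rel_err = np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true)
print(f"relative error: {rel_err:.4f}")
```

This only covers the single-operator case. The representation learning problem in the paper concerns structure shared across many such operators, which this sketch does not model.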

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Speech Recognition and Synthesis