Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples
Thomas T. Zhang, Bruce D. Lee, Ingvar Ziemann, George J. Pappas, Nikolai Matni

TL;DR
This paper provides theoretical guarantees for learning nonlinear representations from multiple sources with non-identical distributions and dependencies, showing how task diversity and data quantity influence sample complexity and risk bounds.
Contribution
It introduces a framework for analyzing sample complexity and risk in nonlinear representation learning with dependent, non-i.i.d. data across multiple tasks.
Findings
Sample complexity depends on data dependency and task diversity.
Risk bounds improve with more tasks, approaching i.i.d. regression performance.
Dependency affects sample requirements but not the asymptotic risk bound.
Abstract
A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources and statistically dependent within each source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample complexity of learning functions from a function class in which each function composes a task-specific linear function with a shared nonlinear representation. A representation is estimated using samples from…
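The function class described in the abstract (a task-specific linear function applied to a shared nonlinear representation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the representation is a fixed tanh feature map with a hypothetical weight matrix `W`, the feature dimension is 2, the labels are noiseless, and the representation is treated as already known rather than estimated from source tasks. The sketch only shows how a linear head for one task can be fit by least squares on top of frozen shared features.

```python
import math
import random

def representation(W, x):
    """Shared nonlinear feature map g(x) = tanh(W x); W is a list of rows (illustrative choice)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict(head, W, x):
    """Task-specific linear head applied to the shared features: <head, g(x)>."""
    return sum(h * z for h, z in zip(head, representation(W, x)))

def fit_head_2d(W, xs, ys):
    """Least-squares fit of a 2-dimensional linear head on frozen features.

    Solves the 2x2 normal equations (Z^T Z) h = Z^T y by Cramer's rule.
    """
    zs = [representation(W, x) for x in xs]
    a = sum(z[0] * z[0] for z in zs)
    b = sum(z[0] * z[1] for z in zs)
    d = sum(z[1] * z[1] for z in zs)
    p = sum(z[0] * y for z, y in zip(zs, ys))
    q = sum(z[1] * y for z, y in zip(zs, ys))
    det = a * d - b * b
    return [(d * p - b * q) / det, (a * q - b * p) / det]

rng = random.Random(0)
# Hypothetical shared representation: 3 input dimensions mapped to 2 features.
W = [[0.5, -1.0, 0.3], [1.2, 0.4, -0.7]]
# Hypothetical linear head for a single task.
target_head = [2.0, -1.5]
# Inputs are drawn i.i.d. here purely for simplicity; labels are noiseless.
xs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
ys = [predict(target_head, W, x) for x in xs]
fitted_head = fit_head_2d(W, xs, ys)
```

Because the labels are noiseless and the features are held fixed, the fitted head matches `target_head` up to floating-point error. The setting the paper studies is harder: the representation itself must be estimated, and the data may be dependent and non-identically distributed across sources.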
Taxonomy
Topics: Neural Networks and Applications
