Beyond Lazy Training for Over-parameterized Tensor Decomposition

Xiang Wang; Chenwei Wu; Jason D. Lee; Tengyu Ma; Rong Ge

arXiv:2010.11356 · stat.ML · October 23, 2020 · 1 citation

TL;DR

This paper studies over-parameterized tensor decomposition and shows that a variant of gradient descent can find an approximate decomposition beyond the lazy training regime by exploiting low-rank structure: m = O^*(r^{2.5l} log d) components suffice, whereas lazy training needs at least m = Ω(d^{l-1}).

Contribution

It shows that a variant of gradient descent can succeed in tensor decomposition beyond the lazy training regime by leveraging low-rank structure, and it gives explicit bounds on the over-parameterization required.

Findings

01

Lazy training requires m = Ω(d^{l-1}) for tensor decomposition.

02

A variant of gradient descent can find an approximate decomposition with m = O^*(r^{2.5l} log d).

03

Over-parameterized gradient descent can utilize low-rank structures beyond lazy training.

Abstract

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(\mathbb{R}^{d})^{\otimes l}$ of rank $r$ (where $r \ll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^{*}(r^{2.5l} \log d)$. Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
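The setup in the abstract can be illustrated with a small numerical sketch. This is not the algorithm analyzed in the paper, which is a variant of gradient descent. It is plain gradient descent under several simplifying assumptions: order $l = 3$, a symmetric target with orthonormal components, a squared Frobenius loss, and a small random initialization. The function names (`symmetric_rank_one_sum`, `loss_and_grad`, `run_gd`) and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def symmetric_rank_one_sum(U):
    """Return sum_j u_j (x) u_j (x) u_j for the rows u_j of U, shape (m, d)."""
    return np.einsum("ja,jb,jc->abc", U, U, U)


def loss_and_grad(U, T):
    """Loss 0.5 * ||T_hat - T||_F^2 and its gradient with respect to U."""
    R = symmetric_rank_one_sum(U) - T  # residual tensor (symmetric)
    loss = 0.5 * np.sum(R ** 2)
    # Because R is symmetric, the gradient w.r.t. u_j is 3 * R(u_j, u_j, .).
    grad = 3.0 * np.einsum("abc,ja,jb->jc", R, U, U)
    return loss, grad


def run_gd(d=8, r=2, m=10, steps=300, lr=0.02, init_scale=0.3, seed=0):
    """Fit a rank-r target with m > r components by plain gradient descent.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Assumed ground truth: r orthonormal components in R^d.
    A = np.linalg.qr(rng.standard_normal((d, r)))[0].T  # shape (r, d)
    T = symmetric_rank_one_sum(A)
    # Over-parameterized model: m components, small random initialization.
    U = init_scale * rng.standard_normal((m, d)) / np.sqrt(d)
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(U, T)
        losses.append(loss)
        U = U - lr * grad
    losses.append(loss_and_grad(U, T)[0])
    return U, T, losses


if __name__ == "__main__":
    U, T, losses = run_gd()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Here `m = 10` components are fit to a rank-2 target in dimension 8, so the model has more components than the target rank. The sketch only shows the shape of the objective and the update. It says nothing about the $m = O^{*}(r^{2.5l} \log d)$ guarantee.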

Videos

Beyond Lazy Training for Over-parameterized Tensor Decomposition · SlidesLive

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neuroimaging Techniques and Applications · Model Reduction and Neural Networks

Methods: Neural Tangent Kernel