Beyond Lazy Training for Over-parameterized Tensor Decomposition
Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

TL;DR
This paper investigates over-parameterized tensor decomposition, demonstrating that gradient descent can find approximate solutions beyond the lazy training regime by exploiting low-rank structures, with specific bounds on the over-parameterization needed.
Contribution
The paper shows that a variant of gradient descent can succeed at over-parameterized tensor decomposition beyond the lazy training regime by exploiting low-rank structure in the target tensor, and it gives explicit bounds on the number of components m required.
Findings
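In the lazy training regime, tensor decomposition requires at least m = Ω(d^{l-1}) components.
A variant of gradient descent can find an approximate decomposition with m = O^*(r^{2.5l} log d) components, where r is the rank of the target tensor.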
Over-parameterized gradient descent can utilize low-rank structures beyond lazy training.
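As a rough comparison of the two bounds above, the following worked inequality shows when the second is smaller than the first. It ignores constants and any factors hidden in the O^* notation, which is an assumption made here only for illustration:

```latex
% When does r^{2.5 l} \log d fall below d^{l-1}?  (constants and hidden factors ignored)
\[
  r^{2.5l}\,\log d \;\le\; d^{\,l-1}
  \quad\Longleftrightarrow\quad
  r \;\le\; d^{\frac{l-1}{2.5\,l}}\,(\log d)^{-\frac{1}{2.5\,l}} .
\]
% Example, l = 3:  r \le d^{4/15} (\log d)^{-2/15}.
% As l grows, the exponent (l-1)/(2.5 l) approaches 2/5.
```

Under this simplification, the non-lazy bound is the better one only when the rank r is a small polynomial power of the dimension d, which matches the low-rank setting the findings describe.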
Abstract
Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R^d)^{⊗l} of rank r (where r ≪ d), can variants of gradient descent find a rank m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d^{l-1}), while a variant of gradient descent can find an approximate tensor when m = O^*(r^{2.5l} log d). Our results show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
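The problem setup in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the symmetric third-order case (l = 3), a squared Frobenius loss, and plain gradient descent from a small random initialization. It is not the specific variant of gradient descent analyzed in the paper, and the function names, step size, and initialization scale are all hypothetical choices.

```python
import numpy as np

def build_tensor(components):
    # Sum of symmetric rank-one terms u (x) u (x) u, one per row of `components`.
    return np.einsum('ia,ib,ic->abc', components, components, components)

def loss_and_grad(U, T):
    # Loss: 0.5 * || sum_i u_i^{(x)3} - T ||_F^2 for a symmetric target T.
    R = build_tensor(U) - T
    loss = 0.5 * np.sum(R ** 2)
    # For a symmetric residual R, d(loss)/d(u_i) = 3 * R(u_i, u_i, .).
    grad = 3.0 * np.einsum('abc,ia,ib->ic', R, U, U)
    return loss, grad

def run_gd(T, m, steps=2000, lr=0.05, init_scale=0.1, seed=0):
    # Plain gradient descent on m components (m > rank means over-parameterized).
    # Step size and initialization scale are illustrative assumptions.
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    U = init_scale * rng.standard_normal((m, d))
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(U, T)
        losses.append(loss)
        U -= lr * grad
    return U, losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, r, m = 8, 2, 10
    # Rank-r target built from r orthonormal components (an illustrative choice).
    A = np.linalg.qr(rng.standard_normal((d, r)))[0].T
    T = build_tensor(A)
    U, losses = run_gd(T, m)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Here m = 10 components are fit to a rank r = 2 target, so the model is over-parameterized in the sense of the abstract (m > r). The sketch makes no claim about reproducing the paper's guarantees.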
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neuroimaging Techniques and Applications · Model Reduction and Neural Networks
Methods: Neural Tangent Kernel
