Understanding Dynamics of Nonlinear Representation Learning and Its Application
Kenji Kawaguchi, Linjun Zhang, Zhun Deng

TL;DR
This paper studies the implicit dynamics of nonlinear representation learning in deep neural networks beyond the neural tangent kernel (NTK) regime. It provides theoretical guarantees and proposes a new training framework, validated empirically on CIFAR-10, CIFAR-100, and SVHN.
Contribution
The paper introduces the common model structure assumption and the data-architecture alignment condition, which together yield theoretical guarantees for convergence and optimality in nonlinear representation learning.
Findings
The theory explains when increasing network size improves training.
A new training framework with convergence guarantees is proposed.
Empirical results show competitive performance on CIFAR-10, CIFAR-100, and SVHN.
Abstract
Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss in common practical regimes of deep learning, unlike the neural tangent kernel (NTK) regime. In this paper, we study the dynamics of such implicit nonlinear representation learning, which is beyond the NTK regime. We identify a pair of a new assumption and a novel…
Taxonomy
Methods: Neural Tangent Kernel · Batch Normalization
