A unified theory of feature learning in RNNs and DNNs
Jan P. Bauer, Kirsten Fischer, Moritz Helias, Agostina Palmigiano

TL;DR
This paper develops a unified mean-field theory for RNNs and DNNs, revealing how weight sharing in RNNs influences their functional properties and generalization in sequential tasks.
Contribution
It introduces a comprehensive theoretical framework linking RNN and DNN architectures through representational kernels and phase transitions in learning behavior.
Findings
Below the threshold at which the learning signal overcomes the noise from random weights, RNNs and DNNs behave identically.
Above this threshold, only RNNs develop correlated temporal representations.
Weight sharing in RNNs aids generalization in sequential tasks.
Abstract
Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we here develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning (μP) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated…
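The abstract's statement that RNNs differ from DNNs only by weight sharing can be illustrated with a small sketch. This is not the paper's model or code. It assumes a tanh nonlinearity, Gaussian weights, and an input injected at every layer or time step, and all function names (`dnn_forward`, `rnn_forward`, `random_matrix`) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def random_matrix(n_rows, n_cols, rng):
    """Gaussian weights with standard deviation 1/sqrt(n_cols) (illustrative choice)."""
    scale = 1.0 / math.sqrt(n_cols)
    return [[rng.gauss(0.0, scale) for _ in range(n_cols)] for _ in range(n_rows)]

def dnn_forward(weights, inputs, h0):
    """Layered network: layer t uses its own matrix weights[t] and input inputs[t]."""
    h = h0
    for W_t, x_t in zip(weights, inputs):
        pre = matvec(W_t, h)
        h = [math.tanh(p + x) for p, x in zip(pre, x_t)]
    return h

def rnn_forward(W, inputs, h0):
    """Recurrent network: the same matrix W is applied at every time step."""
    h = h0
    for x_t in inputs:
        pre = matvec(W, h)
        h = [math.tanh(p + x) for p, x in zip(pre, x_t)]
    return h

rng = random.Random(0)
n, T = 4, 5  # hidden width and number of time steps / layers (assumed values)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(T)]
h0 = [0.0] * n

W_shared = random_matrix(n, n, rng)                      # one matrix, reused
W_layers = [random_matrix(n, n, rng) for _ in range(T)]  # one matrix per layer

h_rnn = rnn_forward(W_shared, inputs, h0)
h_dnn_tied = dnn_forward([W_shared] * T, inputs, h0)     # DNN with tied weights
h_dnn_untied = dnn_forward(W_layers, inputs, h0)         # DNN with independent weights

print("RNN equals tied-weight DNN:", h_rnn == h_dnn_tied)
print("RNN equals untied DNN:", h_rnn == h_dnn_untied)
```

In this sketch the unrolled RNN and the tied-weight DNN perform the same operations in the same order, so their outputs coincide. Drawing an independent matrix per layer is the only change needed to obtain the DNN.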
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
