Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization
Davide Buffelli, Jamie McGowan, Wangkun Xu, Alexandru Cioba, Da-shan Shiu, Guillaume Hennequin, Alberto Bernacchia

TL;DR
This paper demonstrates that exact Gauss-Newton optimization in deep reversible architectures leads to poor generalization: models overfit individual mini-batches and fail to develop transferable features, especially in the lazy training regime.
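As a generic illustration of how an exact GN step can fit a mini-batch so aggressively, here is a minimal sketch for squared loss on a toy tanh network. The model, sizes, and the helper names `forward`, `jacobian`, and `gauss_newton_step` are hypothetical; this is not the reversible architecture or the tractable update from the paper. When the model has more parameters than the batch has examples, the minimum-norm GN step Δθ = -Jᵀ(JJᵀ + λI)⁻¹r drives the residual of the linearized model on that batch to almost zero in a single update. The small damping λ is an assumption added for numerical stability.

```python
import numpy as np

def forward(X, W, v):
    """Toy one-hidden-layer net f(X) = tanh(X W) v (hypothetical, not the paper's model)."""
    return np.tanh(X @ W) @ v

def jacobian(X, W, v):
    """Jacobian of the n outputs w.r.t. theta = [W.ravel(), v], shape (n, d*m + m)."""
    H = np.tanh(X @ W)                             # hidden activations, (n, m)
    dH = 1.0 - H**2                                # tanh'(X W), (n, m)
    # df_i/dW[k, j] = X[i, k] * tanh'(X W)[i, j] * v[j]
    JW = X[:, :, None] * (dH * v)[:, None, :]      # (n, d, m)
    JW = JW.reshape(X.shape[0], -1)                # row-major, matches W.ravel()
    return np.concatenate([JW, H], axis=1)         # df_i/dv[j] = H[i, j]

def gauss_newton_step(J, r, damping=1e-8):
    """Min-norm GN step for 0.5*||r||^2: -J^T (J J^T + damping*I)^{-1} r."""
    return -J.T @ np.linalg.solve(J @ J.T + damping * np.eye(J.shape[0]), r)

rng = np.random.default_rng(0)
n, d, m = 8, 5, 20                                 # 8 examples, 5*20 + 20 = 120 parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W = rng.normal(size=(d, m)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)

r = forward(X, W, v) - y                           # residual on this mini-batch
J = jacobian(X, W, v)
step = gauss_newton_step(J, r)
lin_residual = r + J @ step                        # residual of the linearized model
print(np.linalg.norm(r), np.linalg.norm(lin_residual))
```

The step is chosen only to fit these eight examples, so nothing in its construction targets held-out data. With damping λ, the linearized residual equals λ(JJᵀ + λI)⁻¹r, so it shrinks by roughly λ/λ_min(JJᵀ) rather than reaching exactly zero.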
Contribution
It introduces a tractable form of exact Gauss-Newton updates for deep reversible architectures and investigates their training and generalization behaviors.
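This summary does not spell out the construction. As background only, the defining property of a reversible architecture (each block's input can be recovered exactly from its output) can be sketched with a RevNet/NICE-style additive coupling block. The class name `AdditiveCoupling`, the tanh sub-networks F and G, the split into two halves, and all sizes are assumptions for illustration, not the architecture used in the paper.

```python
import numpy as np

class AdditiveCoupling:
    """Reversible block (hypothetical sizes): y1 = x1 + F(x2), y2 = x2 + G(y1)."""

    def __init__(self, dim_half, hidden, rng):
        # Weights of two small tanh MLPs F and G acting on half of the features.
        self.A = rng.normal(size=(dim_half, hidden)) / np.sqrt(dim_half)
        self.B = rng.normal(size=(hidden, dim_half)) / np.sqrt(hidden)
        self.C = rng.normal(size=(dim_half, hidden)) / np.sqrt(dim_half)
        self.D = rng.normal(size=(hidden, dim_half)) / np.sqrt(hidden)

    def F(self, x):
        return np.tanh(x @ self.A) @ self.B

    def G(self, x):
        return np.tanh(x @ self.C) @ self.D

    def forward(self, x1, x2):
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Undo the two additive updates in reverse order; F and G need not be invertible.
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return x1, x2

rng = np.random.default_rng(0)
blocks = [AdditiveCoupling(dim_half=4, hidden=16, rng=rng) for _ in range(6)]
x1, x2 = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))

y1, y2 = x1, x2
for b in blocks:                                   # forward through the whole stack
    y1, y2 = b.forward(y1, y2)

r1, r2 = y1, y2
for b in reversed(blocks):                         # reconstruct inputs from outputs
    r1, r2 = b.inverse(r1, r2)
print(np.abs(r1 - x1).max(), np.abs(r2 - x2).max())
```

Because inputs are recomputed from outputs, intermediate activations need not be stored; how the paper uses reversibility to make exact GN updates tractable is not reproduced here.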
Findings
Exact GN updates are tractable in deep reversible architectures.
GN optimizer exhibits poor generalization, overfitting mini-batches.
Training occurs in the lazy regime, with minimal change in the neural tangent kernel (NTK).
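One common way to quantify "minimal NTK change" is the relative Frobenius-norm change of the empirical neural tangent kernel K = JJᵀ on a fixed probe batch, between initialization and the end of training. The sketch below assumes that metric (the paper's exact measurement may differ), uses a hypothetical toy network and finite-difference Jacobians, and runs plain gradient descent only to produce some parameter trajectory; parameters from any optimizer, such as a GN run, could be substituted.

```python
import numpy as np

def numerical_jacobian(f, theta, eps=1e-6):
    """Central-difference Jacobian of f: R^p -> R^n (fine for toy sizes only)."""
    cols = [(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps) for e in np.eye(theta.size)]
    return np.stack(cols, axis=1)                  # (n, p)

def empirical_ntk(f, theta):
    """Empirical NTK on the probe batch baked into f: K = J J^T, shape (n, n)."""
    J = numerical_jacobian(f, theta)
    return J @ J.T

def relative_ntk_change(K0, K1):
    """||K1 - K0||_F / ||K0||_F; values near 0 indicate lazy (kernel-like) training."""
    return np.linalg.norm(K1 - K0) / np.linalg.norm(K0)

rng = np.random.default_rng(0)
n, d, m = 6, 3, 32
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def model(theta):
    """Toy net f(X) = tanh(X W) v / sqrt(m), with theta = [vec(W), v] (hypothetical)."""
    W, v = theta[: d * m].reshape(d, m), theta[d * m :]
    return np.tanh(X @ W) @ v / np.sqrt(m)

theta0 = rng.normal(size=d * m + m)
K0 = empirical_ntk(model, theta0)

theta = theta0.copy()
for _ in range(50):                                # plain GD on 0.5*||f - y||^2
    J = numerical_jacobian(model, theta)
    theta -= 0.1 * J.T @ (model(theta) - y)

change = relative_ntk_change(K0, empirical_ntk(model, theta))
print(change)
```

The statistic summarizes how far the kernel moved overall; it does not say which directions changed, so it is usually read alongside training and test curves.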
Abstract
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications, often yielding faster progress per iteration on the training loss compared to first-order optimizers. However, the generalization properties of second-order methods are still being debated. Theoretical investigations have proved difficult to carry out outside the tractable settings of heavily simplified model classes; thus, the relevance of existing theories to practical deep learning applications remains unclear. Similarly, empirical studies in large-scale models and real datasets are significantly confounded by the necessity to approximate second-order updates in practice. It is often unclear whether the observed generalization behaviour arises specifically from the second-order nature of the parameter updates, or instead reflects the specific structured (e.g. …
