# Adaptive norms for deep learning with regularized Newton methods

**Authors:** Jonas Kohler, Leonard Adolphs, Aurelien Lucchi

arXiv: 1905.09201 · 2020-09-29

## TL;DR

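This paper studies regularized Newton methods with adaptive (ellipsoidal) norms for training neural networks. It proves that the RMSProp and Adam preconditioners satisfy the conditions required for convergence of second-order trust region methods, and shows empirically that ellipsoidal constraints outperform spherical ones. The resulting methods match first-order methods in number of backpropagations, but not yet in computational time on current hardware.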

## Contribution

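The paper interprets adaptive gradient methods as first-order trust region methods with ellipsoidal constraints, proves that second-order trust region methods using the same preconditioners converge with standard worst-case complexities on general non-convex objectives, and demonstrates the empirical benefits of ellipsoidal over spherical constraints.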

## Key findings

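- Ellipsoidal constraints outperform spherical ones across the tested architectures and datasets, both in number of backpropagations and in asymptotic loss value.
- The preconditioning matrices used in RMSProp and Adam satisfy the conditions needed for provable convergence of second-order trust region methods on general non-convex objectives.
- The Newton-type methods perform comparably to state-of-the-art first-order methods in terms of backpropagations, but further hardware advances are needed for them to be competitive in computational time.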
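
To make the spherical versus ellipsoidal distinction concrete, the trust region subproblem can be written as below. The notation is an assumption chosen for this summary and may differ from the paper's: $g_t$ is the gradient, $B_t$ a Hessian (or Hessian approximation), $\Delta_t$ the trust region radius, and $v_t$ the running average of squared gradients as used by RMSProp.

$$
\min_{s \in \mathbb{R}^d} \; m_t(s) = f(w_t) + g_t^\top s + \tfrac{1}{2}\, s^\top B_t s
\quad \text{subject to} \quad \|s\|_{A_t} := \sqrt{s^\top A_t s} \le \Delta_t
$$

The spherical case corresponds to $A_t = I$. An RMSProp-style ellipsoidal constraint would use a diagonal matrix such as $A_t = \operatorname{diag}(\sqrt{v_t}) + \epsilon I$, which shrinks the region along coordinates with historically large gradients.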

## Abstract

We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks. This approach can be seen as a second-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we prove that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities on general non-convex objectives. Furthermore, we run experiments across different neural architectures and datasets to find that the ellipsoidal constraints consistently outperform their spherical counterpart both in terms of number of backpropagations and asymptotic loss value. Finally, we find comparable performance to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of computational time.
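
The claim that adaptive gradient methods are first-order trust region methods with ellipsoidal constraints can be checked numerically with a minimal sketch. It minimizes the linear model `grad · s` subject to `sqrt(sᵀ A s) <= radius` for a diagonal `A`, whose closed-form minimizer is `-radius * A⁻¹ grad / sqrt(gradᵀ A⁻¹ grad)`. The function names, the diagonal form `sqrt(v) + eps`, and the example numbers are all assumptions made for illustration, not code or values from the paper.

```python
import math


def rmsprop_diag(second_moment, eps=1e-8):
    """Diagonal of an RMSProp-style preconditioner: sqrt(v) + eps (assumed form)."""
    return [math.sqrt(v) + eps for v in second_moment]


def ellipsoidal_norm(step, a_diag):
    """Norm induced by the diagonal matrix A: sqrt(s^T A s)."""
    return math.sqrt(sum(a * s * s for a, s in zip(a_diag, step)))


def ellipsoidal_tr_step(grad, a_diag, radius):
    """Minimize grad^T s subject to ||s||_A <= radius, with A = diag(a_diag) > 0."""
    scaled = [g / a for g, a in zip(grad, a_diag)]  # A^{-1} grad
    dual_norm = math.sqrt(sum(g * d for g, d in zip(grad, scaled)))
    if dual_norm == 0.0:
        return [0.0] * len(grad)  # zero gradient: any feasible step is optimal
    return [-radius * d / dual_norm for d in scaled]


# Hypothetical gradient and second-moment estimates.
grad = [0.5, -2.0, 1.0]
second_moment = [0.25, 4.0, 1.0]

a_diag = rmsprop_diag(second_moment)
step = ellipsoidal_tr_step(grad, a_diag, radius=0.1)

# RMSProp update direction (learning rate omitted): -grad / (sqrt(v) + eps).
rmsprop_direction = [-g / a for g, a in zip(grad, a_diag)]

# If the two steps are parallel, these per-coordinate ratios are all equal.
ratios = [s / d for s, d in zip(step, rmsprop_direction)]
```

In this sketch the trust region step lies on the boundary of the ellipsoid and is a positive multiple of the RMSProp direction, so the trust region radius plays the role of a normalized learning rate. The second-order methods studied in the paper replace the linear model with a quadratic one while keeping the same kind of constraint.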

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09201/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09201/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/1905.09201/full.md

---
Source: https://tomesphere.com/paper/1905.09201