The Outer Product Structure of Neural Network Derivatives

Craig Bakker; Michael J. Henry; and Nathan O. Hodas

arXiv:1810.03798 · cs.LG · October 10, 2018 · 1 citation

TL;DR

This paper shows that feedforward and recurrent neural networks have an outer product derivative structure, which convolutional networks lack. The structure makes higher-order derivative information usable without approximations or prohibitive memory costs, and it suggests new approaches to optimization and regularization.

Contribution

It identifies the outer product derivative structure in feedforward and recurrent neural networks, shows that convolutional networks do not share it, and discusses the implications for training, regularization, and network analysis.

Findings

1. Feedforward and recurrent networks exhibit an outer product derivative structure.
2. Convolutional neural networks do not have this structure.
3. This structure makes higher-order derivative information usable without large memory costs.
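As a toy illustration of the first two findings (not the authors' derivation), the NumPy sketch below assumes squared-error loss, a single dense layer, and a 1x1 convolution whose weight matrix `W` is shared across spatial positions; all names and sizes are arbitrary choices for illustration. For one example, the dense-layer weight gradient is the single outer product δxᵀ. Weight sharing adds one outer product per position, so the shared kernel's gradient generally has rank greater than one. This is only one simple way to see how weight sharing can break the single-outer-product form.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, positions = 4, 3, 6
W = rng.standard_normal((c_out, c_in))

# Dense layer, single example (assumption: squared-error loss).
# y = W @ x, L = 0.5 * ||y - t||^2, so dL/dy = y - t.
x = rng.standard_normal(c_in)
t = rng.standard_normal(c_out)
delta = W @ x - t
grad_dense = np.outer(delta, x)  # dL/dW = delta x^T: one outer product

# 1x1 convolution: the same W is applied at every spatial position.
# Columns of X are the inputs at each position; Y = W @ X,
# L = 0.5 * ||Y - T||_F^2, so dL/dY = Y - T.
X = rng.standard_normal((c_in, positions))
T = rng.standard_normal((c_out, positions))
Delta = W @ X - T
grad_conv = Delta @ X.T  # equals the sum over p of outer(Delta[:, p], X[:, p])

print(np.linalg.matrix_rank(grad_dense))  # 1
print(np.linalg.matrix_rank(grad_conv))   # greater than 1 for generic data
```

The 1x1 convolution is used because it isolates weight sharing from kernel geometry; a full spatial kernel adds more structure but the same sum-over-positions form.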

Abstract

In this paper, we show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutional neural networks do not. This structure makes it possible to use higher-order information without needing approximations or infeasibly large amounts of memory, and it may also provide insights into the geometry of neural network optima. The ability to easily access these derivatives also suggests a new, geometric approach to regularization. We then discuss how this structure could be used to improve training methods, increase network robustness and generalizability, and inform network compression methods.
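To make the memory claim concrete, here is a minimal sketch under strong simplifying assumptions: a single linear layer `y = W @ x` with squared-error loss, where the Hessian with respect to the weights is exactly `kron(I, x xᵀ)` under row-major vectorization. Because of that structure, a Hessian-vector product needs only `V @ x` and `x` rather than the explicit (n_out·n_in)² Hessian. The function names are hypothetical, and the paper's treatment of multi-layer feedforward and recurrent networks is more general than this one-layer case.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 20, 10

# Toy setup (assumption): one linear layer y = W @ x, loss L = 0.5 * ||W @ x - t||^2.
W = rng.standard_normal((n_out, n_in))
x = rng.standard_normal(n_in)
t = rng.standard_normal(n_out)


def grad(W):
    """dL/dW, itself an outer product of the error (W @ x - t) and the input x."""
    return np.outer(W @ x - t, x)


def hvp_structured(V):
    """Hessian-vector product that exploits the outer product structure.

    Here d^2 L / (dW_ij dW_kl) = [i == k] * x_j * x_l, so applying the
    Hessian to a direction V (same shape as W) needs only V @ x and x.
    """
    return np.outer(V @ x, x)


def hessian_explicit():
    """Full Hessian over row-major vec(W): kron(I_out, x x^T)."""
    return np.kron(np.eye(n_out), np.outer(x, x))


V = rng.standard_normal((n_out, n_in))
H = hessian_explicit()
hv_explicit = (H @ V.ravel()).reshape(n_out, n_in)
hv_fast = hvp_structured(V)

print(np.allclose(hv_explicit, hv_fast))  # True
print(H.size, "Hessian entries vs", x.size, "entries in x")  # 40000 Hessian entries vs 20 entries in x
```

The explicit Hessian is built here only to check the structured product; in practice the point is to avoid forming it at all.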

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Adversarial Robustness in Machine Learning