The Outer Product Structure of Neural Network Derivatives
Craig Bakker, Michael J. Henry, and Nathan O. Hodas

TL;DR
This paper reveals an outer product derivative structure in feedforward and recurrent neural networks, enabling efficient use of higher-order information and offering new insights into optimization and regularization techniques.
Contribution
It identifies an outer product derivative structure in feedforward and recurrent neural networks and discusses its implications for training, regularization, and network analysis.
Findings
Feedforward and recurrent networks exhibit outer product derivative structure.
Convolutional neural networks do not have this structure.
This structure makes higher-order derivative information usable without approximations or infeasibly large amounts of memory.
Abstract
In this paper, we show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutional neural networks do not. This structure makes it possible to use higher-order information without needing approximations or infeasibly large amounts of memory, and it may also provide insights into the geometry of neural network optima. The ability to easily access these derivatives also suggests a new, geometric approach to regularization. We then discuss how this structure could be used to improve training methods, increase network robustness and generalizability, and inform network compression methods.
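As a minimal illustration of what an outer product derivative structure looks like, the sketch below considers a single dense layer with a tanh activation and a squared-error loss. These choices, along with every function and variable name, are assumptions made for this example and are not taken from the paper. For such a layer, the gradient of the loss with respect to the weight matrix equals the outer product of the backpropagated error vector and the layer input, so it can be stored as two vectors instead of a full matrix. The paper's results on higher-order derivatives are not reproduced here.

```python
import numpy as np


def loss(W, b, x, target):
    """Squared-error loss of one tanh dense layer (illustrative setup)."""
    a = np.tanh(W @ x + b)
    return 0.5 * np.sum((a - target) ** 2)


def weight_grad_outer(W, b, x, target):
    """Gradient of the loss w.r.t. W, built as an outer product."""
    a = np.tanh(W @ x + b)
    # delta is dL/dz for z = W x + b; tanh'(z) = 1 - tanh(z)^2
    delta = (a - target) * (1.0 - a ** 2)
    # dL/dW[i, j] = delta[i] * x[j], i.e. a rank-one matrix
    return np.outer(delta, x), delta


def weight_grad_numeric(W, b, x, target, eps=1e-6):
    """Central finite-difference gradient, used only as a cross-check."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy()
            Wm = W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (loss(Wp, b, x, target) - loss(Wm, b, x, target)) / (2 * eps)
    return G


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)
target = rng.normal(size=3)

grad, delta = weight_grad_outer(W, b, x, target)
grad_check = weight_grad_numeric(W, b, x, target)
```

In this setup the 3 x 4 gradient is fully described by the 3-vector `delta` and the 4-vector `x`, which is the kind of memory saving the abstract refers to; comparing `grad` with `grad_check` confirms that the two agree to within finite-difference error.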
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Adversarial Robustness in Machine Learning
