Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms
Felix Petersen, Christian Borgelt, Tobias Sutter, Hilde Kuehne, Oliver Deussen, Stefano Ermon

TL;DR
Newton Losses use second-order (curvature) information of the differentiable relaxations of non-differentiable objectives to improve neural network training, with gains across several differentiable algorithms for sorting and shortest-path problems.
Contribution
The paper introduces Newton Losses, a novel method that uses second-order information of loss functions to improve training of neural networks with non-differentiable objectives.
Findings
Significant performance improvements on less-optimized algorithms.
Consistent improvements even on well-optimized algorithms.
Efficient use of second-order information without full second-order training.
Abstract
When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. Here, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard-to-optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it with a Newton Loss, while training the network…
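The idea in the abstract, using only the loss function's curvature rather than second-order training of the whole network, can be illustrated with a minimal sketch. The abstract is truncated here, so the exact formulation below is an assumption: it takes one Tikhonov-regularized Newton step on the loss input `z`, treats the result `z_star` as a fixed target, and replaces the original loss with a squared distance to that target. The names `newton_target`, `newton_loss`, and `lam`, as well as the toy quadratic loss, are hypothetical and chosen for illustration only.

```python
import numpy as np

def newton_target(z, grad, hess, lam=1e-3):
    """One regularized Newton step on the loss input z.

    z:    current loss input (e.g. network output), shape (d,)
    grad: gradient of the original loss at z, shape (d,)
    hess: Hessian of the original loss at z, shape (d, d)
    lam:  Tikhonov regularization strength (assumed hyperparameter)
    """
    d = z.shape[0]
    # Solve (H + lam * I) step = grad instead of forming an explicit inverse.
    step = np.linalg.solve(hess + lam * np.eye(d), grad)
    return z - step

def newton_loss(z, z_star):
    """Squared distance to the Newton target; z_star is treated as a constant."""
    diff = z - z_star
    return 0.5 * float(diff @ diff)

# Toy stand-in for a hard-to-optimize loss: l(z) = 0.5 * z^T A z - b^T z,
# whose gradient is A z - b and whose Hessian is A.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
z = np.zeros(2)

grad = A @ z - b
hess = A

# With lam = 0 and an exactly quadratic loss, one Newton step lands on the minimizer.
z_star = newton_target(z, grad, hess, lam=0.0)
loss_value = newton_loss(z, z_star)
```

In this sketch the gradient of `newton_loss` with respect to `z` is `z - z_star`, which equals the regularized Newton step, so the network itself could still be trained with ordinary first-order gradient descent. A Fisher-based variant would presumably substitute an empirical Fisher estimate for `hess`; that substitution is not shown here.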
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Neural Networks and Applications
