Beyond Gradient Descent for Regularized Segmentation Losses
Dmitrii Marin, Meng Tang, Ismail Ben Ayed, Yuri Boykov

TL;DR
This paper examines alternatives to gradient descent (GD) for optimizing regularized segmentation losses. In weakly-supervised CNN segmentation, an alternating direction method (ADM) reaches state-of-the-art results on a loss where GD performs poorly, which suggests that the choice of optimizer deserves more attention.
Contribution
It presents a loss function motivated by MRF/CRF regularization models and shows that an alternative optimizer (ADM) achieves state-of-the-art results on it, while GD performs poorly.
Findings
ADM outperforms GD in weakly-supervised CNN segmentation
GD obtains its best result with a "smoother" tuning of the loss function
Network design and training should pay more attention to optimization methods
Abstract
The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function where an alternative optimizer (ADM) achieves the state-of-the-art while GD performs poorly. Interestingly, GD obtains its best result for a "smoother" tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in "shallow" segmentation and their known global solvers. Our work suggests that network design/training should pay more attention to optimization methods.
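The abstract refers to "known global solvers" for MRF/CRF regularization in "shallow" segmentation. As a rough illustration only, the sketch below exactly minimizes a Potts-regularized energy on a 1D chain with dynamic programming. The function name `potts_chain_map`, the chain topology, and the toy costs are assumptions made for this example; the paper's own setting (2D images, its specific regularizer and solver) is not reproduced here.

```python
import numpy as np

def potts_chain_map(unary, lam):
    """Exactly minimize  sum_i unary[i, x_i] + lam * sum_i [x_i != x_{i+1}]
    over labelings x of a 1D chain, using Viterbi-style dynamic programming.
    (Hypothetical helper for illustration; not the paper's solver.)"""
    n, k = unary.shape
    cost = unary[0].astype(float)           # best cost of prefixes ending in each label
    back = np.zeros((n, k), dtype=int)      # back-pointers for recovering the labeling
    switch = lam * (1.0 - np.eye(k))        # Potts penalty: lam if labels differ, else 0
    for i in range(1, n):
        trans = cost[:, None] + switch      # trans[a, b]: prefix ends in a, next label is b
        back[i] = trans.argmin(axis=0)
        cost = trans.min(axis=0) + unary[i]
    labels = np.empty(n, dtype=int)
    labels[-1] = cost.argmin()
    for i in range(n - 1, 0, -1):           # follow back-pointers from the last node
        labels[i - 1] = back[i, labels[i]]
    return labels

# Toy unary costs: every pixel prefers label 0 except the middle one.
unary = np.array([[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]], dtype=float)
print(potts_chain_map(unary, 0.0))  # [0 0 1 0 0]  no regularization: per-pixel argmin
print(potts_chain_map(unary, 2.0))  # [0 0 0 0 0]  regularization removes the outlier
```

The point of the toy is that the discrete regularized energy is minimized globally rather than by local gradient steps, which is the kind of solver an alternating scheme can call as a sub-step.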
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
