Training Deep Networks to be Spatially Sensitive
Nicholas Kolkin, Gregory Shakhnarovich, Eli Shechtman

TL;DR
This paper introduces a differentiable approximation of the Weighted F-measure so that deep networks for pixel-wise tasks can be trained with an objective that accounts for the spatial relationship between errors and the ground truth, improving accuracy and efficiency.
Contribution
It proposes a novel, efficient, differentiable approximation of the Weighted F-measure, enabling spatially sensitive training of deep networks for computer vision tasks.
Findings
Improved performance on saliency prediction and semantic segmentation.
Faster inference speeds compared to previous methods.
Significant gains in weighted F-measure scores.
Abstract
In many computer vision tasks, for example saliency prediction or semantic segmentation, the desired output is a foreground map that predicts the pixels where some criterion is satisfied. Despite the inherently spatial nature of this task, commonly used learning objectives do not incorporate the spatial relationships between misclassified pixels and the underlying ground truth. The Weighted F-measure, a recently proposed evaluation metric, does reweight errors spatially. It has been shown to correlate closely with human evaluation of quality and to rank predictions stably with respect to noisy ground truths (such as a sloppy human annotator might generate). However, its computational complexity makes it intractable as an optimization objective for gradient descent, where the objective must be evaluated thousands or millions of times while learning a model's parameters. We propose a…
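To make the idea of spatially reweighted errors concrete, here is a minimal sketch in plain Python. It is not the approximation proposed in the paper, which the truncated abstract does not describe. It is a simplified stand-in that assumes a one-dimensional row of pixels, a hypothetical `distance_weights` helper that penalizes background errors more the farther they lie from the foreground, and a hypothetical `soft_weighted_fbeta` function built only from sums and products, so a quantity such as `1 - F` could in principle serve as a differentiable loss.

```python
def distance_weights(truth):
    """Hypothetical weighting: 1 for foreground pixels, and
    1 + distance to the nearest foreground pixel for background pixels."""
    fg = [i for i, g in enumerate(truth) if g == 1]
    if not fg:
        return [1.0] * len(truth)
    return [
        1.0 if g == 1 else 1.0 + min(abs(i - j) for j in fg)
        for i, g in enumerate(truth)
    ]


def soft_weighted_fbeta(pred, truth, weights, beta2=1.0, eps=1e-8):
    """Soft F-beta score over a flat list of pixels.

    pred    -- predicted foreground probabilities in [0, 1]
    truth   -- ground-truth labels (0 or 1)
    weights -- non-negative per-pixel error weights (an assumption of this sketch)
    """
    tp = fp = fn = 0.0
    for p, g, w in zip(pred, truth, weights):
        tp += p * g                  # soft true positives
        fp += w * p * (1 - g)        # weighted soft false positives
        fn += w * (1 - p) * g        # weighted soft false negatives
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)


truth = [0, 0, 0, 1, 1, 0, 0, 0]
near = [0, 0, 1, 1, 1, 0, 0, 0]   # one false positive adjacent to the object
far = [1, 0, 0, 1, 1, 0, 0, 0]    # one false positive far from the object

w = distance_weights(truth)
uniform = [1.0] * len(truth)

print(soft_weighted_fbeta(near, truth, w), soft_weighted_fbeta(far, truth, w))
print(soft_weighted_fbeta(near, truth, uniform), soft_weighted_fbeta(far, truth, uniform))
```

With uniform weights the two predictions receive the same score, because each contains exactly one false positive. With the distance-based weights, the prediction whose error sits next to the object scores higher than the one whose error is far away, which is the kind of spatial sensitivity the abstract says ordinary objectives lack.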
