An Alternative Probabilistic Interpretation of the Huber Loss
Gregory P. Meyer

TL;DR
This paper introduces an alternative probabilistic interpretation of the Huber loss, relating its minimization to minimizing an upper bound on the KL divergence between two Laplace distributions. This lets the loss's hyper-parameters be chosen from an estimate of the noise in the data.
Contribution
It proposes a new probabilistic perspective that relates the Huber loss transition point to noise distribution parameters, improving hyper-parameter selection.
Findings
The new interpretation relates the transition point to noise in data.
It enables intuitive hyper-parameter tuning based on noise estimation.
The approach is demonstrated on object detection models.
Abstract
The Huber loss is a robust loss function used for a wide range of regression tasks. To utilize the Huber loss, a parameter that controls the transition from a quadratic function to an absolute value function needs to be selected. We believe the standard probabilistic interpretation that relates the Huber loss to the Huber density fails to provide adequate intuition for identifying the transition point. As a result, a hyper-parameter search is often necessary to determine an appropriate value. In this work, we propose an alternative probabilistic interpretation of the Huber loss, which relates minimizing the loss to minimizing an upper bound on the Kullback-Leibler divergence between Laplace distributions, where one distribution represents the noise in the ground-truth and the other represents the noise in the prediction. In addition, we show that the parameters of the Laplace…
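To make the two objects named in the abstract concrete, here is a minimal Python sketch. It shows the textbook Huber loss with its transition point, and the standard closed-form KL divergence between two Laplace distributions. The function and parameter names (`huber_loss`, `laplace_kl`, `delta`, `mu_p`, `b_p`, `mu_q`, `b_q`) are illustrative assumptions and do not come from the paper. The paper's actual upper bound, and its mapping from Laplace parameters to the transition point, are not reproduced here.

```python
import math


def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Textbook Huber loss.

    Quadratic for |residual| <= delta, linear beyond it.
    `delta` is the transition point discussed in the abstract.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def laplace_kl(mu_p: float, b_p: float, mu_q: float, b_q: float) -> float:
    """Closed-form KL(Laplace(mu_p, b_p) || Laplace(mu_q, b_q)).

    mu_* are location parameters; b_* are (positive) scale parameters.
    """
    d = abs(mu_p - mu_q)
    return math.log(b_q / b_p) + (d + b_p * math.exp(-d / b_p)) / b_q - 1.0


if __name__ == "__main__":
    # Compare the two curves over a few location offsets, with equal scales.
    for offset in (0.0, 0.5, 1.0, 3.0):
        print(offset, huber_loss(offset, delta=1.0), laplace_kl(0.0, 1.0, offset, 1.0))
```

With equal scales, the Laplace KL reduces to `d/b + exp(-d/b) - 1`, which grows quadratically for small `d` and linearly for large `d`. That is the same qualitative shape as the Huber loss, and it may help build intuition for why a connection between the two exists.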
Taxonomy
Methods: Huber loss · Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN
