Learning Surrogate Losses
Josif Grabocka, Randolf Scholz, Lars Schmidt-Thieme

TL;DR
This paper introduces an optimization method that learns smooth surrogate losses for non-differentiable, non-decomposable metrics (e.g. AUC, F1), so that a model can be trained against the metric it is evaluated on.
Contribution
The paper proposes a bilevel optimization approach that learns a neural-network surrogate loss for virtually any non-differentiable evaluation metric, jointly with the prediction model.
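As a rough illustration of what a bilevel formulation of this kind can look like, the sketch below writes the problem in generic form. All symbols are assumptions introduced here, not notation taken from the paper: f_θ is the prediction model, ℓ the true (non-differentiable) metric, and \hat{ℓ}_φ the surrogate loss network. The squared-error fitting criterion in the inner problem is also an assumption.

```latex
\begin{aligned}
\min_{\theta}\quad & \hat{\ell}_{\phi^{*}(\theta)}\bigl(y,\, f_{\theta}(x)\bigr) \\
\text{s.t.}\quad & \phi^{*}(\theta) \in \arg\min_{\phi}\
  \Bigl(\hat{\ell}_{\phi}\bigl(y,\, f_{\theta}(x)\bigr) - \ell\bigl(y,\, f_{\theta}(x)\bigr)\Bigr)^{2}
\end{aligned}
```

Read this way, the inner problem keeps the surrogate close to the true metric on the model's current predictions, and the outer problem updates the model by descending the smooth surrogate.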
Findings
Effective minimization of diverse real-world loss functions.
Outperforms state-of-the-art baselines on multiple datasets.
Surrogate losses are invariant to mini-batch order.
Abstract
The minimization of loss functions is the heart and soul of Machine Learning. In this paper, we propose an off-the-shelf optimization approach that can minimize virtually any non-differentiable and non-decomposable loss function (e.g. Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation Coefficient, etc.) seamlessly. Our strategy learns smooth relaxation versions of the true losses by approximating them through a surrogate neural network. The proposed loss networks are set-wise models which are invariant to the order of mini-batch instances. Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization. Empirical results on multiple datasets with diverse real-life loss functions compared with state-of-the-art baselines demonstrate the efficiency of learning surrogate losses.
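To make the "set-wise, order-invariant" property concrete, here is a minimal sketch of one way such a loss network could be structured: embed each (label, prediction) pair, mean-pool over the mini-batch, then map the pooled vector to a scalar. The abstract does not specify the architecture, so the mean pooling, the layer sizes, and the names `make_surrogate` and `misclassification_rate` are all assumptions made for illustration. The weights are random and untrained; only the invariance is being shown.

```python
import math
import random


def make_surrogate(hidden=4, seed=0):
    """Build a tiny set-wise loss with fixed random weights (hypothetical, untrained)."""
    rng = random.Random(seed)
    # h: maps one (label, prediction) pair to a hidden vector.
    w_h = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(hidden)]
    # g: maps the pooled vector to a single scalar.
    w_g = [rng.uniform(-1, 1) for _ in range(hidden)]

    def surrogate_loss(labels, preds):
        pooled = [0.0] * hidden
        for y, p in zip(labels, preds):
            for j in range(hidden):
                pooled[j] += math.tanh(w_h[j][0] * y + w_h[j][1] * p + b_h[j])
        # Mean pooling does not depend on the order of the instances.
        pooled = [v / len(labels) for v in pooled]
        z = sum(w * v for w, v in zip(w_g, pooled))
        # Squash to (0, 1), the range of a rate-style metric.
        return 1.0 / (1.0 + math.exp(-z))

    return surrogate_loss


def misclassification_rate(labels, preds, threshold=0.5):
    """True, non-differentiable target metric that a surrogate would be fit to."""
    wrong = sum(1 for y, p in zip(labels, preds) if (p >= threshold) != (y == 1))
    return wrong / len(labels)


labels = [1, 0, 1, 1, 0]
preds = [0.9, 0.2, 0.4, 0.7, 0.6]
order = [3, 0, 4, 2, 1]  # an arbitrary permutation of the mini-batch

loss = make_surrogate()
original = loss(labels, preds)
shuffled = loss([labels[i] for i in order], [preds[i] for i in order])
print(math.isclose(original, shuffled, rel_tol=1e-9))
```

In a full method, the weights of the pair embedding and the output map would be trained so that the surrogate tracks the true metric, which is the fitting step the abstract attributes to bilevel optimization.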
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Advanced Neural Network Applications
