Learning Surrogate Losses

Josif Grabocka; Randolf Scholz; Lars Schmidt-Thieme

arXiv:1905.10108·cs.LG·May 27, 2019·27 cites

Open Access

TL;DR

This paper introduces an off-the-shelf optimization method that learns smooth surrogate losses for non-differentiable, non-decomposable metrics (e.g. AUC, F1), so that a prediction model can be trained directly against such evaluation criteria.

Contribution

It proposes a bilevel optimization approach that learns a surrogate neural network loss for virtually any non-differentiable, non-decomposable evaluation metric, jointly with the prediction model.

Findings

01. Effective minimization of diverse real-world loss functions.

02. Outperforms state-of-the-art baselines on multiple datasets.

03. Surrogate loss networks are set-wise models, invariant to the order of instances within a mini-batch.

Abstract

The minimization of loss functions is the heart and soul of Machine Learning. In this paper, we propose an off-the-shelf optimization approach that can minimize virtually any non-differentiable and non-decomposable loss function (e.g. Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation Coefficient, etc.) seamlessly. Our strategy learns smooth relaxation versions of the true losses by approximating them through a surrogate neural network. The proposed loss networks are set-wise models which are invariant to the order of mini-batch instances. Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization. Empirical results on multiple datasets with diverse real-life loss functions compared with state-of-the-art baselines demonstrate the efficiency of learning surrogate losses.
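To make the two properties named in the abstract concrete, here is a minimal Python sketch of a non-decomposable true loss (1 - F1 on a mini-batch) next to a set-wise surrogate that is invariant to instance order. Everything in it is an illustrative assumption rather than the paper's architecture: the names `f1_loss` and `SetwiseSurrogate`, the tanh embedding, the mean pooling, and the sigmoid read-out are hypothetical, and the weights are random and untrained. The bilevel training that would fit the surrogate to the true loss is not shown.

```python
import math
import random


def f1_loss(y_true, y_prob, threshold=0.5):
    """True loss 1 - F1 on a mini-batch.

    Non-differentiable because of the threshold, and non-decomposable
    because it is not a mean of independent per-instance terms.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 1.0 - 2 * tp / denom


class SetwiseSurrogate:
    """Hypothetical set-wise surrogate: embed each (y, y_hat) pair,
    mean-pool over the batch, then apply a scalar read-out."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        # Per-instance embedding weights: two inputs plus a bias per hidden unit.
        self.w_in = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(hidden)]
        # Read-out weights for the pooled embedding; the last entry is the bias.
        self.w_out = [rng.gauss(0, 1) for _ in range(hidden + 1)]

    def _embed(self, y, y_hat):
        return [math.tanh(w[0] * y + w[1] * y_hat + w[2]) for w in self.w_in]

    def __call__(self, y_true, y_prob):
        embeddings = [self._embed(y, p) for y, p in zip(y_true, y_prob)]
        n = len(embeddings)
        # Mean pooling treats the batch as a set, so instance order cannot matter.
        # math.fsum returns the correctly rounded sum regardless of term order.
        pooled = [math.fsum(e[j] for e in embeddings) / n
                  for j in range(len(self.w_in))]
        z = math.fsum(w * h for w, h in zip(self.w_out, pooled)) + self.w_out[-1]
        return 1.0 / (1.0 + math.exp(-z))  # squash into (0, 1), like 1 - F1


y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

# Shuffle the instances of the mini-batch with a fixed permutation.
perm = [3, 0, 5, 2, 4, 1]
y_true_p = [y_true[i] for i in perm]
y_prob_p = [y_prob[i] for i in perm]

surrogate = SetwiseSurrogate()
print("true loss:", f1_loss(y_true, y_prob))
print("surrogate:", surrogate(y_true, y_prob), surrogate(y_true_p, y_prob_p))
```

Both printed surrogate values agree because pooling happens before the read-out. Unlike the thresholded F1 loss, the surrogate is a smooth function of `y_prob`, so in principle its gradient could be passed back to a prediction model.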

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Advanced Neural Network Applications