Loss factorization, weakly supervised learning and label noise robustness

Giorgio Patrini; Frank Nielsen; Richard Nock; Marcello Carioni

arXiv:1602.02450 · cs.LG · February 11, 2016 · 42 cites

Open Access

TL;DR

This paper proves that the empirical risk of most well-known loss functions factors into a linear, label-dependent term (the mean operator) and a label-free term. The factorization tightens known generalization bounds and lets standard algorithms be adapted to weak supervision and label noise.

Contribution

It introduces a loss factorization framework for analyzing generalization, noise robustness, and weakly supervised learning. The factorization holds even for non-smooth, non-convex losses and in any RKHS.
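As a concrete illustration, the factorization can be checked numerically for the logistic loss with a linear model. For ℓ(z) = log(1 + exp(-z)) the identity ℓ(z) - ℓ(-z) = -z holds, so the empirical risk splits into a label-free term (each example evaluated with both labels) minus half the inner product of the model with the mean operator μ = (1/m) Σ y_i x_i. The sketch below is a minimal check under these assumptions; the function names, the synthetic data, and the restriction to a linear model are illustrative choices, not taken from the paper.

```python
import numpy as np

def logistic_loss(z):
    # log(1 + exp(-z)), computed in a numerically stable way
    return np.logaddexp(0.0, -z)

def empirical_risk(theta, X, y):
    # Standard empirical risk: average loss on the margins y_i * <theta, x_i>
    return logistic_loss(y * (X @ theta)).mean()

def mean_operator(X, y):
    # mu = (1/m) * sum_i y_i x_i; the only place the labels enter
    return (y[:, None] * X).mean(axis=0)

def factored_risk(theta, X, y):
    scores = X @ theta
    # Label-free term: every example is scored once with each label
    label_free = 0.5 * (logistic_loss(scores) + logistic_loss(-scores)).mean()
    # Label-dependent term: linear in the mean operator
    return label_free - 0.5 * theta @ mean_operator(X, y)

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.choice([-1.0, 1.0], size=200)
theta = rng.normal(size=5)

direct = empirical_risk(theta, X, y)
factored = factored_risk(theta, X, y)
print(direct, factored)  # the two values agree up to floating-point error
```

Because the labels appear only through `mean_operator`, any procedure that can estimate μ without per-example labels can reuse the rest of the computation unchanged.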

Findings

01. Loss functions factor into label-dependent and label-free components.

02. Algorithms can be adapted for weak supervision using the mean operator.

03. Most losses exhibit data-dependent noise robustness.

Abstract

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator --the focal quantity of this work-- which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses…
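To illustrate how a mean operator might be estimated under asymmetric label noise, the sketch below rescales the noisy labels so that their conditional expectation equals the clean label, then averages the rescaled labels times the features. This is the standard unbiased correction for class-conditional noise and is offered as an assumption about the general approach, not as the paper's exact estimator. The noise rates `p_plus` (probability that a +1 label is flipped) and `p_minus` (probability that a -1 label is flipped) are assumed known with `p_plus + p_minus < 1`, and all function names are hypothetical.

```python
import numpy as np

def corrected_labels(noisy_y, p_plus, p_minus):
    # Affine rescaling chosen so that E[corrected | clean y] = y
    # when +1 labels flip with prob. p_plus and -1 labels with prob. p_minus
    return (noisy_y - (p_minus - p_plus)) / (1.0 - p_plus - p_minus)

def mean_operator_estimate(X, noisy_y, p_plus, p_minus):
    # Plug the corrected labels into mu = (1/m) * sum_i y_i x_i
    y_hat = corrected_labels(noisy_y, p_plus, p_minus)
    return (y_hat[:, None] * X).mean(axis=0)

def expected_corrected_label(y, p_plus, p_minus):
    # Exact expectation of the corrected label given the clean label y
    flip = p_plus if y > 0 else p_minus
    kept = corrected_labels(float(y), p_plus, p_minus)
    flipped = corrected_labels(float(-y), p_plus, p_minus)
    return (1.0 - flip) * kept + flip * flipped

# Hypothetical noise rates, for illustration only
p_plus, p_minus = 0.3, 0.1
print(expected_corrected_label(+1, p_plus, p_minus))
print(expected_corrected_label(-1, p_plus, p_minus))
```

The resulting estimate can then stand in for the label-dependent term of the factored risk, while the label-free term is computed from the features alone.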

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Multi-Objective Optimization Algorithms

Methods: Stochastic Gradient Descent