Training Over-parameterized Models with Non-decomposable Objectives
Harikrishna Narasimhan, Aditya Krishna Menon

TL;DR
This paper introduces new cost-sensitive loss functions for training over-parameterized models on complex, non-decomposable objectives, improving upon standard re-weighting methods and demonstrating effectiveness on image benchmarks.
Contribution
The authors propose calibrated, generalized cost-sensitive losses that extend logit adjustment, addressing limitations of traditional re-weighting in over-parameterized models.
Findings
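Re-weighting the loss with label costs can give unsatisfactory results when training over-parameterized models; the proposed cost-sensitive losses are designed to remedy this.
The proposed losses are calibrated and target objectives such as worst-case error and group-fairness constraints; distilled labels from a teacher model can improve them further.
Experiments on benchmark image datasets support the approach.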
Abstract
Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image…
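As a rough illustration of the contrast the abstract draws, the sketch below compares a cost re-weighted cross-entropy with a logit-adjusted cross-entropy in the special case of a diagonal cost matrix (one cost per class). This is the classical logit-adjustment form, not the paper's generalized losses; the function names, the NumPy implementation, and the `costs` vector are assumptions made for illustration.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def reweighted_ce(logits, labels, costs):
    # Re-weighting: scale each example's cross-entropy by the cost of its label.
    # Once a model fits the training set, every term approaches zero,
    # so the costs have little remaining influence.
    logp = log_softmax(logits)
    idx = np.arange(len(labels))
    return float(np.mean(-costs[labels] * logp[idx, labels]))

def logit_adjusted_ce(logits, labels, costs):
    # Logit adjustment (diagonal-cost case, an assumption of this sketch):
    # shift the logits by -log(cost) before the softmax, so the costs
    # change the decision boundary rather than the per-example weight.
    logp = log_softmax(logits - np.log(costs))
    idx = np.arange(len(labels))
    return float(np.mean(-logp[idx, labels]))
```

With uniform costs both functions reduce to the standard cross-entropy. For non-uniform costs c, the shifted loss is minimized in expectation by scores of the form log P(y|x) + log c_y (up to a constant), whose argmax matches the cost-sensitive rule argmax_y c_y P(y|x).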
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Convolution · Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · 1x1 Convolution · Kaiming Initialization · Residual Block · Bottleneck Residual Block
