Optimizing Non-decomposable Measures with Deep Networks

Amartya Sanyal; Pawan Kumar; Purushottam Kar; Sanjay Chawla; Fabrizio Sebastiani

arXiv:1802.00086·stat.ML·September 22, 2021

TL;DR

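This paper introduces algorithms that train deep neural networks directly on non-decomposable, task-specific performance measures such as the F-measure and the Kullback-Leibler divergence, and reports faster and more stable convergence than training with decomposable losses such as squared or cross-entropy loss.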

Contribution

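It presents a class of algorithms that optimize structured, non-decomposable performance measures directly when training deep networks. The algorithms converge to first-order stationary points, need fewer training samples to reach a desired level of convergence, and substantially reduce training time.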

Findings

01  Faster and more stable convergence across datasets.

02  Reduced training time and sample requirements.

03  Outperforms traditional and recent task-specific training methods.

Abstract

We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures, such as the F-measure and the Kullback-Leibler divergence, that are structured and non-decomposable. This is a departure from standard deep learning techniques, which typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields much faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations have several novel features, including (i) convergence to first-order stationary points despite optimizing complex objective functions; (ii) use of fewer training samples to achieve a desired level of convergence; (iii) a substantial reduction in training time; and (iv) a seamless…
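To make the term "non-decomposable" in the abstract concrete, the following is a minimal sketch in plain Python. It is not the paper's algorithm; the helper names (`f1_score`, `accuracy`, `mean_over_batches`) and the toy labels are assumptions chosen for illustration. It shows that accuracy, a decomposable measure, is recovered exactly by averaging over equal-sized mini-batches, whereas the F-measure (here F1) is not, which is one reason standard mini-batch training does not optimize it directly.

```python
def f1_score(y_true, y_pred):
    """F1 computed from confusion counts over a whole set of binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of correct predictions; an average of per-sample terms."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mean_over_batches(metric, y_true, y_pred, batch_size):
    """Average a metric computed separately on consecutive mini-batches."""
    scores = [
        metric(y_true[i:i + batch_size], y_pred[i:i + batch_size])
        for i in range(0, len(y_true), batch_size)
    ]
    return sum(scores) / len(scores)


# Hypothetical toy labels, chosen only to illustrate the effect.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

full_acc = accuracy(y_true, y_pred)
batch_acc = mean_over_batches(accuracy, y_true, y_pred, batch_size=4)
full_f1 = f1_score(y_true, y_pred)
batch_f1 = mean_over_batches(f1_score, y_true, y_pred, batch_size=4)

print("accuracy: full vs. batch mean:", full_acc, batch_acc)
print("F1:       full vs. batch mean:", full_f1, batch_f1)
```

On this toy data the two accuracy values coincide, while the two F1 values differ, because F1 is a ratio of counts accumulated over the whole set rather than a sum of per-sample losses.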
