Smooth Loss Functions for Deep Top-k Classification
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

TL;DR
This paper introduces smooth loss functions with non-sparse gradients, tailored for deep top-k classification, and shows that they are more robust than cross-entropy when training data is noisy or limited.
Contribution
The authors propose a family of smooth loss functions for top-k classification with deep networks, together with an efficient evaluation algorithm that draws on polynomial algebra and an approximation suited to GPU computation.
Findings
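The proposed loss functions outperform cross-entropy when the training data is noisy.
The proposed algorithm evaluates the loss in O(kn) operations, where n is the number of classes, instead of the O(C(n, k)) operations a naive evaluation would need.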
Smooth loss functions show increased robustness to overfitting.
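To give a feel for how polynomial algebra can turn a sum over all size-k subsets into O(kn) work, here is a minimal sketch. It assumes the expensive quantity is an elementary symmetric polynomial (the sum, over all k-element subsets of n numbers, of the product of each subset), which is exactly the degree-k coefficient of the product of the factors (1 + x_i t). The function names `esp_bruteforce` and `esp_recurrence` are hypothetical, and the sequential recurrence below is a textbook method used for illustration, not the authors' GPU-oriented algorithm.

```python
from itertools import combinations
from math import prod


def esp_bruteforce(x, k):
    """Sum of products over every k-element subset of x: C(n, k) terms."""
    return sum(prod(subset) for subset in combinations(x, k))


def esp_recurrence(x, k):
    """Return [e_0, ..., e_k] for the numbers in x using O(k n) operations.

    e_j is the coefficient of t**j in prod_i (1 + x_i * t). Each pass
    multiplies the running polynomial by one factor and discards every
    term of degree above k.
    """
    e = [1.0] + [0.0] * k
    for xi in x:
        # Go from high degree to low so e[j - 1] is still the old value.
        for j in range(k, 0, -1):
            e[j] += xi * e[j - 1]
    return e


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    print(esp_bruteforce(values, 2))   # enumerates all 6 pairs
    print(esp_recurrence(values, 2))   # same e_2, without enumeration
```

Both functions agree on small inputs, but only the recurrence stays tractable as n grows: with n = 1000 classes and k = 5, the brute-force sum has roughly 8 x 10^12 terms, while the recurrence needs about 5000 multiply-adds.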
Abstract
The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is…
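For readers unfamiliar with the metric the abstract opens with, the following minimal sketch computes the top-k error: the fraction of samples whose true label is not among the k highest-scoring classes. The function name `topk_error` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def topk_error(scores, labels, k):
    """Fraction of samples whose true label is missing from the top-k scores.

    scores: list of per-sample lists, one score per class.
    labels: list of true class indices, one per sample.
    """
    errors = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda j: sample_scores[j],
                        reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.3, 0.5]]
    toy_labels = [2, 1, 0]
    for k in (1, 2, 3):
        print(k, topk_error(toy_scores, toy_labels, k))
```

By construction the error can only stay the same or decrease as k grows, and it reaches zero once k equals the number of classes.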
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms
