Functional Properties of the Focal-Entropy
Jaimin Shah, Martina Cardone, Alex Dytso

TL;DR
This paper gives an information-theoretic analysis of the focal-entropy, the focal-loss analogue of the cross-entropy, covering its mathematical properties, the structure of its minimizer, and its behavior under class imbalance.
Contribution
It offers a systematic information-theoretic study of the focal-entropy, detailing its properties, the structure of its minimizer, and its effect on predicted probabilities, and so deepens the understanding of the focal-loss in imbalanced learning.
Findings
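The paper establishes conditions under which the focal-entropy is finite, convex, and continuous.
The focal-entropy minimizer exists and is unique, but it can depart significantly from the data distribution: it amplifies mid-range probabilities and suppresses high-probability outcomes.
Under extreme class imbalance, the focal-loss induces over-suppression of small probabilities.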
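The suppression of high-probability outcomes can be checked numerically on a two-outcome source. The sketch below is illustrative only and is not taken from the paper: it assumes the focal-entropy of a prediction q under a data distribution p is the expected focal loss, sum over x of p(x) (1 - q(x))^gamma log(1/q(x)), which reduces to the cross-entropy at gamma = 0. The function names `focal_entropy` and `binary_minimizer`, and the grid-search approach, are hypothetical choices made for this example.

```python
import math


def focal_entropy(p, q, gamma):
    """Expected focal loss of predicting q when outcomes follow p.

    Assumed definition (not quoted from the paper):
        sum_x p(x) * (1 - q(x))**gamma * log(1 / q(x))
    With gamma = 0 this is the ordinary cross-entropy.
    """
    return sum(
        -pi * (1.0 - qi) ** gamma * math.log(qi)
        for pi, qi in zip(p, q)
        if pi > 0
    )


def binary_minimizer(p1, gamma, grid=10000):
    """Grid-search the q1 in (0, 1) that minimizes the focal-entropy
    for a two-outcome source with probabilities (p1, 1 - p1)."""
    best_q, best_val = None, float("inf")
    for i in range(1, grid):
        q1 = i / grid
        val = focal_entropy((p1, 1.0 - p1), (q1, 1.0 - q1), gamma)
        if val < best_val:
            best_q, best_val = q1, val
    return best_q


if __name__ == "__main__":
    p1 = 0.9
    for gamma in (0.0, 1.0, 2.0):
        print(f"gamma={gamma}: minimizing q1 = {binary_minimizer(p1, gamma):.4f}")
```

Under this assumed definition, gamma = 0 recovers q1 = p1, while gamma > 0 pulls the minimizing q1 for the dominant outcome below p1, in line with the suppression of high-probability outcomes described above.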
Abstract
The focal-loss has become a widely used alternative to cross-entropy in class-imbalanced classification problems, particularly in computer vision. Despite its empirical success, a systematic information-theoretic study of the focal-loss remains incomplete. In this work, we adopt a distributional viewpoint and study the focal-entropy, a focal-loss analogue of the cross-entropy. Our analysis establishes conditions for finiteness, convexity, and continuity of the focal-entropy, and provides various asymptotic characterizations. We prove the existence and uniqueness of the focal-entropy minimizer, describe its structure, and show that it can depart significantly from the data distribution. In particular, we rigorously show that the focal-loss amplifies mid-range probabilities, suppresses high-probability outcomes, and, under extreme class imbalance, induces an over-suppression regime in…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
