Loss Functions for Classification using Structured Entropy

Brian Lucena

arXiv:2206.07122 · stat.ML · June 16, 2022

TL;DR

This paper introduces structured entropy, a generalization of entropy that incorporates the structure of the target variable. The resulting structured cross-entropy loss gives better results on classification problems whose targets have a known structure, while retaining many theoretical properties of standard entropy.

Contribution

It proposes structured entropy as a flexible, simple generalization of cross-entropy that accounts for target structure without hierarchical assumptions.

Findings

01

Structured cross-entropy loss yields better results on several classification problems where the target variable has an a priori known structure.

02

The method retains key theoretical properties of standard entropy.

03

It is computationally efficient and easy to implement.
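
The findings above can be made concrete with a minimal sketch. It assumes the structure is given as a finite list of partitions of the class indices with nonnegative weights summing to 1, and that structured entropy is the weighted average of the entropies of the coarsened distributions. The function names and example partitions are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def shannon_entropy(p):
    """Standard entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(nz * np.log(nz)).sum())

def structured_entropy(p, partitions, weights):
    """Weighted average of entropies of p coarsened by each partition.

    Assumption: each partition is a list of blocks, and each block is a
    list of class indices. This representation is illustrative only.
    """
    p = np.asarray(p, dtype=float)
    total = 0.0
    for w, partition in zip(weights, partitions):
        # Merge the probability mass of classes that share a block.
        coarse = [p[list(block)].sum() for block in partition]
        total += w * shannon_entropy(coarse)
    return total

p = [0.5, 0.3, 0.2]
singletons = [[0], [1], [2]]   # no structure: every class is its own block
merged = [[0, 1], [2]]         # classes 0 and 1 are treated as similar

h = shannon_entropy(p)
h_s = structured_entropy(p, [singletons, merged], [0.5, 0.5])
```

Under these assumptions, using only the singleton partition recovers standard entropy, and merging classes can never increase the value, so `h_s` lies between 0 and `h`.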

Abstract

Cross-entropy loss is the standard metric used to train classification models in deep learning and gradient boosting. It is well known that this loss function fails to account for similarities between the different values of the target. We propose a generalization of entropy called structured entropy, which uses a random partition to incorporate the structure of the target variable in a manner that retains many theoretical properties of standard entropy. We show that a structured cross-entropy loss yields better results on several classification problems where the target variable has an a priori known structure. The approach is simple, flexible, easily computable, and does not rely on a hierarchically defined notion of structure.
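
One way to read the loss described in the abstract is sketched below. It assumes the random partition is a finite list of weighted partitions and that the loss for one example is the weighted sum of cross-entropies computed on the coarsened classes. The function name `structured_cross_entropy`, its arguments, and the example partitions are hypothetical illustrations, not the author's implementation.

```python
import numpy as np

def structured_cross_entropy(probs, label, partitions, weights):
    """Weighted cross-entropy over coarsened targets for one example.

    probs      : predicted class probabilities (1-D, sums to 1)
    label      : index of the true class
    partitions : list of partitions, each a list of blocks of class indices
    weights    : nonnegative weights, one per partition, summing to 1
    (All of these argument conventions are assumptions for illustration.)
    """
    probs = np.asarray(probs, dtype=float)
    loss = 0.0
    for w, partition in zip(weights, partitions):
        # Locate the block that contains the true class.
        block = next(b for b in partition if label in b)
        # Predicted mass on that block plays the role of p(true class).
        loss += -w * np.log(probs[list(block)].sum())
    return float(loss)

singletons = [[0], [1], [2]]
merged = [[0, 1], [2]]          # classes 0 and 1 are considered similar
partitions, weights = [singletons, merged], [0.5, 0.5]

# True class is 0. Both predictions put 0.2 on the true class, so standard
# cross-entropy cannot tell them apart.
near_miss = structured_cross_entropy([0.2, 0.7, 0.1], 0, partitions, weights)
far_miss = structured_cross_entropy([0.2, 0.1, 0.7], 0, partitions, weights)
```

In this sketch `near_miss` is smaller than `far_miss`, because the first prediction places most of its mass on a class that shares a block with the true class. With only the singleton partition the function reduces to ordinary cross-entropy.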

Code & Models

Repositories

numeristical/resources
none · Official

Taxonomy

Topics

Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning