Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

Qipeng Zhan; Zhuoping Zhou; Li Shen

arXiv:2604.06689 · cs.LG · May 12, 2026

TL;DR

Generative Cross-Entropy (GenCE) is a new loss function that improves data efficiency and calibration in classification tasks by integrating generative principles into standard discriminative models without changing their architecture.

Contribution

The paper introduces GenCE, a strictly proper loss that brings a generative learning principle into discriminative classifiers and improves performance, especially when labeled data is limited.

Findings

01

GenCE outperforms standard CE and other losses across multiple datasets and architectures.

02

GenCE produces better-calibrated probabilities and improves out-of-distribution detection.

03

GenCE is strictly proper: under mild conditions, the loss is minimized by the true class posterior.

Abstract

Cross-entropy (CE) is the default training loss for supervised classification, but its sample efficiency is limited when labels are scarce. Existing remedies primarily act on the data side, via augmentation, synthesis, or transfer from pretrained models; the training objective itself is rarely revisited. We revisit it here. Drawing on the classical observation that generative classifiers reach their asymptotic error with fewer samples than discriminative ones, we propose Generative Cross-Entropy (GenCE), a drop-in replacement for CE that introduces a generative learning principle into a standard discriminative network without altering the architecture or fitting a separate density model. GenCE follows from a Bayesian rewrite of the class-conditional likelihood and, in the mini-batch approximation, reduces to normalizing each sample's softmax score against the model's predictions on the…
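The abstract is cut off before it names what each softmax score is normalized against, so the following is only a minimal sketch under a stated assumption: that the normalizer is the model's average predicted probability for the same class over the current mini-batch. This follows from Bayes' rule, p(x|y) = p(y|x) p(x) / p(y), where p(x) does not depend on the model and p(y) is approximated by the batch mean of p(y|x_j). The function name `batch_normalized_loss` is hypothetical and is not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with the usual max-shift for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Standard CE: mean of -log p(y_i | x_i).
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

def batch_normalized_loss(logits, labels):
    # ASSUMPTION (not confirmed by the truncated abstract): each sample's
    # softmax score for its label is divided by the batch-mean probability
    # the model assigns to that same class.
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]   # p(y_i | x_i)
    class_mass = probs.mean(axis=0)                  # (1/B) * sum_j p(c | x_j)
    return float(-np.log(picked / class_mass[labels]).mean())

# Small usage example on random logits: 8 samples, 5 classes.
rng = np.random.default_rng(0)
example_logits = rng.normal(size=(8, 5))
example_labels = rng.integers(0, 5, size=8)
print("CE:", cross_entropy(example_logits, example_labels))
print("batch-normalized:", batch_normalized_loss(example_logits, example_labels))
```

Under this assumption the loss equals CE plus the mean log of the batch-level class mass, so it penalizes a model that raises a class's probability on every sample in the batch rather than on the samples that actually carry that label.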
