Differentiable Zero-One Loss via Hypersimplex Projections

Camilo Gomez; Pengyang Wang; Liansheng Tang

arXiv:2602.23336·cs.LG·February 27, 2026

Open Access

TL;DR

This paper introduces a differentiable approximation to the zero-one loss using hypersimplex projections, enabling gradient-based optimization for classification tasks.

Contribution

It proposes a novel Soft-Binary-Argmax operator based on hypersimplex projections, allowing zero-one loss to be used in end-to-end differentiable models.

Findings

01. Improves generalization in large-batch training.

02. Achieves tighter geometric consistency constraints.

03. Bridges the gap between zero-one loss and gradient-based methods.

Abstract

Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance, yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the n,k-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization…
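To make the idea of projecting scores onto a hypersimplex concrete, here is a minimal sketch. It assumes the set {x in [0,1]^n : sum(x) = k} and uses a plain Euclidean projection found by bisection on a scalar shift. This is a standard textbook construction, not the paper's Soft-Binary-Argmax, which the abstract describes as smooth and whose exact form is not given here. The function name `project_hypersimplex` and the parameter `iters` are hypothetical.

```python
def project_hypersimplex(z, k, iters=100):
    """Euclidean projection of scores z onto {x in [0,1]^n : sum(x) = k}.

    Assumption: this is the plain (piecewise-linear) Euclidean projection,
    used only for illustration; it is not the paper's operator.
    The optimality conditions give x_i = clip(z_i - tau, 0, 1) for a scalar
    shift tau, which is located here by bisection.
    """
    n = len(z)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and len(z)")

    def clipped_sum(tau):
        # Sum of the shifted scores after clipping each one to [0, 1].
        return sum(min(1.0, max(0.0, zi - tau)) for zi in z)

    # clipped_sum is non-increasing in tau: it equals n at lo and 0 at hi.
    lo, hi = min(z) - 1.0, max(z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clipped_sum(mid) > k:
            lo = mid  # sum too large, so shift further
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(1.0, max(0.0, zi - tau)) for zi in z]


scores = [2.0, 0.5, -1.0, 0.1]
x = project_hypersimplex(scores, k=2)
```

The output keeps the ordering of the input scores, lies in [0, 1], and sums to k. Multiplying the scores by a large constant pushes the result toward a hard 0/1 indicator of the top k entries, which is the sense in which such a projection can act as a relaxed argmax.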

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Advanced Graph Neural Networks