Differentiable Zero-One Loss via Hypersimplex Projections
Camilo Gomez, Pengyang Wang, Liansheng Tang

TL;DR
This paper introduces a differentiable approximation to the zero-one loss using hypersimplex projections, enabling gradient-based optimization for classification tasks.
Contribution
It proposes a novel Soft-Binary-Argmax operator based on hypersimplex projections, allowing a differentiable approximation of the zero-one loss to be used in end-to-end differentiable models.
Findings
Improves generalization in large-batch training.
Achieves tighter geometric consistency constraints.
Bridges the gap between zero-one loss and gradient-based methods.
Abstract
Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization…
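To make the idea of a hypersimplex projection concrete, here is a minimal NumPy sketch. It is not the paper's Soft-Binary-Argmax operator. It assumes the standard definition of the hypersimplex as the set of vectors x in [0,1]^n whose entries sum to k, and it computes the plain Euclidean projection onto that set by bisection on a shift tau. This projection is order-preserving but only piecewise linear, so the paper's smooth operator presumably differs from it. The function name and the iters parameter are hypothetical.

```python
import numpy as np

def project_hypersimplex(z, k, iters=60):
    """Euclidean projection of z onto {x in [0,1]^n : sum(x) = k}.

    Illustrative sketch only (hypothetical helper, not the paper's operator).
    The projection has the form x_i = clip(z_i - tau, 0, 1) for a scalar tau,
    and sum(x) is non-increasing in tau, so tau can be found by bisection.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    lo = z.min() - 1.0  # here every coordinate clips to 1, so the sum is n >= k
    hi = z.max()        # here every coordinate clips to 0, so the sum is 0 <= k
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(z - tau, 0.0, 1.0).sum() > k:
            lo = tau  # sum too large: shift further down
        else:
            hi = tau  # sum small enough: try a smaller shift
    return np.clip(z - 0.5 * (lo + hi), 0.0, 1.0)

scores = np.array([0.2, 0.9, 0.4])
x = project_hypersimplex(scores, k=1)
print(x, x.sum())
```

The output assigns the largest weight to the largest score and keeps the input ordering, which is the sense in which such a projection acts as a relaxed top-k indicator.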
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Advanced Graph Neural Networks
