Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

Mehryar Mohri; Yutao Zhong

arXiv:2604.27742 · cs.LG · May 1, 2026

TL;DR

The paper introduces Linear-Core Surrogates, a new family of convex loss functions that combine the optimization efficiency of smooth losses with the statistical benefits of margin-based losses, applicable to classification and structured prediction.

Contribution

The paper proposes a family of convex loss functions that are differentiable everywhere yet retain linear $H$-consistency bounds, pairing the optimization benefits of smoothness with the statistical efficiency of margin-based losses.

Findings

01

Achieves a 23× speedup over Structured SVMs on large-vocabulary sequence tagging.

02

Demonstrates superior robustness to label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.

03

Proves that the new surrogates combine smoothness with linear $H$-consistency bounds.

Abstract

The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic…
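To make the idea of stitching a linear piece to a smooth tail concrete, here is a minimal sketch in Python. It is not the paper's definition of an LC surrogate, which this summary does not give. The function `stitched_loss`, its breakpoint at margin 0, and the exponential tail are all assumptions chosen only because they yield a convex function whose value and slope match at the join. The hinge loss is included for contrast, since its slope jumps at margin 1.

```python
import math


def stitched_loss(margin: float) -> float:
    """Hypothetical illustration (not the paper's formula).

    Linear piece 1 - margin for margin <= 0, exponential tail
    exp(-margin) for margin > 0. Both pieces have value 1 and
    slope -1 at margin 0, so the join is continuously differentiable.
    """
    if margin <= 0.0:
        return 1.0 - margin
    return math.exp(-margin)


def stitched_loss_grad(margin: float) -> float:
    """Derivative of stitched_loss; defined at every margin."""
    if margin <= 0.0:
        return -1.0
    return -math.exp(-margin)


def hinge_loss(margin: float) -> float:
    """Standard hinge loss max(0, 1 - margin), for comparison."""
    return max(0.0, 1.0 - margin)


def hinge_subgrad(margin: float) -> float:
    """One valid subgradient of the hinge loss.

    The slope jumps from -1 to 0 at margin 1, so the hinge has no
    derivative there; this function picks 0 at the kink.
    """
    return -1.0 if margin < 1.0 else 0.0


if __name__ == "__main__":
    for m in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        print(
            f"margin={m:+.1f}  "
            f"stitched={stitched_loss(m):.4f}  grad={stitched_loss_grad(m):+.4f}  "
            f"hinge={hinge_loss(m):.4f}  subgrad={hinge_subgrad(m):+.1f}"
        )
```

In this toy construction the slope rises continuously from -1 toward 0 as the margin grows, whereas the hinge slope switches abruptly. Whether the paper's surrogates use a similar breakpoint or tail is not stated in this summary.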
