Hard labels sampled from sparse targets mislead rotation invariant algorithms

Avrajit Ghosh; Bin Yu; Manfred Warmuth; Peter Bartlett

arXiv:2603.20967 · stat.ML · March 24, 2026

Open Access

TL;DR

The paper demonstrates that rotation-invariant algorithms like gradient descent can be misled by hard labels sampled from sparse targets, resulting in suboptimal excess risk compared to non-rotation-invariant methods.

Contribution

It proves that rotation-invariant algorithms are fundamentally limited in learning sparse targets from hard labels, showing their excess risk is worse than non-rotation-invariant algorithms.

Findings

01 Rotation-invariant algorithms have an excess risk of $(d-1)/n$.

02 Non-rotation-invariant algorithms achieve excess risk $(s \log d)/n$.

03 Hard labels sampled from sparse targets can mislead rotation-invariant algorithms.
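
To get a feel for the gap between the two rates in the findings, the sketch below plugs in illustrative numbers. It rests on several assumptions that the listing does not confirm: constants hidden by the rates are ignored, the logarithm is taken as natural, $s$ is read as the sparsity of the target, and the values of `d`, `s`, and `n` are hypothetical.

```python
import math


def rate_rotation_invariant(d: int, n: int) -> float:
    """Rate (d - 1) / n quoted for rotation-invariant algorithms (constants ignored)."""
    return (d - 1) / n


def rate_non_rotation_invariant(s: int, d: int, n: int) -> float:
    """Rate (s log d) / n quoted for non-rotation-invariant algorithms.

    Assumption: natural logarithm, constants ignored.
    """
    return s * math.log(d) / n


# Hypothetical problem size: 1000 input dimensions, 5 relevant ones, 10000 samples.
d, s, n = 1000, 5, 10_000
ri = rate_rotation_invariant(d, n)
nri = rate_non_rotation_invariant(s, d, n)

print(f"rotation-invariant rate:     {ri:.5f}")
print(f"non-rotation-invariant rate: {nri:.5f}")
print(f"ratio:                       {ri / nri:.1f}")
```

With these made-up values the first rate is roughly thirty times the second, and the ratio grows as `d` increases with `s` held fixed.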

Abstract

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $\sigma(x^{\top} w^{\star})$. In the over-constrained case (i.e. the number of samples $n$ exceeds the input dimension $d$) with examples $(x_{i}, \sigma(x_{i}^{\top} w^{\star}))$, it is sufficient to recover $w^{\star}$ and hence achieve the Bayes risk. However, we prove that…
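
The two kinds of feedback described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the paper's experimental setup: Gaussian inputs, a target whose first `s` coordinates equal 1, and the hypothetical helper name `make_labels`.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-z))


def make_labels(n: int, d: int, s: int, seed: int = 0):
    """Draw n inputs in d dimensions and label them with an s-sparse target.

    Returns inputs X, the target w_star, the soft labels sigma(x^T w_star),
    and hard labels in {-1, +1} sampled with P(y = +1 | x) equal to the soft label.
    """
    rng = random.Random(seed)
    # Hypothetical sparse target: only the first s coordinates are nonzero.
    w_star = [1.0] * s + [0.0] * (d - s)
    # Assumed input distribution: i.i.d. standard Gaussian coordinates.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    soft = [sigmoid(sum(xj * wj for xj, wj in zip(x, w_star))) for x in X]
    hard = [1 if rng.random() < p else -1 for p in soft]
    return X, w_star, soft, hard


X, w_star, soft, hard = make_labels(n=200, d=20, s=2)
print("first soft labels:", [round(p, 3) for p in soft[:5]])
print("first hard labels:", hard[:5])
```

The soft labels are a deterministic function of the input, while each hard label adds sampling noise on top of the same conditional probability.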

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference