Hard labels sampled from sparse targets mislead rotation invariant algorithms
Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

TL;DR
The paper demonstrates that rotation-invariant algorithms like gradient descent can be misled by hard labels sampled from sparse targets, resulting in suboptimal excess risk compared to non-rotation-invariant methods.
Contribution
It proves that rotation-invariant algorithms are fundamentally limited in learning sparse targets from hard labels, showing their excess risk is worse than non-rotation-invariant algorithms.
Findings
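Rotation-invariant algorithms incur an excess risk on the order of (d-1)/n, where d is the input dimension and n is the number of samples.
Non-rotation-invariant algorithms can achieve an excess risk on the order of (s log d)/n, where s is the sparsity of the target.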
Hard labels sampled from sparse targets can mislead rotation-invariant algorithms.
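To get a feel for the gap between the two rates quoted above, the following minimal sketch evaluates both expressions for one choice of d, n, and s. It ignores all constant factors and assumes the natural logarithm; the function names and the specific values of d, n, and s are hypothetical and do not come from the paper.

```python
import math


def rotation_invariant_rate(d: int, n: int) -> float:
    """(d - 1) / n: the rate quoted for rotation-invariant algorithms (constants ignored)."""
    return (d - 1) / n


def sparse_rate(s: int, d: int, n: int) -> float:
    """(s log d) / n: the rate quoted for non-rotation-invariant algorithms.

    Assumption: natural logarithm, constants ignored.
    """
    return s * math.log(d) / n


# Hypothetical problem sizes chosen only for illustration.
d, n, s = 1000, 10_000, 1

ri = rotation_invariant_rate(d, n)
sp = sparse_rate(s, d, n)

print(f"(d-1)/n     = {ri:.6f}")
print(f"(s log d)/n = {sp:.6f}")
print(f"ratio       = {ri / sp:.1f}")
```

Because (d-1)/n grows linearly in d while (s log d)/n grows only logarithmically, the ratio widens as the dimension increases with s held fixed.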
Abstract
One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values ±1). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form σ(w*·x). In the over-constrained case (i.e. the number of samples n exceeds the input dimension d) with examples (x_i, σ(w*·x_i)), it is sufficient to recover w* and hence achieve the Bayes risk. However, we prove that…
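The difference between soft and hard labels described in the abstract can be illustrated with a small data generator. This is a sketch under stated assumptions, not the paper's experimental setup: the inputs are drawn as standard Gaussians, the sparse target puts equal weight on its first s coordinates, hard labels take values in {-1, +1}, and the names `make_data`, `scale`, and `seed` are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic link."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n: int, d: int, s: int, scale: float = 1.0, seed: int = 0):
    """Draw n examples in d dimensions labeled by an s-sparse logistic target.

    Returns the target weights, the inputs, the soft labels (true conditional
    probabilities), and hard labels sampled from those probabilities.
    """
    rng = random.Random(seed)
    # Assumption: the sparse target is nonzero on the first s coordinates only.
    w_star = [scale if j < s else 0.0 for j in range(d)]
    xs, soft, hard = [], [], []
    for _ in range(n):
        # Assumption: i.i.d. standard Gaussian inputs.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        p = sigmoid(sum(wj * xj for wj, xj in zip(w_star, x)))
        # Hard label: +1 with probability p, otherwise -1.
        y = 1 if rng.random() < p else -1
        xs.append(x)
        soft.append(p)
        hard.append(y)
    return w_star, xs, soft, hard


w_star, xs, soft, hard = make_data(n=50, d=10, s=2)
print("first soft labels:", [round(p, 3) for p in soft[:5]])
print("first hard labels:", hard[:5])
```

The soft labels reveal the conditional probability exactly, whereas each hard label is a single noisy draw from it, so a learner sees far less information per example in the hard-label case.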
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
