Label Embedding via Low-Coherence Matrices
Jianxin Zhang, Clayton Scott

TL;DR
This paper analyzes label embedding for large-scale multiclass classification, revealing a trade-off between computational and statistical efficiency that is quantified by the coherence of the embedding matrix, and demonstrates an effective, scalable algorithm with empirical validation.
Contribution
It provides a theoretical excess risk bound that ties the coherence of the embedding matrix to the trade-off between computational and statistical efficiency, and it introduces a scalable algorithm for label embedding in extreme classification.
Findings
Risk bound depends on embedding coherence
Low coherence reduces the statistical penalty under the Massart noise condition
Algorithm is effective and scalable in large-scale tasks
Abstract
Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low…
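To make the two central objects in the abstract concrete, here is a minimal sketch of an embedding matrix, its coherence, and nearest-vector decoding. It assumes the standard definition of coherence (the largest absolute inner product between two distinct unit-norm embedding vectors) and uses a random sign (Rademacher) construction purely for illustration. The function names `rademacher_embedding`, `coherence`, and `decode` are hypothetical, and none of this is taken from the paper's own algorithm.

```python
import math
import random


def rademacher_embedding(num_classes, dim, seed=0):
    """Illustrative assumption: one vector per class, entries +-1/sqrt(dim), so each has unit norm."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.choice((-scale, scale)) for _ in range(dim)]
            for _ in range(num_classes)]


def coherence(vectors):
    """Largest absolute inner product between two distinct unit-norm vectors."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


def decode(output, vectors):
    """Predict the class whose embedding vector is closest to the model output."""
    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(output, v))
    return min(range(len(vectors)), key=lambda k: sq_dist(vectors[k]))


if __name__ == "__main__":
    # 50 classes embedded in 64 dimensions (sizes chosen only for the demo).
    G = rademacher_embedding(num_classes=50, dim=64, seed=0)
    print("coherence:", coherence(G))
    print("decoded class:", decode(G[7], G))
```

One-hot label vectors are the zero-coherence extreme of this picture, but they need as many dimensions as there are classes. Shrinking the dimension saves computation and generally forces a nonzero coherence, which is the trade-off the abstract describes.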
Taxonomy
Topics: Ultrasonics and Acoustic Wave Propagation · Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning
