Label Embedding via Low-Coherence Matrices

Jianxin Zhang; Clayton Scott

arXiv:2305.19470·cs.LG·September 1, 2025·1 cites

TL;DR

This paper analyzes label embedding for extreme multiclass classification, showing a trade-off between computational and statistical efficiency that is quantified by the coherence of the embedding matrix, and validates a scalable algorithm empirically.

Contribution

It provides an excess risk bound that quantifies the trade-off between computational and statistical efficiency through the coherence of the embedding matrix, and introduces a scalable label-embedding algorithm for extreme classification.

Findings

01 The excess risk bound depends on the coherence of the embedding matrix

02 Under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low coherence

03 The algorithm is effective and scalable on large-scale tasks

Abstract

Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes $C$ is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low…
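The two ingredients the abstract names, an embedding matrix and its coherence, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a random Rademacher embedding (entries ±1/√n, so each label vector has unit norm), takes coherence to be the largest absolute inner product between two distinct label vectors, and decodes by nearest neighbor. The function names `rademacher_embedding`, `coherence`, and `decode` are hypothetical and chosen for illustration only.

```python
import math
import random


def rademacher_embedding(num_classes, dim, seed=0):
    """Assumed construction: one unit-norm vector per class, entries +-1/sqrt(dim)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.choice((-scale, scale)) for _ in range(dim)]
            for _ in range(num_classes)]


def coherence(embedding):
    """Largest absolute inner product between two distinct unit-norm label vectors."""
    worst = 0.0
    for i in range(len(embedding)):
        for j in range(i + 1, len(embedding)):
            dot = sum(a * b for a, b in zip(embedding[i], embedding[j]))
            worst = max(worst, abs(dot))
    return worst


def decode(output, embedding):
    """Predict the label whose embedding vector is closest to the model output."""
    def sq_dist(vec):
        return sum((a - b) ** 2 for a, b in zip(output, vec))
    return min(range(len(embedding)), key=lambda c: sq_dist(embedding[c]))


if __name__ == "__main__":
    # 50 classes embedded in 32 dimensions: fewer outputs than classes.
    G = rademacher_embedding(num_classes=50, dim=32)
    print("coherence:", coherence(G))
    # A slightly perturbed copy of one label vector, standing in for a model output.
    noisy = [x + 0.01 for x in G[7]]
    print("decoded label:", decode(noisy, G))
```

Shrinking `dim` reduces the number of outputs a model must produce but tends to raise the coherence of a random embedding, which is one way to see the computational versus statistical trade-off described above.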

Taxonomy

Topics: Ultrasonics and Acoustic Wave Propagation · Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning