A Theoretical Analysis of Soft-Label vs Hard-Label Training in Neural Networks
Saptarshi Mandal, Xiaojun Lin, R. Srikant

TL;DR
This paper provides a theoretical explanation for why soft-label training in neural networks requires fewer neurons than hard-label training, especially on challenging datasets, supported by experiments on simple and deep models.
Contribution
The paper offers a theoretical analysis showing that soft-label training needs fewer neurons than hard-label training, especially for difficult classification tasks.
Findings
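In motivating experiments with simple neural networks on a binary classification problem, soft-label training consistently outperforms hard-label training in accuracy.
Soft-label training requires significantly fewer neurons than hard-label training, and the difference is most pronounced when the dataset is difficult to classify.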
Experimental validation on deep neural networks supports the theoretical results.
Abstract
Knowledge distillation, where a small student model learns from a pre-trained large teacher model, has achieved substantial empirical success since the seminal work of Hinton et al. (2015). Despite prior theoretical studies exploring the benefits of knowledge distillation, an important question remains unanswered: why does soft-label training from the teacher require significantly fewer neurons than directly training a small neural network with hard labels? To address this, we first present motivating experimental results using simple neural network models on a binary classification problem. These results demonstrate that soft-label training consistently outperforms hard-label training in accuracy, with the performance gap becoming more pronounced as the dataset becomes increasingly difficult to classify. We then substantiate these observations with a theoretical contribution…
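To make the distinction between the two training targets concrete, here is a minimal sketch for the binary classification setting the abstract mentions. It is not the authors' code: the function name `binary_cross_entropy` and all numeric values are hypothetical, chosen only for illustration. The same loss is evaluated once against hard 0/1 labels and once against a teacher's output probabilities (soft labels).

```python
import numpy as np

def binary_cross_entropy(student_probs, targets, eps=1e-12):
    """Mean binary cross-entropy between student probabilities and targets in [0, 1]."""
    # Clip to avoid log(0) when a probability is exactly 0 or 1.
    p = np.clip(student_probs, eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

# Hypothetical values for illustration only (not taken from the paper).
teacher_probs = np.array([0.95, 0.60, 0.45, 0.10])  # soft labels from a teacher model
hard_labels = np.array([1.0, 1.0, 0.0, 0.0])        # ground-truth class labels
student_probs = np.array([0.90, 0.55, 0.40, 0.20])  # current student predictions

# Hard-label training: the student is pushed toward 0 or 1 on every sample.
hard_loss = binary_cross_entropy(student_probs, hard_labels)

# Soft-label training: the student is pushed toward the teacher's probabilities,
# so samples the teacher is unsure about carry a less extreme target.
soft_loss = binary_cross_entropy(student_probs, teacher_probs)

print(f"hard-label loss: {hard_loss:.4f}")
print(f"soft-label loss: {soft_loss:.4f}")
```

With soft targets the loss is minimized when the student reproduces the teacher's probabilities, rather than when it outputs exactly 0 or 1. The sketch only illustrates the two objectives; it does not reproduce the paper's neuron-count analysis.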
Taxonomy
Topics: Advanced Data Processing Techniques · Neural Networks and Applications · Statistical and Computational Modeling
