Self-Teaching Networks
Liang Lu, Eric Sun, Yifan Gong

TL;DR
Self-teaching networks enhance deep neural network generalization by using output-driven auxiliary losses to guide lower layers, improving gradient flow and regularization, demonstrated on large-scale speech recognition tasks.
Contribution
The paper introduces self-teaching networks that generate soft supervision labels to improve training and generalization of deep neural networks, especially in speech recognition.
Findings
Achieved consistent improvements over existing methods.
Outperformed label smoothing and confidence penalization.
Effective on large-scale speech recognition data.
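For context on the two baselines named in the findings, here is a minimal sketch of label smoothing and confidence penalization in their commonly used textbook forms. The summary does not say how the paper configured either baseline, so the function names and the `eps` and `beta` values are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothing_loss(logits, label, eps=0.1):
    """Cross-entropy against a smoothed target.

    The target puts (1 - eps) on the true class and spreads eps
    uniformly over all classes. eps is an assumed value.
    """
    probs = softmax(logits)
    k = len(probs)
    targets = [eps / k + (1.0 - eps) * (1.0 if i == label else 0.0)
               for i in range(k)]
    return -sum(t * math.log(p) for t, p in zip(targets, probs))


def confidence_penalty_loss(logits, label, beta=0.1):
    """Cross-entropy minus beta times the entropy of the prediction.

    Subtracting the entropy penalizes over-confident (low-entropy)
    outputs. beta is an assumed value.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return -math.log(probs[label]) - beta * entropy
```

Both baselines regularize the output distribution with a fixed rule, whereas the self-teaching approach described in the abstract derives its soft targets from the network's own output layer.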
Abstract
We propose self-teaching networks to improve the generalization capacity of deep neural networks. The idea is to generate soft supervision labels using the output layer for training the lower layers of the network. During network training, we seek an auxiliary loss that drives the lower layer to mimic the behavior of the output layer. The connection between the two network layers through the auxiliary loss can help the gradient flow, which works similarly to residual networks. Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network. We evaluated the self-teaching network with deep recurrent neural networks on speech recognition tasks, where we trained the acoustic model using 30 thousand hours of data. We tested the acoustic model using data collected from 4 scenarios. We show that the self-teaching network can achieve…
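The idea in the abstract, a lower layer trained to mimic soft labels produced by the output layer, can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the abstract does not specify the form of the auxiliary loss, so the KL divergence, the interpolation weight `alpha`, the assumption that the lower layer has its own classifier producing `lower_logits`, and all function names are hypothetical choices rather than the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length.

    Assumes q has no zero entries where p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def self_teaching_loss(output_logits, lower_logits, label, alpha=0.5):
    """Main loss plus an auxiliary 'mimic the output layer' loss.

    output_logits: logits of the final output layer (the teacher).
    lower_logits:  logits from an assumed auxiliary classifier attached
                   to a lower layer (the student).
    alpha:         assumed weight of the auxiliary term.
    Returns (total, main, aux).
    """
    # Soft labels from the output layer. In a real training loop these
    # would be treated as constants for the auxiliary term (no gradient).
    teacher = softmax(output_logits)
    student = softmax(lower_logits)

    main = -math.log(teacher[label])        # cross-entropy with the hard label
    aux = kl_divergence(teacher, student)   # lower layer mimics output layer
    return main + alpha * aux, main, aux


total, main, aux = self_teaching_loss([2.0, 0.5, -1.0], [0.3, 0.2, 0.1], label=0)
```

The auxiliary term vanishes when the lower layer already reproduces the output distribution and grows as the two diverge, which is one way to read the regularization effect described above.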
Taxonomy
Topics: Innovative Teaching and Learning Methods
Methods: Label Smoothing
