# Conditional Teacher-Student Learning

**Authors:** Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong

arXiv: 1904.12399 · 2019-04-30

## TL;DR

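This paper introduces conditional teacher-student (T/S) learning, in which the student learns from the teacher's posteriors when the teacher predicts the ground truth correctly and from the ground-truth labels otherwise. The scheme reduces word error rate in domain adaptation and speaker adaptation tasks.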

## Contribution

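The paper proposes a conditional learning scheme for teacher-student training: rather than linearly combining the two knowledge sources, the student trusts the teacher only when the teacher correctly predicts the ground truth, and backs off to the ground-truth labels otherwise.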

## Key findings

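- Achieved a 9.8% relative word error rate (WER) reduction over T/S learning for environment (domain) adaptation on the CHiME-3 dataset.
- Achieved a 12.8% relative WER reduction over the speaker-independent model for speaker adaptation on the Microsoft short message dictation dataset.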
- Outperforms traditional T/S learning in adaptation tasks.
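For readers unfamiliar with the metric, the figures above are relative reductions, not absolute differences in WER. A minimal sketch of the arithmetic follows; the function name and the input numbers are hypothetical illustrations and are not values reported in the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction, in percent, of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the formula:
# a baseline WER of 20.0% improved to 18.0% is a 10% relative reduction,
# even though the absolute difference is only 2 points.
example = relative_wer_reduction(20.0, 18.0)
```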

## Abstract

The teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of the T/S learning is that a teacher model, not always perfect, sporadically produces wrong guidance in form of posterior probabilities that misleads the student model towards a suboptimal performance. To overcome this problem, we propose a conditional T/S learning scheme, in which a "smart" student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth. Unlike a naive linear combination of the two knowledge sources, the conditional learning is exclusively engaged with the teacher model when the teacher model's prediction is correct, and otherwise backs off to the ground truth. Thus, the student model is able to learn effectively from the teacher and even potentially surpass the teacher. We examine the proposed learning scheme on two tasks: domain adaptation on CHiME-3 dataset and speaker adaptation on Microsoft short message dictation dataset. The proposed method achieves 9.8% and 12.8% relative word error rate reductions, respectively, over T/S learning for environment adaptation and speaker-independent model for speaker adaptation.
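The selection rule described in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that "the teacher correctly predicts the ground truth" means the teacher's highest-posterior class equals the label, and that the student is trained with cross-entropy against the selected target. The function names `select_targets` and `conditional_ts_loss` are hypothetical.

```python
import math


def select_targets(teacher_probs, labels):
    """Pick the training target for each example.

    Assumption: the teacher is "correct" when its argmax class equals the label.
    Correct teacher  -> use the teacher's full posterior (soft target).
    Incorrect teacher -> back off to the one-hot ground-truth label.
    """
    targets = []
    for probs, label in zip(teacher_probs, labels):
        teacher_pick = max(range(len(probs)), key=probs.__getitem__)
        if teacher_pick == label:
            targets.append(list(probs))
        else:
            targets.append([1.0 if i == label else 0.0 for i in range(len(probs))])
    return targets


def conditional_ts_loss(student_probs, teacher_probs, labels):
    """Mean cross-entropy between the selected targets and the student posteriors."""
    targets = select_targets(teacher_probs, labels)
    total = 0.0
    for target, probs in zip(targets, student_probs):
        # Skip zero-weight classes so log(0) is never evaluated for them.
        total -= sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)
    return total / len(labels)


# Toy data (hypothetical): the teacher is right on example 0 and wrong on example 1.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
student = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [0, 2]
loss = conditional_ts_loss(student, teacher, labels)
```

Note that each example uses exactly one knowledge source. A linear combination, by contrast, would mix the teacher posterior and the one-hot label for every example, including those where the teacher is wrong.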

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.12399/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.12399/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1904.12399/full.md

---
Source: https://tomesphere.com/paper/1904.12399