# Ensemble knowledge distillation of self-supervised speech models

**Authors:** Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

arXiv: 2302.12757 · 2023-02-27

## TL;DR

This paper introduces a novel ensemble knowledge distillation approach for self-supervised speech models, combining multiple teachers to improve downstream speech task performance.

## Contribution

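It proposes a method for jointly distilling multiple self-supervised speech models (HuBERT, RobustHuBERT, and WavLM). Teacher representations are aggregated layer by layer, and the student is given multiple prediction heads that predict different layer outputs of the teachers simultaneously.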

## Key findings

- Layerwise-average aggregation outperforms layerwise-concatenation.
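- The proposed method improves distilled-model performance on four downstream speech tasks: Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition.
- These gains are reported on the hidden-set track of the SUPERB benchmark.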
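
To make the two aggregation strategies concrete, here is a minimal pure-Python sketch. It is an illustration, not the authors' implementation: it assumes every teacher exposes the same number of layers, frames, and feature dimensions, and the function names `layerwise_average` and `layerwise_concat` are hypothetical.

```python
from typing import List

Vector = List[float]
Frames = List[Vector]   # indexed as [time][dim]
Layers = List[Frames]   # indexed as [layer][time][dim]


def layerwise_average(teachers: List[Layers]) -> Layers:
    """Average the same layer across teachers; the feature size is unchanged."""
    n = len(teachers)
    out = []
    for layer_group in zip(*teachers):          # one layer from each teacher
        frames = []
        for frame_group in zip(*layer_group):   # same time step from each teacher
            frames.append([sum(vals) / n for vals in zip(*frame_group)])
        out.append(frames)
    return out


def layerwise_concat(teachers: List[Layers]) -> Layers:
    """Concatenate the same layer across teachers along the feature axis."""
    out = []
    for layer_group in zip(*teachers):
        frames = []
        for frame_group in zip(*layer_group):
            frames.append([v for vec in frame_group for v in vec])
        out.append(frames)
    return out


# Two toy "teachers" (assumed shapes): 1 layer, 2 frames, 2 features each.
teacher_a = [[[1.0, 2.0], [3.0, 4.0]]]
teacher_b = [[[3.0, 6.0], [5.0, 8.0]]]

avg = layerwise_average([teacher_a, teacher_b])  # feature size stays 2
cat = layerwise_concat([teacher_a, teacher_b])   # feature size becomes 4
```

In this sketch, averaging keeps the target dimensionality fixed regardless of how many teachers are used, while concatenation makes it grow with the number of teachers.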

## Abstract

Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of different teacher models and found that the former was more effective. On top of that, we proposed a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. The experimental results show that our method improves the performance of the distilled models on four downstream speech processing tasks, Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition in the hidden-set track of the SUPERB benchmark.
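
The multiple prediction head idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's training code: each head is a plain linear map, the loss is mean squared error (the actual objective is not given in this summary), and the layer indices 4 and 8 and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]        # rows = output dims, cols = input dims
HeadKey = Tuple[str, int]    # (teacher name, teacher layer index)


def linear(weight: Matrix, x: Vector) -> Vector:
    """Apply a bias-free linear head to one student feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error (an assumed stand-in for the real loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_head_loss(
    student_feat: Vector,
    heads: Dict[HeadKey, Matrix],
    targets: Dict[HeadKey, Vector],
) -> float:
    """Sum the per-head losses, one head per (teacher, layer) target."""
    return sum(
        mse(linear(heads[key], student_feat), targets[key]) for key in heads
    )


# One shared student feature vector feeds every head.
student_feat = [1.0, 2.0]
heads = {
    ("hubert", 4): [[1.0, 0.0], [0.0, 1.0]],  # identity map (toy weights)
    ("wavlm", 8): [[0.0, 1.0], [1.0, 0.0]],   # swaps the two features
}
targets = {
    ("hubert", 4): [1.0, 2.0],
    ("wavlm", 8): [2.0, 3.0],
}

loss = multi_head_loss(student_feat, heads, targets)
```

The point of the design is that a single student representation is trained against several teacher layers at once, so each head only needs to learn a mapping into one teacher's layer space.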

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12757/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/2302.12757/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/2302.12757/full.md

---
Source: https://tomesphere.com/paper/2302.12757