Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data

Tohru Nagano; Takashi Fukuda; Gakuto Kurata

arXiv:2112.08878·cs.SD·December 17, 2021

TL;DR

This paper introduces a knowledge distillation method that uses alternative soft targets from acoustically qualified speech data to improve speech recognition accuracy, acting as a form of data augmentation with privileged information.

Contribution

It proposes a knowledge distillation framework that derives secondary soft targets from acoustically qualified speech data already present in the training pool, improving recognition performance over traditional methods.

Findings

01. Improved recognition accuracy with the proposed method.

02. Effective use of qualified data as privileged information.

03. Enhanced model robustness through target-side data augmentation.

Abstract

This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information. In our proposed framework, a student network is trained with multiple soft targets for each utterance: main soft targets from the original speaker's utterance, and alternative soft targets from other speakers' utterances spoken under better acoustic conditions, which serve as a secondary view. These qualified utterances from other speakers, used to generate better soft targets, are collected from a qualified data pool by using strict constraints in terms of word/phone/state durations. Our proposed method is a form of target-side data augmentation that creates multiple copies of data with corresponding better soft targets obtained from a qualified data pool. We show in our experiments under acoustic model adaptation…
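The target-side augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the NumPy-based loss, and the assumption that alternative soft targets already have the same frame count as the main targets (which the duration constraints are presumably meant to make possible) are all hypothetical choices made for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last (output-state) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def soft_target_loss(student_logits, soft_targets):
    # Frame-averaged cross-entropy between teacher posteriors (soft targets)
    # and the student's output distribution.
    per_frame = -(soft_targets * log_softmax(student_logits)).sum(axis=-1)
    return float(per_frame.mean())

def augment_with_alternative_targets(features, main_targets, alternative_targets):
    # Hypothetical helper: pair the SAME input features with each available
    # soft-target sequence, yielding one training copy per target sequence.
    # Assumes every alternative sequence is frame-aligned with the main one.
    pairs = [(features, main_targets)]
    for alt in alternative_targets:
        if alt.shape != main_targets.shape:
            raise ValueError("alternative targets must match the main targets' shape")
        pairs.append((features, alt))
    return pairs

def multi_target_loss(student_logits, target_list):
    # Average the distillation loss over all soft-target sequences
    # attached to one utterance (equal weighting is an assumption).
    return sum(soft_target_loss(student_logits, t) for t in target_list) / len(target_list)
```

In this sketch the input side is left unchanged and only the targets are multiplied, which is what distinguishes target-side augmentation from the more common practice of perturbing the input audio. How the main and alternative targets are weighted in the actual method is not stated in the abstract.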

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing