Adaptive Knowledge Distillation between Text and Speech Pre-trained Models

Jinjie Ni; Yukun Ma; Wen Wang; Qian Chen; Dianwen Ng; Han Lei; Trung Hieu Nguyen; Chong Zhang; Bin Ma; Erik Cambria

arXiv:2303.03600 · cs.CL · March 8, 2023 · 1 citation

Open Access

TL;DR

This paper introduces PAD, a novel metric-based knowledge distillation method that aligns text and speech models' embedding spaces using adaptive, prior-informed strategies, improving linguistic knowledge transfer without model modifications.

Contribution

It proposes the Prior-informed Adaptive knowledge Distillation (PAD) method, addressing modal disparity and semantic gaps between text and speech models with minimal data and no structural changes.

Findings

01. PAD outperforms other metric-based distillation methods on spoken language understanding benchmarks.

02. PAD effectively aligns text and speech embeddings, enhancing linguistic knowledge transfer.

03. The approach adaptively leverages text and speech units of variable granularity, together with prior distributions, for better global and local alignment.

Abstract

Training on massive speech corpora has driven the recent success of many self-supervised speech models. With knowledge distillation, these models may also benefit from the knowledge encoded by language models pre-trained on rich text sources. The distillation process, however, is challenging due to the modal disparity between textual and speech embedding spaces. This paper studies metric-based distillation to align the embedding spaces of text and speech using only a small amount of data and without modifying the model structure. Because the semantic and granularity gaps between text and speech have been overlooked in the literature, which impairs distillation, we propose Prior-informed Adaptive knowledge Distillation (PAD), which adaptively leverages text/speech units of variable granularity and prior distributions to achieve better global and local alignments between text…
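As a rough illustration of what metric-based alignment between speech and text embeddings can look like, the sketch below computes a global loss (cosine distance between a mean-pooled speech utterance and a text sentence embedding) and a local loss (each text token matched to its closest speech frame). This is a generic example and not the PAD method itself, whose formulation is not given on this page. The function names, the mean pooling, and the nearest-frame matching are all assumptions made for illustration.

```python
import math


def mean_pool(frames):
    """Average a list of equal-length frame vectors into a single vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def global_alignment_loss(speech_frames, text_embedding):
    """Hypothetical global loss: pooled speech utterance vs. text sentence embedding."""
    return cosine_distance(mean_pool(speech_frames), text_embedding)


def local_alignment_loss(speech_frames, text_tokens):
    """Hypothetical local loss: mean distance from each text token to its nearest frame."""
    per_token = [
        min(cosine_distance(token, frame) for frame in speech_frames)
        for token in text_tokens
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented purely for this example.
    speech = [[1.0, 0.0], [0.0, 1.0]]
    sentence = [1.0, 1.0]
    tokens = [[1.0, 0.0], [0.0, 2.0]]
    print(global_alignment_loss(speech, sentence))
    print(local_alignment_loss(speech, tokens))
```

In a distillation setup of this kind, the text embeddings would come from a frozen teacher and the loss would be minimized with respect to the speech model's parameters; the pure-Python version here only shows how the distances are computed.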

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN · Knowledge Distillation