GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

Yingying Gao; Shilei Zhang; Chao Deng; Junlan Feng

arXiv:2406.09444 · eess.AS · June 24, 2024

Open Access

TL;DR

GenDistiller is an autoregressive knowledge distillation framework in which a much smaller student network generates the hidden representations of a pre-trained speech model such as WavLM layer by layer. It reduces WavLM's size by 82% and outperforms a non-autoregressive distillation baseline on SUPERB tasks.

Contribution

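The paper introduces a layer-by-layer autoregressive distillation method: a small student network generates the teacher model's hidden representations directly, taking the previous hidden layer as history when predicting the next one. This enables substantial model compression.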

Findings

1. Reduces the size of WavLM by 82%.
2. Outperforms the baseline distillation method, which has no autoregressive framework, on SUPERB tasks.
3. Uses 33% fewer parameters than that baseline, with similar time consumption.

Abstract

Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resources hinder their application on resource-restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework which generates the hidden representations of the pre-trained teacher model directly by a much smaller student network. The proposed method takes the previous hidden layer as history and implements a layer-by-layer prediction of the teacher model autoregressively. Experiments on SUPERB reveal the advantage of GenDistiller over the baseline distilling method without an autoregressive framework, with 33% fewer parameters, similar time consumption and better performance…
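
The layer-by-layer autoregressive prediction described in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' architecture: the class `ToyAutoregressiveStudent`, the shared linear map, the per-layer offsets, the L1 loss, and the random stand-in teacher layers are all assumptions made for illustration. The only idea taken from the abstract is that each predicted layer is fed back as history for predicting the next teacher layer.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b to one feature vector (plain lists of floats)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


class ToyAutoregressiveStudent:
    """Hypothetical student: one small shared map applied once per teacher layer."""

    def __init__(self, dim, num_layers, seed=0):
        rng = random.Random(seed)
        self.num_layers = num_layers
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias = [0.0] * dim
        # Assumption: one offset per target layer tells the shared map which layer to emit.
        self.layer_embed = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_layers)
        ]

    def generate(self, features):
        """Predict the teacher layers in order, feeding each prediction back as history."""
        history = features
        predictions = []
        for k in range(self.num_layers):
            conditioned = [h + e for h, e in zip(history, self.layer_embed[k])]
            history = linear(self.weights, self.bias, conditioned)
            predictions.append(history)
        return predictions


def distillation_loss(predictions, teacher_layers):
    """Average L1 distance over layers (an assumed objective, not the paper's)."""
    pairs = zip(predictions, teacher_layers)
    return sum(l1_distance(p, t) for p, t in pairs) / len(teacher_layers)


# Illustrative sizes and random stand-ins for real teacher hidden states.
dim, num_layers = 4, 3
rng = random.Random(1)
features = [rng.uniform(-1, 1) for _ in range(dim)]
teacher_layers = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_layers)]

student = ToyAutoregressiveStudent(dim, num_layers)
predictions = student.generate(features)
loss = distillation_loss(predictions, teacher_layers)
```

In this sketch the student reuses one set of weights for every layer, so its parameter count does not grow with the number of teacher layers it predicts. That is one plausible way for a generative student to stay small, but it is a design choice of the sketch rather than a detail confirmed by the abstract. No training step is shown; a real setup would minimize the loss with a gradient-based optimizer.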

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Knowledge Distillation