A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training

Nitay Calderon; Subhabrata Mukherjee; Roi Reichart; Amir Kantor

arXiv:2305.02031·cs.CL·May 29, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper systematically investigates knowledge distillation techniques for compressing natural language generation models, emphasizing task-specific training, pseudo-target augmentation, and validation with minimal labeled data, to improve efficiency in real-world applications.

Contribution

It introduces a family of pseudo-target augmentation methods and the joint-teaching approach, advancing task-specific knowledge distillation for NLG models, especially under limited labeled data conditions.
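As a rough illustration of pseudo-target augmentation, the sketch below pairs each unlabeled input with outputs generated by a teacher and appends them to the labeled set. It assumes that pseudo-targets are teacher-generated outputs for unlabeled inputs, which this summary does not spell out. The names `build_pseudo_target_set` and `toy_teacher` are hypothetical and are not taken from the paper or its repository, and the toy teacher is only a stand-in for real decoding.

```python
from typing import Callable, List, Tuple


def build_pseudo_target_set(
    unlabeled_inputs: List[str],
    teacher_generate: Callable[[str, int], List[str]],
    targets_per_input: int = 2,
) -> List[Tuple[str, str]]:
    """Pair every unlabeled input with teacher-generated pseudo-targets."""
    pairs = []
    for text in unlabeled_inputs:
        for pseudo_target in teacher_generate(text, targets_per_input):
            pairs.append((text, pseudo_target))
    return pairs


def toy_teacher(text: str, n: int) -> List[str]:
    # Hypothetical stand-in for a real teacher model's decoding
    # (for example beam search or sampling). It returns n truncated copies.
    words = text.split()
    return [" ".join(words[: max(1, len(words) - i)]) for i in range(n)]


# A small labeled set plus a larger pool of unlabeled task-specific inputs.
labeled = [("the cat sat on the mat", "cat on mat")]
unlabeled = ["a dog ran in the park", "birds fly south"]

# The student would then be trained on labeled pairs and pseudo-target pairs.
train_set = labeled + build_pseudo_target_set(unlabeled, toy_teacher, 2)
```

Generating more than one pseudo-target per input is a design choice in this sketch; it exposes the student to several teacher outputs for the same input.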

Findings

01 Pseudo-target augmentation significantly improves distillation.

02 Joint-teaching enhances knowledge transfer from teacher to student.

03 Effective model compression is achieved with minimal labeled data.

Abstract

Modern Natural Language Generation (NLG) models come with massive computational and storage requirements. In this work, we study the potential of compressing them, which is crucial for real-world applications serving millions of users. We focus on Knowledge Distillation (KD) techniques, in which a small student model learns to imitate a large teacher model, allowing knowledge to be transferred from the teacher to the student. In contrast to much of the previous work, our goal is to optimize the model for a specific NLG task and a specific dataset. Typically, in real-world applications, in addition to labeled data, there is abundant unlabeled task-specific data, which is crucial for attaining high compression rates via KD. In this work, we conduct a systematic study of task-specific KD techniques for various NLG tasks under realistic assumptions. We discuss the special characteristics of NLG…
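The idea of a student imitating a teacher can be made concrete with a common distillation objective: at each output position, the KL divergence between the teacher's and the student's next-token distributions. The following is a minimal sketch in plain Python under that assumption. The function names, the temperature parameter, and the toy logits are illustrative and are not taken from the paper or its repository.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def word_level_kd_loss(
    student_logits: List[List[float]],
    teacher_logits: List[List[float]],
    temperature: float = 1.0,
) -> float:
    """Mean over positions of KL(teacher || student) on next-token distributions."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p = log_softmax(t_row, temperature)  # teacher log-probabilities
        log_q = log_softmax(s_row, temperature)  # student log-probabilities
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(student_logits)


# Toy example: two output positions over a three-token vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_untrained = word_level_kd_loss(student, teacher)
loss_perfect = word_level_kd_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distributions exactly and positive otherwise, so minimizing it pushes the student toward the teacher's predictions.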

Code & Models

Repositories

nitaytech/kd4gen
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Residual Connection · Position-Wise Feed-Forward Layer · Absolute Position Encodings