The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

Haoran Xu; Philipp Koehn; Kenton Murray

arXiv:2205.11416 · cs.CL · October 25, 2022

TL;DR

This paper introduces intra-distillation, a method to balance parameter sensitivity in neural networks, leading to improved generalization and performance across various NLP tasks and languages.

Contribution

It proposes a novel intra-distillation technique that balances parameter contributions, enhancing model performance without pruning.

Findings

01 Significant BLEU score improvements in translation tasks.

02 Enhanced generalization in natural language understanding.

03 Effective across multiple languages and tasks.

Abstract

Recent model pruning methods have demonstrated the ability to remove redundant parameters without sacrificing model performance. Common methods remove redundant parameters according to the parameter sensitivity, a gradient-based measure reflecting the contribution of the parameters. In this paper, however, we argue that redundant parameters can be trained to make beneficial contributions. We first highlight the large sensitivity (contribution) gap among high-sensitivity and low-sensitivity parameters and show that the model generalization performance can be significantly improved after balancing the contribution of all parameters. Our goal is to balance the sensitivity of all parameters and encourage all of them to contribute equally. We propose a general task-agnostic method, namely intra-distillation, appended to the regular training loss to balance parameter sensitivity. Moreover, we…
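The abstract calls parameter sensitivity a gradient-based measure of contribution but, as truncated here, does not give a formula. The sketch below assumes one definition common in the pruning literature: the first-order estimate |w_i * dL/dw_i| of how much the loss changes when parameter w_i is zeroed out. The toy linear model and the helper names (`mse_grad`, `sensitivity`) are hypothetical and chosen only to make the idea concrete; this is not the authors' implementation.

```python
# Illustrative sketch only: sensitivity is ASSUMED to be |w_i * dL/dw_i|,
# a first-order estimate of the loss change if w_i were set to zero.
# The linear model and helper names are hypothetical, not from the paper.

def mse_grad(weights, xs, ys):
    """Gradient of mean squared error for the model y_hat = sum_i w_i * x_i."""
    grad = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(xs)
    return grad


def sensitivity(weights, grad):
    """Per-parameter sensitivity under the assumed definition |w_i * g_i|."""
    return [abs(w * g) for w, g in zip(weights, grad)]


if __name__ == "__main__":
    weights = [2.0, 0.01]            # one large and one near-zero parameter
    xs = [[1.0, 1.0], [2.0, 0.5]]    # two toy training examples
    ys = [1.0, 3.0]
    scores = sensitivity(weights, mse_grad(weights, xs, ys))
    # A large spread between the scores is the kind of sensitivity gap
    # the abstract describes; pruning would remove the low-scoring weight.
    print(scores)
```

In this toy setup the near-zero weight scores far lower than the large one. A pruning method would drop it, whereas the abstract argues for training such parameters so that they contribute more equally.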

Code & Models

Repositories

fe1ixxu/intra-distillation
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Pruning