MicroNet for Efficient Language Modeling

Zhongxia Yan; Hanrui Wang; Demi Guo; Song Han

arXiv:2005.07877 · cs.CL · May 19, 2020

TL;DR

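This paper presents a compact transformer-based language model that is 90x more parameter-efficient and 36x more computation-efficient than the MicroNet Challenge baseline while reaching the required test perplexity of 35 on Wikitext-103. It won the language modeling track of the NeurIPS 2019 MicroNet Challenge.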

Contribution

It combines adaptive embedding and softmax, a differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization into a single compact, efficient language model.
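As a rough illustration of how a non-parametric cache can adjust a language model's predictions, the sketch below follows the generic continuous-cache idea: recent hidden states are stored together with the token that followed each one, and the current hidden state is compared against them to build a cache distribution that is then interpolated with the model's own distribution. The function names, the similarity scale `theta`, and the mixing weight `lam` are assumptions made for illustration; this is not the paper's differentiable formulation nor code from its repository.

```python
import math


def cache_distribution(history_states, history_tokens, query, vocab_size, theta=1.0):
    """Build a distribution over the vocabulary from cached (state, next-token) pairs.

    Each cached state is scored by its dot product with the query state, the
    scores are softmax-normalized, and each weight is added to the token that
    followed that state. All names here are hypothetical.
    """
    if not history_states:
        return [0.0] * vocab_size
    scores = [theta * sum(q * h for q, h in zip(query, state)) for state in history_states]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for weight, token in zip(weights, history_tokens):
        probs[token] += weight / total
    return probs


def mix_with_cache(model_probs, cache_probs, lam=0.1):
    """Linearly interpolate the model distribution with the cache distribution."""
    if sum(cache_probs) == 0.0:  # empty cache: fall back to the model alone
        return list(model_probs)
    return [(1.0 - lam) * p + lam * c for p, c in zip(model_probs, cache_probs)]


# Toy usage: a 4-token vocabulary, two cached states followed by tokens 2 and 3.
model_probs = [0.25, 0.25, 0.25, 0.25]
cache_probs = cache_distribution([[1.0, 0.0], [0.0, 1.0]], [2, 3], [1.0, 0.0], vocab_size=4)
mixed = mix_with_cache(model_probs, cache_probs, lam=0.1)
```

In this toy case the query matches the first cached state, so token 2 gains the most probability mass. The cache adds no trainable parameters, which is why this family of techniques is attractive when the parameter budget is tight.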

Findings

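01. The model is 90x more parameter-efficient than the baseline language model provided by the MicroNet Challenge.

02. The model is 36x more computation-efficient than the same baseline.

03. The model achieves the required test perplexity of 35 on Wikitext-103.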

Abstract

It is important to design compact language models for efficient deployment. We improve upon recent advances in both the language modeling domain and the model-compression domain to construct parameter and computation efficient language models. We use an efficient transformer-based architecture with adaptive embedding and softmax, differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization. In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track. Compared to the baseline language model provided by the MicroNet Challenge, our model is 90 times more parameter-efficient and 36 times more computation-efficient while achieving the required test perplexity of 35 on the Wikitext-103 dataset. We hope that this work will aid future research into efficient language models, and…
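Two of the compression steps named in the abstract, network pruning and low-bit quantization, can be sketched in a few lines. The example below uses plain magnitude pruning and symmetric uniform quantization as stand-ins. The abstract does not say which pruning criterion or quantization scheme the authors used, so every function name and parameter here is an assumption, not the paper's method.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_symmetric(weights, bits):
    """Map floats to signed integer codes in [-qmax, qmax]; assumes bits >= 2."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Toy usage: prune half of the weights, then store the rest as 8-bit codes.
weights = [0.5, -0.02, 0.9, 0.01, -0.7, 0.03]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_symmetric(pruned, bits=8)
restored = dequantize(codes, scale)
```

Pruned weights stay exactly zero after quantization, and each surviving weight is recovered to within half a quantization step. Both effects reduce the storage and arithmetic that a parameter and computation budget would count.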

Code & Models

Repositories

mit-han-lab/neurips-micronet (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications