Distilling Neural Networks for Greener and Faster Dependency Parsing

Mark Anderson; Carlos Gómez-Rodríguez

arXiv:2006.00844·cs.CL·June 2, 2020

TL;DR

This paper demonstrates that teacher-student distillation can significantly reduce the size and increase the speed of neural dependency parsers with minimal accuracy loss, leading to greener and more efficient NLP models.

Contribution

It introduces a distillation approach to compress a state-of-the-art dependency parser, achieving comparable accuracy with much smaller models and faster inference times.

Findings

01

20% model size retains ~99% accuracy

02

2.30x faster inference on CPU

03

Outperforms fastest modern parser on Penn Treebank

Abstract

The carbon footprint of natural language processing research has been increasing in recent years due to its reliance on large and inefficient neural network implementations. Distillation is a network compression technique which attempts to impart knowledge from a large model to a smaller one. We use teacher-student distillation to improve the efficiency of the Biaffine dependency parser which obtains state-of-the-art performance with respect to accuracy and parsing speed (Dozat and Manning, 2017). When distilling to 20% of the original model's trainable parameters, we only observe an average decrease of ~1 point for both UAS and LAS across a number of diverse Universal Dependency treebanks while being 2.30x (1.19x) faster than the baseline model on CPU (GPU) at inference time. We also observe a small increase in performance when compressing to 80% for some treebanks. Finally,…
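
To make the idea of teacher-student distillation concrete, here is a minimal sketch of a generic soft-target distillation loss for a single token's head scores. The excerpt above does not give the paper's exact loss, so the function names, the `temperature` and `alpha` parameters, and the equal weighting of the two terms are illustrative assumptions rather than the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, gold_index,
                      temperature=1.0, alpha=0.5):
    """Hypothetical combined loss: soft teacher targets plus the gold label.

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the gold head. Both are assumptions.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = cross_entropy(teacher_probs, student_soft)

    student_hard = softmax(student_logits)
    hard_loss = -math.log(student_hard[gold_index] + 1e-12)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Scores over three candidate heads for one token (made-up numbers).
teacher_scores = [4.0, 1.0, 0.5]
student_close = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]     # prefers a different head

loss_close = distillation_loss(student_close, teacher_scores, gold_index=0)
loss_far = distillation_loss(student_far, teacher_scores, gold_index=0)
print(loss_close < loss_far)
```

A student whose scores track the teacher's receives a lower loss than one that prefers a different head, which is the signal that lets a smaller network inherit the larger model's behaviour.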
