Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast - Choose Three
Steven Reich, David Mueller, Nicholas Andrews

TL;DR
This paper studies ensemble distillation as a way to produce well-calibrated, accurate, and fast structured prediction models that retain much of an ensemble's performance at single-model inference cost. The approach is validated on named-entity recognition (NER) and machine translation.
Contribution
It studies ensemble distillation as a general framework for structured prediction that keeps much of the ensemble's accuracy and calibration benefit while requiring only a single model at inference time.
Findings
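Distilled models retain much of the ensemble's performance and calibration benefits, and occasionally improve on them.
Distilled models are faster than the ensemble at inference, since only one model is needed at test time.
The framework is effective on both NER and machine translation.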
Abstract
Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available. In this paper, we study ensemble distillation as a general framework for producing well-calibrated structured prediction models while avoiding the prohibitive inference-time cost of ensembles. We validate this framework on two tasks: named-entity recognition and machine translation. We find that, across both tasks, ensemble distillation produces models which retain much of, and occasionally improve upon,…
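To make the idea of distilling an ensemble into one model concrete, here is a minimal sketch of a token-level distillation objective. It assumes the teacher distribution is the average of the ensemble members' per-position probabilities and that the student is trained to minimize the KL divergence to that average. Both choices are common in the distillation literature but are assumptions here, since this summary does not state the paper's exact objective. The function names (`ensemble_teacher_probs`, `distillation_loss`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_teacher_probs(member_logits):
    """Average the members' per-position label distributions.

    member_logits[k][t] is the logit vector of ensemble member k at
    sequence position t (assumed layout, for illustration only).
    """
    n_members = len(member_logits)
    seq_len = len(member_logits[0])
    teacher = []
    for t in range(seq_len):
        member_probs = [softmax(member_logits[k][t]) for k in range(n_members)]
        n_labels = len(member_probs[0])
        teacher.append(
            [sum(p[j] for p in member_probs) / n_members for j in range(n_labels)]
        )
    return teacher

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean over positions of KL(teacher || student)."""
    total = 0.0
    for logits, q in zip(student_logits, teacher_probs):
        p = softmax(logits)
        total += sum(
            qj * (math.log(qj + eps) - math.log(pj + eps)) for qj, pj in zip(q, p)
        )
    return total / len(teacher_probs)

# Toy example: two ensemble members, a two-token sequence, three labels.
members = [
    [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.5, 0.0], [0.0, 2.0, 0.5]],
]
teacher = ensemble_teacher_probs(members)
loss = distillation_loss(members[0], teacher)  # first member used as a stand-in student
```

At test time only the student is run, which is where the inference-time saving over the full ensemble comes from. For tasks such as machine translation, the same per-token loss would be applied at each decoding step, again as an assumption rather than a description of the authors' setup.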
