When in Doubt, Summon the Titans: Efficient Inference with Large Models
Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

TL;DR
This paper introduces a two-stage, distillation-based framework for efficient inference with large models: a lightweight student model handles the "easy" examples, and the large teacher is used only as a fallback for the "hard" ones. This retains the modelling benefits of the large model while largely preserving the computational benefits of lightweight inference.
Contribution
The paper proposes a distillation-based approach in which the student is guided to make correct predictions only on a subset of easy examples and hard examples are deferred to the teacher, reducing inference cost while retaining the benefits of the large model.
Findings
Improved inference efficiency on image and NLP benchmarks.
Achieved better accuracy than standard distillation methods.
Reduced computational costs for large models in practical scenarios.
Abstract
Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to only handle easy…
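The student-first, teacher-fallback inference pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the routing rule here (a threshold on the student's maximum softmax probability) is an assumed stand-in for however the paper decides that an example is "hard", and `cascade_predict`, `toy_student`, `toy_teacher`, and `threshold` are hypothetical names introduced only for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], Sequence[float]]  # maps an input to class logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(
    x: float, student: Model, teacher: Model, threshold: float = 0.9
) -> Tuple[int, str]:
    """Return (predicted class, which model answered).

    Assumption: an example counts as "easy" when the student's top
    softmax probability reaches `threshold`; otherwise it is treated
    as "hard" and the (expensive) teacher is called instead.
    """
    probs = softmax(student(x))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "student"
    teacher_probs = softmax(teacher(x))
    return teacher_probs.index(max(teacher_probs)), "teacher"


# Hypothetical toy models for a two-class problem on scalar inputs.
def toy_student(x: float) -> List[float]:
    # Cheap model: only confident when |x| is large.
    return [x, -x]


def toy_teacher(x: float) -> List[float]:
    # Stand-in for the large model: sharper logits on the same task.
    return [10.0 * x, -10.0 * x]


if __name__ == "__main__":
    for x in (3.0, 0.1, -3.0):
        label, source = cascade_predict(x, toy_student, toy_teacher)
        print(f"x={x:+.1f} -> class {label} (answered by {source})")
```

With a rule of this kind, the average inference cost depends on how often the fallback fires, which is why the abstract emphasizes settings where easy examples are far more frequent than hard ones.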
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
