Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
Neeraj Varshney, Chitta Baral

TL;DR
This paper studies model cascading in NLP: a collection of models of varying capacities is used so that easy instances can be answered by small models. Across multiple task settings, cascading reduces computation cost (by up to 88.93% in the K=3 setting) and improves prediction accuracy (by up to 2.18%).
Contribution
It introduces and empirically evaluates model cascading as a simple method to improve NLP system efficiency and accuracy using multiple models of varying capacities.
Findings
In the K=3 setting, cascading saves up to 88.93% of the computation cost.
It consistently achieves higher prediction accuracy, with an improvement of up to 2.18%.
Adding more models to the cascade increases the efficiency gains.
Abstract
Do all instances need inference through the big models for a correct prediction? Perhaps not; some instances are easy and can be answered correctly by even small capacity models. This provides opportunities for improving the computational efficiency of systems. In this work, we present an explorative study on 'model cascading', a simple technique that utilizes a collection of models of varying capacities to accurately yet efficiently output predictions. Through comprehensive experiments in multiple task settings that differ in the number of models available for cascading (K value), we show that cascading improves both the computational efficiency and the prediction accuracy. For instance, in K=3 setting, cascading saves up to 88.93% computation cost and consistently achieves superior prediction accuracy with an improvement of up to 2.18%. We also study the impact of introducing…
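The idea in the abstract can be illustrated with a minimal sketch. It assumes one common way to decide when a smaller model's answer is good enough: accept the prediction if its maximum class probability clears a threshold, otherwise pass the instance to the next larger model. The text above does not confirm that the paper uses this exact rule. The toy models, their costs, and the threshold values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable that maps an input text to class probabilities.
Model = Callable[[str], Sequence[float]]


def cascade_predict(
    x: str,
    models: List[Tuple[Model, float]],
    thresholds: Sequence[float],
) -> Tuple[int, float]:
    """Try models from cheapest to most expensive; stop at the first confident one.

    models: (model, cost) pairs ordered by increasing capacity (K = len(models)).
    thresholds: one confidence threshold per model except the last
                (assumed to be tuned on held-out data).
    Returns (predicted label, total cost spent on this instance).
    """
    if len(thresholds) != len(models) - 1:
        raise ValueError("need exactly one threshold per non-final model")
    total_cost = 0.0
    for i, (model, cost) in enumerate(models):
        probs = model(x)
        total_cost += cost
        label = max(range(len(probs)), key=probs.__getitem__)
        is_last = i == len(models) - 1
        # The last model always answers; earlier ones only when confident enough.
        if is_last or probs[label] >= thresholds[i]:
            return label, total_cost
    raise ValueError("models must be non-empty")


# Hypothetical stand-ins for small, medium, and large classifiers.
def small(x: str) -> Sequence[float]:
    return [0.9, 0.1] if len(x) < 10 else [0.55, 0.45]


def medium(x: str) -> Sequence[float]:
    return [0.3, 0.7] if "not" in x else [0.6, 0.4]


def large(x: str) -> Sequence[float]:
    return [0.2, 0.8]


# Costs are illustrative relative units, not measurements.
MODELS = [(small, 1.0), (medium, 4.0), (large, 16.0)]
THRESHOLDS = [0.8, 0.65]
```

With these toy models, cascade_predict("short", MODELS, THRESHOLDS) is answered by the small model alone, while an input the small and medium models are both unsure about falls through to the large model and pays the sum of all three costs. Computation is saved whenever enough instances exit early.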
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Data Stream Mining Techniques
