IDK Cascades: Fast Deep Learning by Learning not to Overthink
Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, Joseph E. Gonzalez

TL;DR
The paper introduces the IDK Cascades framework, which accelerates deep learning inference by selectively avoiding overthinking on simple inputs, leveraging model cascades without retraining.
Contribution
It presents a cascade construction framework that reduces inference cost by learning when a model should abstain ("I don't know") and defer to a more expensive model. Cascades are built with two search-based methods and a new cost-aware objective.
Findings
Significant speedups on benchmark datasets
No loss in prediction accuracy with cascade approach
Easy integration into existing systems
Abstract
Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively "overthink" on simple inputs. In this paper, we revisit the classic question of building model cascades that primarily leverage class asymmetry to reduce cost. We introduce the "I Don't Know" (IDK) prediction cascades framework, a general framework to systematically compose a set of pre-trained models to accelerate inference without a loss in prediction accuracy. We propose two search-based methods for constructing cascades as well as a new cost-aware objective within this framework. The proposed IDK cascade framework can be easily adopted in the existing model serving systems without additional model…
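To make the cascade idea concrete, here is a minimal sketch of cascade inference in Python. It is not the paper's method: the abstract does not say how a model decides to abstain, so a fixed confidence threshold stands in for the learned "I don't know" decision. The names `cascade_predict`, `fast_model`, and `accurate_model`, and the threshold value, are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a model maps an input to (predicted_label, confidence in [0, 1]).
Model = Callable[[float], Tuple[int, float]]


def cascade_predict(
    x: float, models: Sequence[Model], thresholds: Sequence[float]
) -> Tuple[int, int]:
    """Run models from cheapest to most expensive.

    Returns (label, index of the model that produced the answer).
    Every model except the last may abstain ("I don't know") when its
    confidence falls below its threshold; the last model always answers.
    """
    if not models or len(thresholds) != len(models) - 1:
        raise ValueError("need one threshold per model except the last")
    for i, model in enumerate(models[:-1]):
        label, confidence = model(x)
        if confidence >= thresholds[i]:
            return label, i  # confident enough: stop early
        # otherwise this model abstains and the input moves down the cascade
    label, _ = models[-1](x)
    return label, len(models) - 1


# Toy stand-ins for a cheap model and an expensive model (hypothetical).
def fast_model(x: float) -> Tuple[int, float]:
    # Confidence grows with distance from the decision boundary at 0.
    return (1 if x > 0 else 0), min(1.0, abs(x))


def accurate_model(x: float) -> Tuple[int, float]:
    return (1 if x > 0 else 0), 1.0


easy = cascade_predict(2.0, [fast_model, accurate_model], [0.8])
hard = cascade_predict(0.1, [fast_model, accurate_model], [0.8])
```

In this toy setup the easy input is answered by the cheap model alone, while the input near the boundary falls through to the expensive one. How the abstention rule is learned and how the cascade is searched for are the paper's contributions and are not reproduced here.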
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis
