BabyBear: Cheap inference triage for expensive language models

Leila Khalili; Yao You; John Bohannon

arXiv:2205.11747 · cs.CL · May 25, 2022 · 1 citation

TL;DR

BabyBear is a model-cascading framework for NLP that cuts compute cost by exiting early whenever the least expensive model in the cascade makes a sufficiently high-confidence prediction. It saves more than 50% of the compute on large-scale classification jobs while retaining overall accuracy.

Contribution

The paper adapts model cascading, a concept borrowed from computer vision, to NLP through inference triage, enabling significant cost reductions on large-scale NLP tasks with minimal accuracy loss.

Findings

01

Over 50% reduction in compute cost for classification tasks.

02

33% compute savings in named entity recognition while maintaining high F1 score.

03

For common NLP tasks, a high proportion of the inference load can be handled by cheap, fast models that have learned by observing a deep learning model.

Abstract

Transformer language models provide superior accuracy over previous models but they are computationally and environmentally expensive. Borrowing the concept of model cascading from computer vision, we introduce BabyBear, a framework for cascading models for natural language processing (NLP) tasks to minimize cost. The core strategy is inference triage, exiting early when the least expensive model in the cascade achieves a sufficiently high-confidence prediction. We test BabyBear on several open source data sets related to document classification and entity recognition. We find that for common NLP tasks a high proportion of the inference load can be accomplished with cheap, fast models that have learned by observing a deep learning model. This allows us to reduce the compute cost of large-scale classification jobs by more than 50% while retaining overall accuracy. For named entity…
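The inference triage strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy models, their confidence values, the labels, and the names `cheap_model`, `expensive_model`, `triage`, `run_cascade`, and `expensive_call_fraction` are all hypothetical, and the 0.9 threshold is an arbitrary choice for illustration. The sketch also skips the step in which the cheap model learns by observing the expensive one.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps a text to (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
Model = Callable[[str], Prediction]


def cheap_model(text: str) -> Prediction:
    """Toy stand-in for a fast model: keyword lookup with made-up confidences."""
    lowered = text.lower()
    if "refund" in lowered:
        return "complaint", 0.95
    if "thanks" in lowered:
        return "praise", 0.90
    return "other", 0.40  # low confidence, so the input should be escalated


def expensive_model(text: str) -> Prediction:
    """Toy stand-in for a large transformer whose answer is always accepted."""
    lowered = text.lower()
    is_complaint = "broken" in lowered or "refund" in lowered
    return ("complaint" if is_complaint else "praise"), 0.99


def triage(text: str, cheap: Model, expensive: Model,
           threshold: float) -> Tuple[str, str]:
    """Return (label, source); exit early if the cheap model is confident."""
    label, confidence = cheap(text)
    if confidence >= threshold:
        return label, "cheap"
    label, _ = expensive(text)
    return label, "expensive"


def run_cascade(texts: List[str], threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Apply triage to every text using the two toy models above."""
    return [triage(t, cheap_model, expensive_model, threshold) for t in texts]


def expensive_call_fraction(results: List[Tuple[str, str]]) -> float:
    """Fraction of inputs that had to be escalated to the expensive model."""
    if not results:
        return 0.0
    escalated = sum(1 for _, source in results if source == "expensive")
    return escalated / len(results)


if __name__ == "__main__":
    docs = [
        "I want a refund",
        "Thanks for the help",
        "The screen arrived broken",
        "Thanks, works great",
    ]
    results = run_cascade(docs)
    for doc, (label, source) in zip(docs, results):
        print(f"{source:>9}  {label:<9}  {doc}")
    print("escalated fraction:", expensive_call_fraction(results))
```

In this toy run only one of the four inputs reaches the expensive model. Raising the threshold sends more inputs to the expensive model, so the threshold is the knob that trades compute savings against accuracy.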

Code & Models

Repositories

primerai/primer-research
tf · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis