Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models
Phyllis Ang, Bhuwan Dhingra, Lisa Wu Wills

TL;DR
This paper systematically analyzes the trade-offs between accuracy, speed, and energy consumption in long-sequence NLP models, revealing how model size and sequence length impact efficiency and performance across tasks.
Contribution
It provides a comparative study of Longformer-Encoder-Decoder (LED) and Big Bird, highlighting how hyperparameters affect the efficiency-accuracy trade-off in long-text NLP tasks.
Findings
LED outperforms Big Bird in accuracy and energy efficiency.
Increasing model size is more energy-efficient than increasing sequence length for summarization.
Smaller models are both more accurate and more efficient in question answering.
Abstract
With many real-world applications of Natural Language Processing (NLP) involving long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models, Longformer-Encoder-Decoder (LED) and Big Bird, during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy…
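The experimental design in the abstract can be pictured as a configuration grid. The following is a minimal sketch, assuming every combination of model, size, sequence length, dataset, and phase is run (the abstract does not confirm that all combinations fit the fixed resource budget). The abstract does not name the four SCROLLS datasets, so the dataset identifiers are hypothetical placeholders, as are the function name `experiment_grid` and the phase labels.

```python
from itertools import product

# Values taken from the abstract.
MODELS = ["LED", "BigBird"]
SIZES = ["base", "large"]
SEQ_LENGTHS = [1024, 2048, 3072, 4096]
PHASES = ["fine-tuning", "inference"]  # hypothetical labels for the two measured phases

# Hypothetical placeholders: the abstract does not name the four SCROLLS datasets.
DATASETS = ["scrolls_task_1", "scrolls_task_2", "scrolls_task_3", "scrolls_task_4"]


def experiment_grid():
    """Enumerate every (model, size, seq_len, dataset, phase) configuration."""
    return [
        {"model": m, "size": s, "seq_len": n, "dataset": d, "phase": p}
        for m, s, n, d, p in product(MODELS, SIZES, SEQ_LENGTHS, DATASETS, PHASES)
    ]


grid = experiment_grid()
print(len(grid))  # 2 models * 2 sizes * 4 lengths * 4 datasets * 2 phases = 128
```

Under this assumption, each (model, size) pair contributes four sequence-length points per dataset and phase, which is the granularity at which accuracy, speed, and energy would be compared.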
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
