E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
Mohammad Akbari, Amin Banitalebi-Dehkordi, Yong Zhang

TL;DR
E-LANG introduces a dynamic inference method that efficiently combines large and lightweight language models through energy-based decision making, improving speed while maintaining performance across various NLP tasks.
Contribution
It presents a novel, architecture-agnostic approach for joint inference that works with black-box models and sequence-to-sequence tasks, outperforming existing methods in efficiency.
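The contribution above says the approach works with black-box models and sequence-to-sequence tasks. An energy score can be computed from output logits alone, without access to a model's internals, which is what makes that possible. Below is a minimal sketch under stated assumptions, not the authors' implementation. It uses the free-energy score E = -T * log(sum_i exp(f_i / T)) per decoding step, where lower values mean higher confidence. It then averages the per-token energies over a generated sequence. The averaging rule, the function names `token_energy` and `sequence_energy`, and the toy logits are all hypothetical. The paper's exact aggregation may differ.

```python
import math
from typing import List, Sequence


def token_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy of one decoding step: -T * log(sum_i exp(logit_i / T))."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # shift by the max so exp() cannot overflow
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def sequence_energy(step_logits: List[Sequence[float]], temperature: float = 1.0) -> float:
    """Mean per-token energy over a decoded sequence (an assumed aggregation rule).

    Only the per-step output logits are needed, so the generator can stay a black box.
    """
    if not step_logits:
        raise ValueError("step_logits must contain at least one decoding step")
    energies = [token_energy(logits, temperature) for logits in step_logits]
    return sum(energies) / len(energies)


# Toy per-step logits for two hypothetical three-token outputs over a 3-word vocabulary.
confident = [[6.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 7.0]]
uncertain = [[0.2, 0.1, 0.0], [0.0, 0.1, 0.0], [0.1, 0.0, 0.1]]
print(sequence_energy(confident) < sequence_energy(uncertain))  # True
```

A confident output, with one dominant logit per step, gets a markedly lower mean energy than a near-uniform one. A sequence-level threshold could therefore separate the two cases.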
Findings
Outperforms T5-11B with 3.3x speed-up on GLUE
Achieves BERT-based SOTA on GLUE with 3.2x less computation
Works with encoder-decoder models and sequence-to-sequence tasks
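The speed-up figures above are averages over evaluation inputs. Here is a back-of-the-envelope sketch of where such a number comes from. It is a hypothetical cost model, not the paper's accounting. Assume the Swift model always runs first and a fraction p of inputs is also forwarded to the Super model. The average cost is then C_swift + p * C_super, and the speed-up over always running the Super model is C_super / (C_swift + p * C_super). The function names and the cost values in the example are made up for illustration.

```python
def expected_cost(c_swift: float, c_super: float, frac_to_super: float) -> float:
    """Average cost per input when Swift always runs and a fraction p is also sent to Super."""
    if not 0.0 <= frac_to_super <= 1.0:
        raise ValueError("frac_to_super must lie in [0, 1]")
    return c_swift + frac_to_super * c_super


def speedup_over_super(c_swift: float, c_super: float, frac_to_super: float) -> float:
    """Speed-up relative to running the Super model on every input."""
    return c_super / expected_cost(c_swift, c_super, frac_to_super)


# Made-up relative costs: Swift costs 10 units, Super costs 100 units, 20% of inputs escalate.
print(round(speedup_over_super(10.0, 100.0, 0.2), 2))  # 3.33
```

Under these assumptions the joint scheme only pays off while p < 1 - C_swift / C_super. Above that point, running both models costs more than running the Super model alone.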
Abstract
Building huge and highly capable language models has been a trend in recent years. Despite their strong performance, they incur high computational costs. A common solution is to apply model compression or choose lightweight architectures, which often require a separate fixed-size model for each desired computational budget and may lose performance under heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes inference between large, accurate Super models and lightweight Swift models. To this end, a decision-making module routes inputs to the Super or Swift model based on the energy characteristics of their representations in the latent space. The method is easily adoptable and architecture-agnostic. As such, it can be applied to black-box pre-trained models without a need for architectural manipulations, reassembling of…
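The routing rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It computes the free-energy score E(x) = -T * log(sum_i exp(f_i(x) / T)) on the Swift model's output logits and uses a single hand-picked threshold. It also relies on toy stand-in models. The names `energy_score`, `route`, `toy_swift`, and `toy_super` are all hypothetical. The abstract places the energy computation on latent-space representations, so the paper's exact inputs to the score and its threshold selection may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = List[float]


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy -T * log(sum_i exp(logit_i / T)); lower values mean higher confidence."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # numerically stable log-sum-exp
    return -temperature * (peak + math.log(sum(math.exp(s - peak) for s in scaled)))


def route(
    x: str,
    swift: Callable[[str], Logits],
    super_model: Callable[[str], Logits],
    threshold: float,
    temperature: float = 1.0,
) -> Tuple[str, Logits]:
    """Accept the Swift output when its energy is at or below the threshold, else defer to Super."""
    swift_logits = swift(x)
    if energy_score(swift_logits, temperature) <= threshold:
        return "swift", swift_logits
    return "super", super_model(x)


# Hypothetical stand-ins for the two black-box models.
def toy_swift(x: str) -> Logits:
    return [4.0, 0.0] if x == "easy" else [0.1, 0.0]


def toy_super(x: str) -> Logits:
    return [0.0, 5.0]


for example in ("easy", "hard"):
    print(example, route(example, toy_swift, toy_super, threshold=-2.0)[0])
# easy swift
# hard super
```

Raising the threshold keeps more inputs on the Swift model, which is faster but riskier. Lowering it sends more inputs to the Super model. A single threshold can therefore trace out a range of accuracy/compute trade-offs, rather than requiring a separate fixed-size model for each budget.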
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Residual Connection · Layer Normalization · Dropout · Adam · Dense Connections
