E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models

Mohammad Akbari; Amin Banitalebi-Dehkordi; Yong Zhang

arXiv:2203.00748·cs.CL·March 3, 2022

TL;DR

E-LANG introduces a dynamic inference method that routes each input to either a large or a lightweight language model using an energy-based decision module, reducing inference cost while maintaining performance across various NLP tasks.

Contribution

It presents a novel, architecture-agnostic approach for joint inference that works with black-box models and sequence-to-sequence tasks, outperforming existing methods in efficiency.

Findings

01 Outperforms T5-11B with 3.3x speed-up on GLUE

02 Achieves BERT SOTA with 3.2x less computation

03 Works for encoder-decoder and sequence-to-sequence tasks

Abstract

Building huge and highly capable language models has been a trend in the past years. Despite their great performance, they incur high computational cost. A common solution is to apply model compression or choose light-weight architectures, which often need a separate fixed-size model for each desirable computational budget, and may lose performance in case of heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes the inference between large accurate Super-models and light-weight Swift models. To this end, a decision making module routes the inputs to Super or Swift models based on the energy characteristics of the representations in the latent space. This method is easily adoptable and architecture agnostic. As such, it can be applied to black-box pre-trained models without a need for architectural manipulations, reassembling of…
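
The routing idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the energy score is the negative log-sum-exp of the Swift model's output logits (a common definition of an energy score), and that low energy means the Swift model is confident enough to answer on its own. The names `energy`, `route`, `swift`, `super_model`, `threshold`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[str], List[float]]  # hypothetical: maps an input to class logits


def energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Energy score, assumed here to be -T * log(sum(exp(logit / T)))."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return -temperature * log_sum_exp


def route(
    x: str,
    swift: Model,
    super_model: Model,
    threshold: float,
    temperature: float = 1.0,
) -> Tuple[str, List[float]]:
    """Run the Swift model first; fall back to the Super model on high energy."""
    swift_logits = swift(x)
    if energy(swift_logits, temperature) <= threshold:
        return "swift", swift_logits  # low energy: keep the cheap prediction
    return "super", super_model(x)  # high energy: pay for the large model


if __name__ == "__main__":
    # Toy stand-ins for real models (assumptions for illustration only).
    confident_swift = lambda x: [10.0, 0.0, 0.0]
    unsure_swift = lambda x: [0.0, 0.0, 0.0]
    big_model = lambda x: [1.0, 2.0, 3.0]

    print(route("easy input", confident_swift, big_model, threshold=-5.0)[0])
    print(route("hard input", unsure_swift, big_model, threshold=-5.0)[0])
```

In this sketch the threshold sets the trade-off: raising it keeps more inputs on the Swift model and lowers cost, while lowering it sends more inputs to the Super model. Because the decision uses only model outputs, both models can be treated as black boxes, which matches the architecture-agnostic claim in the abstract.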

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Residual Connection · Layer Normalization · Dropout · Adam · Dense Connections