Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer

Seohyeong Jeong; Nojun Kwak

arXiv:2102.09727·cs.CL·February 22, 2021

Open Access

TL;DR

This paper introduces a dynamic inference approach for BERT that uses trainable gate variables on input tokens and a bi-modal regularizer. It reduces computational cost with a minimal performance drop, which makes the model better suited to resource-limited devices.

Contribution

It proposes a novel dynamic inference method for BERT with trainable gates and a bi-modal regularizer, enabling adjustable trade-offs between accuracy and efficiency.

Findings

01 Reduced computational cost on the GLUE dataset

02 Minimal performance drop with dynamic inference

03 The trade-off between performance and cost is adjustable via a user-specified hyperparameter

Abstract

The BERT model has shown significant success on various natural language processing tasks. However, because of its large size and high computational cost, the model suffers from high latency, which is prohibitive for deployment on resource-limited devices. To tackle this problem, we propose a dynamic inference method for BERT based on trainable gate variables applied to input tokens and a regularizer that has a bi-modal property. Our method shows reduced computational cost on the GLUE dataset with a minimal performance drop. Moreover, the trade-off between performance and computational cost can be adjusted through a user-specified hyperparameter.
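The abstract does not give the form of the gates or the regularizer, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes one trainable logit per input token passed through a sigmoid, a hypothetical bi-modal penalty g * (1 - g) that is smallest when a gate is near 0 or 1, and a hypothetical cost term (the mean gate value) weighted by a user-chosen `trade_off`. All function names and the 0.5 threshold are assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_values(logits):
    """Map trainable gate logits (one per token) to gates in (0, 1)."""
    return [sigmoid(z) for z in logits]


def bimodal_penalty(gates):
    """Hypothetical bi-modal regularizer: mean of g * (1 - g).

    The term is zero when a gate is exactly 0 or 1 and largest at 0.5,
    so minimizing it pushes each gate toward one of two modes.
    This specific form is an assumption, not taken from the paper.
    """
    return sum(g * (1.0 - g) for g in gates) / len(gates)


def cost_penalty(gates):
    """Hypothetical cost proxy: mean gate value (fraction of tokens kept)."""
    return sum(gates) / len(gates)


def total_regularizer(logits, trade_off):
    """Combine both terms; a larger trade_off favors dropping more tokens."""
    gates = gate_values(logits)
    return bimodal_penalty(gates) + trade_off * cost_penalty(gates)


def keep_mask(logits, threshold=0.5):
    """At inference, keep only tokens whose gate reaches the threshold."""
    return [g >= threshold for g in gate_values(logits)]


if __name__ == "__main__":
    logits = [-3.0, 0.0, 4.0]  # made-up gate logits for three tokens
    print(keep_mask(logits))
    print(total_regularizer(logits, trade_off=0.1))
```

In a sketch like this, the regularizer would be added to the task loss during training, and the hard mask would decide which tokens are processed at inference. How the actual method applies the gates inside BERT is not described in the text above.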

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Linear Warmup With Linear Decay · Softmax · Adam · Multi-Head Attention · Attention Dropout · Weight Decay · Residual Connection · Attention Is All You Need · Dropout