Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection

Xin Huang; Ashish Khetan; Rene Bidart; Zohar Karnin

arXiv:2203.14380·cs.CL·March 29, 2022

TL;DR

Pyramid-BERT replaces the heuristics previously used to shorten sequences inside BERT with a core-set based token selection method. The method avoids expensive pre-training and makes fine-tuning space-efficient, which suits it to longer sequences, and it outperforms several baselines on the GLUE benchmarks and Long Range Arena datasets.

Contribution

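The paper proposes a core-set based token selection technique for BERT, justified by theoretical results, that replaces the heuristics earlier work used to successively shorten the sequence across encoder layers. The aim is lower computational cost and practical handling of longer sequences.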

Findings

01 Outperforms baselines on the GLUE benchmarks

02 Achieves better results on Long Range Arena datasets

03 Handles longer sequences without expensive pre-training

Abstract

Transformer-based language models such as BERT have achieved state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length while transforming tokens through the encoders, in tasks such as classification and ranking that require a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, in which we replace the previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique allows us to avoid expensive pre-training, gives space-efficient fine-tuning, and thus makes the method suitable for longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmarks…
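
To make the idea of core-set based token selection concrete, here is a minimal sketch, not the paper's algorithm. It assumes a greedy farthest-point (k-center) approximation over the token embeddings of one encoder layer, Euclidean distance, and that row 0 holds the [CLS] embedding, which is always kept because the prediction depends on it. The function name `select_core_set` and the toy data are hypothetical.

```python
import numpy as np


def select_core_set(embeddings, k):
    """Pick k token indices with greedy farthest-point (k-center) selection.

    embeddings: array of shape (num_tokens, hidden_dim); row 0 is assumed
                to be the [CLS] token and is always kept.
    k:          number of tokens to keep, 1 <= k <= num_tokens.
    Returns the kept indices in their original sequence order.
    """
    n = embeddings.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of tokens")

    selected = [0]  # assumption: [CLS] is the first center
    # Distance from each token to its nearest selected center so far.
    min_dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    min_dist[0] = -np.inf  # never pick an already selected token again

    while len(selected) < k:
        # The token farthest from all current centers becomes a new center.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_new)
        min_dist[nxt] = -np.inf

    return sorted(selected)


# Toy example: two tight clusters plus one outlier, keep 3 of 5 tokens.
tokens = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0], [0.0, 4.0]])
kept = select_core_set(tokens, 3)
```

In this toy case the near-duplicate tokens within each cluster are dropped and one representative per region survives. Applying such a selection after successive encoder layers, with a shrinking `k`, would give the pyramid-shaped sequence lengths the title refers to; the per-layer schedule is left out here because the listing does not state it.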

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Attention Dropout · Linear Warmup With Linear Decay · Layer Normalization · Weight Decay