Block-Skim: Efficient Question Answering for Transformer

Yue Guan; Zhengyi Li; Jingwen Leng; Zhouhan Lin; Minyi Guo; Yuhao Zhu

arXiv:2112.08560·cs.CL·May 17, 2022

TL;DR

Block-Skim accelerates Transformer-based extractive question answering by learning to skim context that is not needed for the answer in higher hidden layers, which is reported to improve accuracy as well as inference speed.

Contribution

This paper introduces Block-Skim, a novel method that uses self-attention weights to identify and discard irrelevant context in higher layers of transformers for QA tasks.

Findings

01 Achieves 3x speedup on BERT-base during inference.

02 Outperforms full-size models in accuracy after pruning.

03 Effectively identifies essential context using self-attention weights.

Abstract

Transformer models have achieved promising results on natural language processing (NLP) tasks including extractive question answering (QA). Common Transformer encoders used in NLP tasks process the hidden states of all input tokens in the context paragraph throughout all layers. However, unlike other tasks such as sequence classification, answering the raised question does not necessarily need all the tokens in the context paragraph. Following this motivation, we propose Block-Skim, which learns to skim unnecessary context in higher hidden layers to improve and accelerate the Transformer performance. The key idea of Block-Skim is to identify the context that must be further processed and the context that can be safely discarded early on during inference. Critically, we find that such information could be sufficiently derived from the self-attention weights inside the Transformer…
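
The idea in the abstract, deciding from self-attention weights which parts of the context to keep, can be illustrated with a small sketch. This is not the authors' method: the abstract says Block-Skim learns this decision, whereas the code below swaps in a hand-written heuristic (mean question-to-context attention compared against a fixed threshold). The function names, the fixed block size, the threshold, and the toy attention matrix are all assumptions made for illustration.

```python
from typing import List, Sequence

def block_scores(attn: List[List[float]], question_len: int, block_size: int) -> List[float]:
    """Mean attention that question tokens pay to each context block.

    attn[i][j] is the attention weight from token i (query) to token j (key).
    Tokens [0, question_len) are the question; the rest are context.
    Hypothetical heuristic, standing in for a learned predictor.
    """
    seq_len = len(attn)
    scores = []
    for start in range(question_len, seq_len, block_size):
        end = min(start + block_size, seq_len)
        total = sum(attn[q][k] for q in range(question_len) for k in range(start, end))
        scores.append(total / (question_len * (end - start)))
    return scores

def keep_mask(scores: Sequence[float], threshold: float) -> List[bool]:
    """Keep a block when its score reaches the (assumed) fixed threshold."""
    return [s >= threshold for s in scores]

def skim(hidden: Sequence, question_len: int, block_size: int, mask: Sequence[bool]) -> list:
    """Drop the hidden states of skimmed blocks; question tokens are always kept."""
    kept = list(hidden[:question_len])
    for b, keep in enumerate(mask):
        if keep:
            start = question_len + b * block_size
            kept.extend(hidden[start:start + block_size])
    return kept

# Toy example: 2 question tokens followed by 4 context tokens (2 blocks of 2).
uniform = [1 / 6] * 6
attn = [
    [0.1, 0.1, 0.05, 0.05, 0.4, 0.3],  # question token 0
    [0.2, 0.1, 0.0, 0.1, 0.3, 0.3],    # question token 1
    uniform, uniform, uniform, uniform,  # context rows (unused by the heuristic)
]
hidden = ["q0", "q1", "c0", "c1", "c2", "c3"]  # placeholders for hidden-state vectors

scores = block_scores(attn, question_len=2, block_size=2)
mask = keep_mask(scores, threshold=0.1)
kept = skim(hidden, question_len=2, block_size=2, mask=mask)
```

In this toy input the question attends mostly to the second context block, so only that block survives and the layers after the skim point would process four hidden states instead of six. The saving in later layers is where a speedup of this kind would come from.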

Code & Models

Repositories

chandlerguan/blockskim
PyTorch · Official

Videos

Block-Skim: Efficient Question Answering for Transformer · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Residual Connection · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections