ConvBERT: Improving BERT with Span-based Dynamic Convolution
Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

TL;DR
ConvBERT introduces span-based dynamic convolution to replace some self-attention heads in BERT, reducing memory and computation costs while improving performance on NLP tasks.
Contribution
The paper proposes a novel mixed attention mechanism that combines span-based dynamic convolution with self-attention, improving both the efficiency and the effectiveness of BERT.
Findings
ConvBERT outperforms BERT and its variants on multiple NLP benchmarks.
It achieves higher GLUE scores with fewer parameters.
Training cost is significantly reduced compared to ELECTRA.
Abstract
Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for generating the attention map from a global perspective, we observe some heads only need to learn local dependencies, which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT…
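The mixed attention idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the convolution kernel is generated, so the sketch assumes each local head builds a per-position kernel from the query and a mean-pooled summary of the neighbouring tokens. All function names, parameter shapes, and the window width are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_head(x, wq, wk, wv):
    """Global head: every position attends over the whole sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n) attention map
    return softmax(scores) @ v

def local_windows(x, width):
    """For each position, gather `width` neighbouring rows (zero-padded)."""
    pad = width // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([padded[i:i + width] for i in range(x.shape[0])])

def span_conv_head(x, wq, wk, wv, wf, width):
    """Local head (assumed form): a kernel generated per position mixes
    only the `width` values around that position."""
    q, v = x @ wq, x @ wv
    # Assumption: the span summary is a mean over the local window.
    span_key = local_windows(x, width).mean(axis=1) @ wk  # (n, d_head)
    kernel = softmax((q * span_key) @ wf)                 # (n, width)
    v_win = local_windows(v, width)                       # (n, width, d_head)
    return np.einsum("nw,nwd->nd", kernel, v_win)

def mixed_attention(x, global_params, local_params, width=3):
    """Concatenate the outputs of global and local heads."""
    heads = [self_attention_head(x, *p) for p in global_params]
    heads += [span_conv_head(x, *p, width=width) for p in local_params]
    return np.concatenate(heads, axis=-1)

# Toy usage: 6 tokens, model width 8, two global and two local heads of size 4.
rng = np.random.default_rng(0)
n, d_model, d_head, width = 6, 8, 4, 3
x = rng.normal(size=(n, d_model))
global_params = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
                 for _ in range(2)]
local_params = [(rng.normal(size=(d_model, d_head)),
                 rng.normal(size=(d_model, d_head)),
                 rng.normal(size=(d_model, d_head)),
                 rng.normal(size=(d_head, width)))
                for _ in range(2)]
out = mixed_attention(x, global_params, local_params, width=width)
print(out.shape)  # (6, 16)
```

In this sketch a global head builds an n-by-n attention map, while a local head only produces an n-by-width kernel, which is where the memory and computation savings claimed in the abstract would come from under these assumptions.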
Code & Models
- 🤗 Finnish-NLP/convbert-base-finnish (model) · 8 dl · ♡ 2
- 🤗 Finnish-NLP/convbert-base-generator-finnish (model) · 3 dl
- 🤗 akdeniz27/convbert-base-turkish-cased-ner (model) · 28 dl · ♡ 3
- 🤗 dbmdz/convbert-base-turkish-cased (model) · 298 dl · ♡ 4
- 🤗 mrm8488/convbert-base-spanish (model) · 3 dl · ♡ 1
- 🤗 mrm8488/convbert-small-spanish (model) · 4 dl · ♡ 3
- 🤗 sarnikowski/convbert-medium-small-da-cased (model) · 4 dl
- 🤗 sarnikowski/convbert-small-da-cased (model) · 3 dl
- 🤗 OWG/convbert-base-spanish (model)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Dynamic Convolution · Span-Based Dynamic Convolution · Mixed Attention Block · ConvBERT · WordPiece · Dense Connections · Residual Connection · Linear Warmup With Linear Decay
