BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

TL;DR
BP-Transformer introduces a binary partitioning-based attention mechanism that efficiently models long-range dependencies in text, balancing computational complexity and performance for various NLP tasks.
Contribution
It proposes a novel binary partitioning attention mechanism that reduces complexity and improves long-text modeling in Transformer architectures.
Findings
Outperforms previous self-attention models on long-text tasks
Balances computational efficiency with model capacity
Effective across text classification, translation, and language modeling
Abstract
The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limits its application to long text. In this paper, adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), we propose BP-Transformer (BPT for short). BPT yields O(k · n log(n/k)) connections, where k is a hyperparameter that controls the density of attention. BPT strikes a good balance between computational complexity and model capacity. A series of experiments on text classification, machine translation and language modeling shows that BPT outperforms previous self-attention models on long text. Our code, hyperparameters and CUDA kernels for sparse attention are available in PyTorch.
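The fine-to-coarse idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `context_spans`, its parameters, and the exact rule for how many spans are taken per scale are assumptions made for illustration, and the sketch only models which spans a single token looks at, not the full graph or the CUDA kernels. Here `k` stands in for the density hyperparameter, `n` is the sequence length, and spans are half-open `(start, end)` ranges aligned to a binary partition of the sequence.

```python
def context_spans(pos, n, k):
    """Spans that token `pos` attends to, finest (size 1) to coarsest.

    Illustrative simplification: at each scale the token takes at least k
    spans on each side, plus one more when needed so that the next scale
    starts on a boundary of the binary partition.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 <= pos < n:
        raise ValueError("pos must lie in [0, n)")

    spans = []
    lo, hi = pos, pos + 1   # region already covered is [lo, hi)
    size = 1                # span length at the current scale
    while lo > 0 or hi < n:
        # Right side: nearby context at the current granularity.
        taken = 0
        while hi < n and (taken < k or hi % (2 * size) != 0):
            spans.append((hi, min(hi + size, n)))
            hi += size
            taken += 1
        # Left side: mirror image of the right side.
        taken = 0
        while lo > 0 and (taken < k or lo % (2 * size) != 0):
            spans.append((lo - size, lo))
            lo -= size
            taken += 1
        size *= 2           # farther context is summarised more coarsely
    return sorted(spans)


def total_connections(n, k):
    """Number of token-to-span connections over the whole sequence."""
    return sum(len(context_spans(p, n, k)) for p in range(n))
```

For example, `context_spans(0, 16, 1)` returns `[(1, 2), (2, 4), (4, 8), (8, 16)]`: nearby tokens are seen individually and distant text is seen as progressively larger spans. In this sketch each token therefore attends to a number of spans that grows with the number of scales rather than with `n`, which is the intuition behind the sub-quadratic connection count; a larger `k` keeps more fine-grained spans at each scale.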
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Graph Self-Attention · BP-Transformer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
