Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling

Tao Shen; Tianyi Zhou; Guodong Long; Jing Jiang; Chengqi Zhang

arXiv:1804.00857 · cs.CL · April 4, 2018 · 78 cites

TL;DR

Bi-BloSAN is a sequence encoding model that combines intra-block and inter-block self-attention. It aims for memory efficiency comparable to RNNs and CNNs while keeping the advantages of SANs: highly parallelizable computation and modeling of both local and long-range dependencies.

Contribution

Introduces Bi-BloSAN, a bi-directional block self-attention network that reduces the memory cost of self-attention and offers a better efficiency-memory trade-off than existing SAN, RNN, and CNN models for sequence modeling.

Findings

01. Achieves or surpasses state-of-the-art accuracy on nine NLP benchmarks.

02. Demonstrates better efficiency-memory trade-off than existing RNN, CNN, and SAN models.

03. Effectively models both local and long-range dependencies with reduced memory requirements.
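
To give a rough sense of why splitting a sequence into blocks reduces memory, the sketch below counts only the pairwise attention scores that each scheme would store. It is a back-of-the-envelope illustration under stated assumptions, not the paper's own accounting: the function names, the sequence length of 1024, and the block length of 32 are all hypothetical, and feature dimensions, masks, and any other intermediate tensors are ignored.

```python
import math


def full_attention_scores(n):
    """Pairwise scores for self-attention over the whole sequence: n * n."""
    return n * n


def block_attention_scores(n, block_len):
    """Pairwise scores when attention runs inside each block, then across blocks.

    Assumption: each block is summarized by one vector before the
    inter-block step, so that step attends over num_blocks items.
    """
    num_blocks = math.ceil(n / block_len)
    intra = num_blocks * block_len * block_len  # one score matrix per block
    inter = num_blocks * num_blocks             # one score matrix over block summaries
    return intra + inter


n, block_len = 1024, 32  # hypothetical sizes chosen for illustration
full = full_attention_scores(n)
blocked = block_attention_scores(n, block_len)
print(full, blocked)
```

Under these assumptions the full scheme stores 1,048,576 scores and the blocked scheme 33,792, roughly a 31x reduction. The real saving depends on details that this count leaves out.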

Abstract

Recurrent neural networks (RNN), convolutional neural networks (CNN) and self-attention networks (SAN) are commonly used to produce context-aware representations. RNN can capture long-range dependency but is hard to parallelize and not time-efficient. CNN focuses on local dependency but does not perform well on some tasks. SAN can model both such dependencies via highly parallelizable computation, but memory requirement grows rapidly in line with sequence length. In this paper, we propose a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding. It requires as little memory as RNN but with all the merits of SAN. Bi-BloSAN splits the entire sequence into blocks, and applies an intra-block SAN to each block for modeling local context, then applies an inter-block SAN to the outputs for all blocks to capture long-range dependency. Thus,…
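
The block scheme described in the abstract (split the sequence, attend within each block, then attend across blocks) can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: it uses ordinary scaled dot-product attention without learned weights, omits the forward and backward directional masks implied by "bi-directional", and makes two assumptions of its own, namely that each block is summarized by mean pooling and that the inter-block context is added back to every token in the block. All function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with no learned parameters."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out


def split_blocks(seq, block_len):
    """Cut the sequence into consecutive blocks; the last one may be shorter."""
    return [seq[i:i + block_len] for i in range(0, len(seq), block_len)]


def mean_pool(vectors):
    """Assumption: summarize a block by averaging its token vectors."""
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]


def block_self_attention(seq, block_len):
    blocks = split_blocks(seq, block_len)
    local = [self_attention(block) for block in blocks]  # intra-block: local context
    summaries = [mean_pool(block) for block in local]    # one vector per block
    global_ctx = self_attention(summaries)               # inter-block: long-range context
    encoded = []
    for block, ctx in zip(local, global_ctx):
        for token in block:
            # Assumption: combine local and global context by simple addition.
            encoded.append([t + c for t, c in zip(token, ctx)])
    return encoded


# Toy input: 10 tokens with 2 features each, blocks of 4 tokens.
seq = [[float(i), 1.0] for i in range(10)]
encoded = block_self_attention(seq, block_len=4)
print(len(encoded), len(encoded[0]))
```

Each call to `self_attention` here sees at most `block_len` tokens or one summary per block, so no step builds a score matrix over the full sequence.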

Code & Models

Repositories

taoshen58/BiBloSA · tf · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies