Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Tao Shen; Tianyi Zhou; Guodong Long; Jing Jiang; Chengqi Zhang

arXiv:1805.00912 · cs.CL · March 27, 2019 · 5 cites

TL;DR

This paper introduces MTSA, a novel self-attention mechanism that efficiently models pairwise and global dependencies, outperforming previous models while maintaining low memory and computational costs.

Contribution

The paper proposes MTSA, a tensorized self-attention method that captures diverse dependencies with improved efficiency and expressiveness compared to existing models.

Findings

01

Achieves state-of-the-art or competitive results on nine NLP benchmarks.

02

Is as fast and as memory-efficient as a CNN while still using feature-wise (tensorized) alignment scores.

03

Effectively models both local and long-range dependencies in NLP tasks.

Abstract

Neural networks equipped with self-attention have parallelizable computation, a light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix to a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only…
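The abstract is cut off before it explains how the tensor of alignment scores is kept cheap, so the following is only a minimal NumPy sketch of one way this can work, not the authors' implementation. It assumes the feature-wise score for query token i, key token j, and feature k is the broadcast sum of a scalar pairwise score `a[i, j]` (for example from scaled dot-product attention) and a feature-wise per-token score `b[j, k]` (for example from additive attention). All function and variable names are hypothetical, and masks and multiple heads are left out. Under that assumption the softmax over j factorizes, so the n × n × d tensor never has to be materialized.

```python
import numpy as np


def tensorized_attention_naive(a, b, v):
    """Reference version that builds the full (n, n, d) score tensor.

    a: (n, n) scalar pairwise scores a[i, j]  (token2token, assumed)
    b: (n, d) feature-wise scores b[j, k]     (source2token, assumed)
    v: (n, d) value vectors
    """
    s = a[:, :, None] + b[None, :, :]        # (n, n, d) alignment tensor
    s = s - s.max(axis=1, keepdims=True)     # stabilize softmax over j
    p = np.exp(s)
    p = p / p.sum(axis=1, keepdims=True)     # softmax over key index j
    return (p * v[None, :, :]).sum(axis=1)   # (n, d) attended output


def tensorized_attention_factored(a, b, v):
    """Same result using only (n, n) and (n, d) arrays.

    exp(a[i, j] + b[j, k]) = exp(a[i, j]) * exp(b[j, k]), so both the
    numerator and the normalizer of the softmax reduce to matrix products.
    """
    ea = np.exp(a - a.max(axis=1, keepdims=True))  # (n, n); row shift cancels
    eb = np.exp(b - b.max(axis=0, keepdims=True))  # (n, d); column shift cancels
    numerator = ea @ (eb * v)                      # (n, d)
    normalizer = ea @ eb                           # (n, d)
    return numerator / normalizer


def example_scores(x, rng):
    """Hypothetical score functions with randomly initialized weights."""
    n, d = x.shape
    wq, wk, w1, w2 = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    a = (x @ wq) @ (x @ wk).T / np.sqrt(d)   # scaled dot-product, (n, n)
    b = np.tanh(x @ w1) @ w2                 # additive-style, (n, d)
    return a, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))              # 6 tokens, 4 features
    a, b = example_scores(x, rng)
    slow = tensorized_attention_naive(a, b, x)
    fast = tensorized_attention_factored(a, b, x)
    print(np.allclose(slow, fast))
```

The factored version needs O(n² + nd) memory rather than O(n²d), which is consistent with the abstract's claim that feature-wise scores need not cost more memory than ordinary matrix-based attention.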

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications