Capturing Multi-Resolution Context by Dilated Self-Attention

Niko Moritz; Takaaki Hori; Jonathan Le Roux

arXiv:2104.02858·eess.AS·April 8, 2021

TL;DR

This paper introduces dilated self-attention, which combines restricted self-attention with a dilation mechanism to capture multi-resolution context in long sequences. It lowers computational cost relative to full self-attention while keeping ASR performance comparable.

Contribution

The paper proposes dilated self-attention, which attends to frames near the query at high resolution and to summaries of distant frames at lower resolution, with lower computational complexity than full self-attention.

Findings

01. Achieves ASR performance comparable to full self-attention.

02. Reduces computational costs significantly.

03. Demonstrates effectiveness of dilation methods like pooling and subsampling.
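
To give a rough sense of the cost reduction in finding 02, the sketch below counts query-key score computations for full self-attention and for a restricted window plus chunk summaries. It is a back-of-the-envelope illustration only: the function name `attention_score_counts`, the parameters `window` and `chunk`, and the example sizes are assumptions, not values from the paper, and the cost of computing the summaries themselves is ignored.

```python
def attention_score_counts(T, window, chunk):
    """Count query-key score computations for a length-T sequence.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Full self-attention: every query scores every frame.
    full = T * T

    # Assumed dilation scheme: one summary per non-overlapping chunk.
    n_summaries = -(-T // chunk)  # ceiling division

    dilated = 0
    for t in range(T):
        # Restricted part: frames within +/- window of the query,
        # clipped at the sequence boundaries.
        lo, hi = max(0, t - window), min(T - 1, t + window)
        dilated += (hi - lo + 1) + n_summaries
    return full, dilated


# Illustrative sizes only (assumed, not from the paper).
full, dilated = attention_score_counts(T=1000, window=15, chunk=20)
print(full, dilated)
```

Under these assumptions the full count grows as T squared, while the dilated count grows as T times (window size plus number of summaries), so the gap widens for longer utterances.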

Abstract

Self-attention has become an important and widely used neural network component that helped to establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where an input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution. Different methods for summarizing distant frames are studied, such as…
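
The mechanism described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a single head with no learned projections, uses mean-pooling over non-overlapping chunks as the summarization method, and the names `dilated_self_attention`, `window`, and `chunk` are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def dilated_self_attention(x, window=2, chunk=4):
    """Single-head sketch: local frames at full resolution plus chunk summaries.

    x      : array of shape (T, d), one feature vector per frame.
    window : assumed number of neighboring frames on each side of the query.
    chunk  : assumed number of frames summarized into one low-resolution frame.
    """
    T, d = x.shape

    # Dilation (assumption: mean-pooling): one summary per chunk of frames.
    summaries = np.stack(
        [x[s:s + chunk].mean(axis=0) for s in range(0, T, chunk)]
    )

    out = np.empty((T, d), dtype=float)
    for t in range(T):
        # Restricted self-attention: frames near the query, high resolution.
        lo, hi = max(0, t - window), min(T, t + window + 1)
        local = x[lo:hi]

        # Attend jointly over local frames and low-resolution summaries.
        keys = np.concatenate([local, summaries], axis=0)
        scores = keys @ x[t] / np.sqrt(d)
        weights = softmax(scores)

        # Keys double as values here because projections are omitted.
        out[t] = weights @ keys
    return out


# Usage: 10 frames of 4-dimensional features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 4))
print(dilated_self_attention(frames).shape)  # (10, 4)
```

Each query here scores at most `2 * window + 1` local frames plus one summary per chunk, rather than all `T` frames. Other summarization methods could replace the mean-pooling line without changing the rest of the sketch.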
