Relaxed Attention for Transformer Models

Timo Lohrenz; Björn Möller; Zhengyang Li; Tim Fingscheidt

arXiv:2209.09735·cs.LG·September 21, 2022



TL;DR

This paper introduces relaxed attention, a simple smoothing technique for transformer models that improves regularization, enhances external language model integration, and achieves state-of-the-art results in lip-reading and machine translation tasks.

Contribution

The paper proposes relaxed attention, a smoothing of the attention weights that regularizes transformers and makes it easier to integrate an external language model.

Findings

01 Achieved a 26.31% word error rate (WER) on the LRS3 lip-reading benchmark.

02 Attained a BLEU score of 37.67 on the IWSLT14 translation task.

03 Demonstrated improved regularization and better support for external language model (LM) integration.

Abstract

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and, for natural language processing tasks, lead to an implicitly learned internal language model in the autoregressive transformer decoder, complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, yielding a two-fold improvement to the general transformer architecture: First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model as it suppresses the implicitly learned internal language model by relaxing the cross attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks with clear improvement in…
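The abstract describes relaxed attention only as a smoothing of the attention weights and does not give a formula on this page. As a minimal sketch, assuming that the smoothing interpolates each row of softmax weights with a uniform distribution over the key positions, the idea could look like the following. The coefficient `gamma` and the function names `relax` and `relaxed_attention` are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relax(weights, gamma):
    """Blend one row of attention weights with a uniform distribution.

    gamma is a hypothetical relaxation coefficient in [0, 1] (an assumption,
    not a value from the paper): gamma = 0 leaves the weights unchanged,
    gamma = 1 makes them fully uniform.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    uniform = 1.0 / len(weights)
    return [(1.0 - gamma) * w + gamma * uniform for w in weights]


def relaxed_attention(query, keys, values, gamma):
    """Single-query scaled dot-product attention with relaxed weights."""
    scale = math.sqrt(len(query))
    # One score per key: scaled dot product between the query and that key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = relax(softmax(scores), gamma)
    dim = len(values[0])
    # Weighted sum of the value vectors, computed per output dimension.
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights
```

Because both the softmax row and the uniform row sum to one, the blended weights still form a valid distribution; the smoothing only limits how sharply attention can concentrate on a single position. Under this reading, the same `relax` step would apply to encoder self-attention (for regularization) or to decoder cross attention (for external language model integration), the two placements named in the abstract.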


Code & Models

Repositories

Oguzhanercan/Vision-Transformers


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing