Horizontal and Vertical Attention in Transformers

Litao Yu; Jian Zhang

arXiv:2207.04399 · cs.CV · July 12, 2022

TL;DR

This paper introduces horizontal and vertical attention mechanisms to enhance feature representation in Transformers, improving their performance and generalization with minimal additional computational cost.

Contribution

The paper proposes horizontal and vertical attention modules that can be plugged into Transformers. Horizontal attention re-weights the outputs of the individual attention heads, and vertical attention re-calibrates channel-wise feature responses.
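The channel-wise calibration can be illustrated with a minimal sketch. It assumes a Squeeze-and-Excitation-style design (average each channel over all tokens, pass the result through a bottleneck MLP, and use a sigmoid gate per channel). The function name `vertical_attention`, the weight shapes, and the reduction ratio are hypothetical; the paper's exact layer layout may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: List[float]) -> List[float]:
    # w has shape (out, in); v has length `in`
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def vertical_attention(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Hypothetical SE-style channel re-calibration of token features.

    x:  n tokens x d channels
    w1: (d // r) x d squeeze weights (reduction ratio r is an assumption)
    w2: d x (d // r) excitation weights
    """
    n = len(x)
    d = len(x[0])
    # Squeeze: average each channel over all tokens -> one descriptor per channel
    z = [sum(row[c] for row in x) / n for c in range(d)]
    # Excitation: a bottleneck MLP models inter-channel dependencies
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: channel c of every token is multiplied by the same gate in (0, 1)
    return [[row[c] * gates[c] for c in range(d)] for row in x]


# Usage: 2 tokens, 4 channels, reduction ratio 2 (all values are illustrative)
x = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
w1 = [[0.0] * 4 for _ in range(2)]
w2 = [[0.0] * 2 for _ in range(4)]
y = vertical_attention(x, w1, w2)
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each feature is simply halved; learned weights would instead emphasise or suppress individual channels.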

Findings

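1. Transformer models equipped with the two attention modules show improved performance across different supervised learning tasks.

2. The proposed attentions add only a minor computational overhead.

3. The modules are plug-in components that can be integrated into existing Transformer architectures.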

Abstract

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn the feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by learning to augment the feature maps with the self-attention mechanism in Transformers. Specifically, we propose the horizontal attention to re-weight the multi-head output of the scaled dot-product attention before dimensionality reduction, and propose the vertical attention to adaptively re-calibrate channel-wise feature responses by explicitly modelling inter-dependencies among different channels. We demonstrate that the Transformer models equipped with the two attentions have a high generalization capability across different supervised learning tasks, with a very minor additional computational overhead. The proposed horizontal and vertical attentions…
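The idea of re-weighting the multi-head output before dimensionality reduction (that is, before the output projection W^O) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the scoring function (mean-pool each head over tokens, dot it with a hypothetical vector `score_w`, softmax across heads) and the name `horizontal_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def horizontal_attention(head_outputs: List[Matrix], score_w: List[float]) -> Matrix:
    """Hypothetical head re-weighting applied before the output projection W^O.

    head_outputs: h matrices, each n tokens x d_k (scaled dot-product outputs)
    score_w: length-d_k vector used to score each head (assumed scoring function)
    Returns the re-weighted, concatenated n x (h * d_k) matrix fed to W^O.
    """
    h = len(head_outputs)
    n = len(head_outputs[0])
    d_k = len(head_outputs[0][0])
    # Summarise each head by mean-pooling its output over tokens, then score it
    scores = []
    for head in head_outputs:
        pooled = [sum(row[j] for row in head) / n for j in range(d_k)]
        scores.append(sum(p * w for p, w in zip(pooled, score_w)))
    # Softmax over heads, scaled by h so uniform weights leave the output unchanged
    weights = [h * a for a in softmax(scores)]
    # Re-weight each head's row t, then concatenate heads along the channel axis
    return [
        [weights[i] * v for i, head in enumerate(head_outputs) for v in head[t]]
        for t in range(n)
    ]


# Usage: 2 heads, 2 tokens, d_k = 2 (all values are illustrative)
heads = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
uniform = horizontal_attention(heads, [0.0, 0.0])
skewed = horizontal_attention(heads, [1.0, 0.0])
```

Scaling the softmax weights by the number of heads is a design choice made here so that a zero score vector reduces to ordinary multi-head concatenation; the re-weighted matrix would then go through the usual linear projection W^O.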

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Layer Normalization