Convolution-enhanced Evolving Attention Networks

Yujing Wang; Yaming Yang; Zhuo Li; Jiangang Bai; Mingliang Zhang; Xiangtai Li; Jing Yu; Ce Zhang; Gao Huang; Yunhai Tong

arXiv:2212.08330 · cs.LG · May 1, 2023 · 1 citation

TL;DR

This paper introduces a convolution-enhanced evolving attention mechanism that models the evolution of inter-token relationships across layers, leading to improved performance in various tasks including time-series analysis and natural language understanding.

Contribution

The paper proposes a novel residual convolutional module that explicitly models the layer-wise evolution of attention maps, improving both information flow across layers and overall performance.
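As an illustration only, here is one way to write a residual convolutional module of this kind. The formulation rests on several assumptions. The pre-softmax attention map $L_{i-1}$ of layer $i-1$ is blended into layer $i$ with a fixed weight $\alpha$. A 2D convolution over the stacked per-head $n \times n$ maps (heads treated as image channels) is then added back with a residual weight $\beta$. The symbols $S_i$, $A^{\mathrm{in}}_i$, $L_i$, $\alpha$, $\beta$ and $\mathrm{Conv}$ are notation introduced for this sketch, not taken from the summary above.

```latex
\begin{aligned}
S_i &= \frac{Q_i K_i^{\top}}{\sqrt{d}} && \text{(per-head attention scores of layer } i\text{)}\\
A^{\mathrm{in}}_i &= \alpha\, L_{i-1} + (1-\alpha)\, S_i && \text{(residual link to the previous layer; } A^{\mathrm{in}}_1 = S_1\text{)}\\
L_i &= \beta\, \mathrm{Conv}\!\left(A^{\mathrm{in}}_i\right) + (1-\beta)\, A^{\mathrm{in}}_i && \text{(residual convolution, heads as channels)}\\
A_i &= \operatorname{softmax}(L_i) && \text{(attention map used by layer } i\text{)}
\end{aligned}
```

Under these assumptions, setting $\beta = 0$ bypasses the convolution and leaves a plain residual blend of attention logits. Setting $\alpha = \beta = 0$ reduces every layer to standard scaled dot-product attention, $A_i = \operatorname{softmax}(S_i)$.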

Findings

01

Achieves a 17% improvement over state-of-the-art models on time-series tasks.

02

First work to explicitly model the layer-wise evolution of attention maps.

03

Improves performance across multiple applications, including natural language processing, computer vision, and machine translation.

Abstract

Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of…
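The chain of residual convolutional modules described in the abstract can be sketched in NumPy. This is a minimal, hypothetical sketch, not the authors' implementation. The function names `conv2d_same` and `evolving_attention_layer` and the fixed mixing weights `alpha` and `beta` are illustrative assumptions. Each layer blends the previous layer's pre-softmax map into its own scores. It then refines the result with a 3x3 convolution that treats the attention heads as image channels, adding the convolution back through a residual path. Random queries, keys and kernels stand in for learned projections.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv2d_same(x, w):
    """Zero-padded 'same' 2D convolution (cross-correlation, as in DL frameworks).

    x: (heads_in, n_q, n_k) attention logits; heads act as image channels.
    w: (heads_out, heads_in, ksize, ksize) kernel with odd ksize.
    """
    h_out, _, ksize, _ = w.shape
    p = ksize // 2
    _, n_q, n_k = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h_out, n_q, n_k))
    for di in range(ksize):
        for dj in range(ksize):
            patch = xp[:, di:di + n_q, dj:dj + n_k]  # (heads_in, n_q, n_k)
            # Mix channels (heads) using the kernel tap at offset (di, dj).
            out += np.einsum("oi,iqk->oqk", w[:, :, di, dj], patch)
    return out


def evolving_attention_layer(q, k, prev_logits, w, alpha=0.5, beta=0.5):
    """Hypothetical evolving-attention step; alpha and beta are assumed fixed weights.

    q, k: (heads, n, d) per-head queries and keys.
    prev_logits: (heads, n, n) pre-softmax map from the previous layer, or None.
    Returns (attention probabilities, logits to pass to the next layer).
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (heads, n, n)
    # Residual link across layers: blend in the previous layer's logits.
    a_in = scores if prev_logits is None else alpha * prev_logits + (1 - alpha) * scores
    # Residual convolution: each output head sees all heads' maps.
    logits = beta * conv2d_same(a_in, w) + (1 - beta) * a_in
    return softmax(logits, axis=-1), logits


# Toy chain of three layers; random q, k, and w stand in for learned parameters.
rng = np.random.default_rng(0)
heads, n, d, layers = 4, 6, 8, 3
prev = None
maps = []
for _ in range(layers):
    q = rng.standard_normal((heads, n, d))
    k = rng.standard_normal((heads, n, d))
    w = 0.1 * rng.standard_normal((heads, heads, 3, 3))
    attn, prev = evolving_attention_layer(q, k, prev, w)
    maps.append(attn)

print(maps[-1].shape, np.allclose(maps[-1].sum(axis=-1), 1.0))  # (4, 6, 6) True
```

The `beta` weight balances the convolution against an identity path. In this sketch, `beta=0` therefore leaves a plain residual blend of attention logits, and `alpha=0, beta=0` gives standard scaled dot-product attention in every layer. The explicit loops keep the example dependent on NumPy alone. A trained model would instead use a framework convolution layer with a learned kernel, and could also learn the mixing weights.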

Code & Models

Repositories

pkuyym/evolvingattention
TensorFlow · Official

Taxonomy

Topics: Advanced Neural Network Applications · Time Series Analysis and Forecasting · Neural Networks and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Residual Connection · Label Smoothing · Adam