Convolution-enhanced Evolving Attention Networks
Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

TL;DR
This paper introduces a convolution-enhanced evolving attention mechanism that models the evolution of inter-token relationships across layers, leading to improved performance in various tasks including time-series analysis and natural language understanding.
Contribution
It proposes a novel residual convolutional module to explicitly model the layer-wise evolution of attention maps, enhancing information flow and performance.
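The idea of evolving an attention map from one layer to the next can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from this summary: the function names, the single-head setup, the 3x3 kernel, and the mixing weights `alpha` and `beta`. The sketch blends the previous layer's attention logits with the current layer's, applies a small 2D convolution over the blended map, and adds the result back residually before the softmax.

```python
import math


def conv2d_same(x, kernel):
    """2D cross-correlation with zero padding; output has the same size as x."""
    n, m = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < n and 0 <= c < m:  # outside the map counts as zero
                        s += kernel[a][b] * x[r][c]
            out[i][j] = s
    return out


def softmax_rows(logits):
    """Row-wise softmax, so each query token's attention weights sum to 1."""
    result = []
    for row in logits:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        z = sum(exps)
        result.append([e / z for e in exps])
    return result


def evolve_attention(prev_logits, curr_logits, kernel, alpha=0.5, beta=0.5):
    """Hypothetical evolving-attention step for one head.

    alpha blends the previous layer's map into the current one (residual link);
    beta blends the convolved map with the unconvolved blend.
    Both weights are illustrative assumptions.
    """
    n = len(curr_logits)
    mixed = [
        [alpha * prev_logits[i][j] + (1 - alpha) * curr_logits[i][j] for j in range(n)]
        for i in range(n)
    ]
    conv = conv2d_same(mixed, kernel)
    return [
        [beta * conv[i][j] + (1 - beta) * mixed[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy 3-token example: logits from layer i-1 and layer i.
prev_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
curr_logits = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
smoothing_kernel = [[1 / 9.0] * 3 for _ in range(3)]  # stand-in for learned weights

evolved = evolve_attention(prev_logits, curr_logits, smoothing_kernel)
attention = softmax_rows(evolved)  # weights that would multiply the value vectors
```

In a trained model the kernel would be learned and applied across all heads as channels; the fixed averaging kernel above only stands in for that so the data flow is visible.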
Findings
Reports a 17% improvement over prior state-of-the-art (SOTA) results on time-series tasks.
First to explicitly model layer-wise evolution of attention maps.
Improves performance across multiple applications like NLP, vision, and translation.
Abstract
Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of…
Taxonomy
Topics: Advanced Neural Network Applications · Time Series Analysis and Forecasting · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Residual Connection · Label Smoothing · Adam
