Convolution, attention and structure embedding
Jean-Marc Andreoli

TL;DR
This paper presents a unified framework for analyzing neural network models involving structured embeddings, including convolution and attention mechanisms, highlighting their similarities and mutual benefits.
Contribution
It introduces a systematic framework that captures diverse structured models and demonstrates how attention can be viewed as adaptive convolution.
Findings
Unified analysis of convolution and attention models
Attention as adaptive convolution
Framework facilitates mutual model enhancement
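One way to read "attention as adaptive convolution" is that both operations mix positions as out[i] = sum_j w[i][j] * x[j]; they differ in where the weights come from. The sketch below is an illustration under simplifying assumptions (scalar features, a single head, no learned projections), not the paper's formal construction. The helper names `fixed_weights`, `attention_weights`, and `mix` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix(weights, values):
    # Shared mixing step: out[i] = sum_j weights[i][j] * values[j].
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def fixed_weights(n, kernel):
    # Convolution-style weights: w[i][j] depends only on the offset j - i,
    # so the same few kernel entries are reused at every position.
    half = len(kernel) // 2
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for t, coeff in enumerate(kernel):
            j = i + t - half
            if 0 <= j < n:
                w[i][j] = coeff
    return w

def attention_weights(queries, keys):
    # Attention-style weights: w[i][j] is computed from the input itself,
    # here as a softmax over scalar query-key products.
    return [softmax([q * k for k in keys]) for q in queries]

x = [1.0, 2.0, 3.0, 4.0]
conv_out = mix(fixed_weights(len(x), [0.25, 0.5, 0.25]), x)
attn_out = mix(attention_weights(x, x), x)
```

In this toy setting the convolution weights stay the same for any input, while the attention weights change whenever `x` changes, which is the sense of "adaptive" used here.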
Abstract
Deep neural networks are composed of layers of parametrised linear operations intertwined with nonlinear activations. In basic models, such as the multi-layer perceptron, a linear layer operates on a simple input vector embedding of the instance being processed, and produces an output vector embedding by straight multiplication by a matrix parameter. In more complex models, the input and output are structured and their embeddings are higher-order tensors. The parameter of each linear operation must then be controlled so as not to explode with the complexity of the structures involved. This is essentially the role of convolution models, which exist in many flavours depending on the type of structure they deal with (grids, networks, time series, etc.). We present here a unified framework which aims at capturing the essence of these diverse models, allowing a systematic analysis of their…
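The point about keeping the parameter from exploding can be illustrated with a small sketch: a 1D convolution is a linear map whose n x n matrix is built from only k tied weights. This is a minimal example assuming scalar features, zero padding, and an odd kernel width; the function names `conv1d`, `conv_as_matrix`, and `matvec` are hypothetical and not taken from the paper.

```python
def conv1d(x, kernel):
    # Same-length 1D convolution (cross-correlation form) with zero padding.
    n, half = len(x), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for t, coeff in enumerate(kernel):
            j = i + t - half
            if 0 <= j < n:
                acc += coeff * x[j]
        out.append(acc)
    return out

def conv_as_matrix(n, kernel):
    # The same operation written as a dense n x n matrix with tied entries.
    half = len(kernel) // 2
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for t, coeff in enumerate(kernel):
            j = i + t - half
            if 0 <= j < n:
                w[i][j] = coeff
    return w

def matvec(w, x):
    # Plain matrix-vector product, as in an unstructured linear layer.
    return [sum(a * b for a, b in zip(row, x)) for row in w]

x = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
n, k = len(x), len(kernel)

dense_params = n * n   # free entries in an unconstrained linear layer
conv_params = k        # free entries once weights are shared across positions
same_result = conv1d(x, kernel) == matvec(conv_as_matrix(n, kernel), x)
```

Here the unconstrained layer would need `n * n` parameters, growing with the input size, while the convolution needs only `k` regardless of `n`.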
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Neural Networks and Applications
Methods: Linear Layer · Convolution
