Convolution, attention and structure embedding

Jean-Marc Andreoli

arXiv:1905.01289 · cs.LG · March 6, 2020 · 19 citations

TL;DR

This paper presents a unified framework for analyzing neural network models involving structured embeddings, including convolution and attention mechanisms, highlighting their similarities and mutual benefits.

Contribution

It introduces a systematic framework that captures diverse structured models and demonstrates how attention can be viewed as adaptive convolution.
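To make the phrase "attention as adaptive convolution" concrete, here is a minimal sketch, not the paper's formulation. It assumes scalar embeddings and a plain dot-product score followed by a softmax; the names `fixed_mixing`, `adaptive_mixing` and `scale` are hypothetical and chosen only for illustration. The point is the contrast: in a convolution-like layer the mixing weights are parameters fixed after training, while in an attention-like layer they are recomputed from each input.

```python
import math


def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_mixing(x, weights):
    """Convolution-like mixing: the weights are parameters, independent of x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def adaptive_mixing(x, scale=1.0):
    """Attention-like mixing: the weights are computed from x itself.

    Assumption for this sketch: each element is a scalar embedding and the
    score between two positions is their product times `scale`.
    """
    out = []
    for query in x:
        weights = softmax([scale * query * key for key in x])
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out


x = [0.0, 2.0]
uniform = [[0.5, 0.5], [0.5, 0.5]]
fixed_out = fixed_mixing(x, uniform)   # same weights for every input
adaptive_out = adaptive_mixing(x)      # weights depend on the input values
```

With this input, the first position of `adaptive_out` averages the two values uniformly (its scores are all zero), while the second position weights the larger value more heavily, which the fixed uniform weights cannot do.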

Findings

01 Unified analysis of convolution and attention models

02 Attention as adaptive convolution

03 Framework facilitates mutual model enhancement

Abstract

Deep neural networks are composed of layers of parametrised linear operations interleaved with nonlinear activations. In basic models, such as the multi-layer perceptron, a linear layer operates on a simple input vector embedding of the instance being processed, and produces an output vector embedding by straight multiplication by a matrix parameter. In more complex models, the input and output are structured and their embeddings are higher-order tensors. The parameter of each linear operation must then be controlled so as not to explode with the complexity of the structures involved. This is essentially the role of convolution models, which exist in many flavours depending on the type of structure they deal with (grids, networks, time series, etc.). We present here a unified framework which aims at capturing the essence of these diverse models, allowing a systematic analysis of their…
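The abstract's contrast between a plain matrix multiplication and a convolution whose parameter stays small can be illustrated with a short sketch. It is an illustration under stated assumptions, not code from the paper: the input is a one-dimensional sequence, the convolution is zero-padded with an odd-length kernel, and the helper names `dense_layer` and `conv1d_as_matrix` are hypothetical. The convolution is still a linear layer, but its matrix is built from a handful of shared kernel values instead of one free parameter per entry.

```python
def dense_layer(x, weight):
    """Multiply a length-n input vector by an m-by-n weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def conv1d_as_matrix(kernel, n):
    """Build the n-by-n matrix of a zero-padded 1D convolution.

    Assumption for this sketch: the kernel has odd length and is applied
    without flipping (cross-correlation, as in most deep learning libraries).
    """
    half = len(kernel) // 2
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for offset, k in enumerate(kernel):
            j = i + offset - half
            if 0 <= j < n:  # positions outside the sequence are zero padding
                matrix[i][j] = k
    return matrix


x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 1.0, -0.5]

conv_matrix = conv1d_as_matrix(kernel, len(x))
y = dense_layer(x, conv_matrix)      # the convolution, applied as a linear layer

dense_param_count = len(x) * len(x)  # free entries of an unconstrained matrix
conv_param_count = len(kernel)       # shared entries of the convolution
```

Here an unconstrained 4-by-4 layer has 16 free parameters and the convolution has 3. The dense count grows with the square of the sequence length, while the kernel size does not depend on it.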

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Neural Networks and Applications

Methods: Linear Layer · Convolution