Relational inductive biases on attention mechanisms

Víctor Mijangos; Ximena Gutierrez-Vasques; Verónica E. Arriola; Ulises Rodríguez-Domínguez; Alexis Cervantes; José Luis Almanzara

arXiv:2507.04117·cs.LG·July 8, 2025

TL;DR

This paper characterizes the relational inductive biases of attention mechanisms by analyzing their equivariance properties with respect to permutation subgroups. From the perspective of geometric deep learning, it classifies attention mechanisms according to the relationships they assume among data elements.

Contribution

The paper proposes a classification of attention mechanisms based on their relational biases and equivariance properties, connecting concepts from geometric deep learning to attention models.

Findings

01. Different attention layers are characterized by the underlying relationships they assume on the input data.

02. Attention mechanisms can be classified according to their relational biases.

03. The most common attention mechanisms are analyzed in terms of their equivariance with respect to permutation subgroups.

Abstract

Inductive learning aims to construct general models from specific examples, guided by biases that influence hypothesis selection and determine generalization capacity. In this work, we focus on characterizing the relational inductive biases present in attention mechanisms, understood as assumptions about the underlying relationships between data elements. From the perspective of geometric deep learning, we analyze the most common attention mechanisms in terms of their equivariance properties with respect to permutation subgroups, which allows us to propose a classification based on their relational biases. Under this perspective, we show that different attention layers are characterized by the underlying relationships they assume on the input data.
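
To make the notion of equivariance with respect to a permutation subgroup concrete, here is a minimal sketch in plain Python. It is not taken from the paper. It relies on a standard fact: single-head dot-product self-attention without positional encodings satisfies f(PX) = P f(X) for every permutation P of the input elements, whereas adding a causal mask generally breaks this for permutations other than the identity. All names (`attention`, `is_equivariant`, the random weights, and the chosen permutation) are assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, wq, wk, wv, causal=False):
    """Single-head dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        if causal:
            # Position i may only attend to positions j <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def permute(rows, perm):
    """Reorder rows so that new row i is old row perm[i]."""
    return [rows[p] for p in perm]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_equivariant(perm, x, weights, causal):
    """Check f(P x) == P f(x) for the given permutation."""
    lhs = attention(permute(x, perm), *weights, causal=causal)
    rhs = permute(attention(x, *weights, causal=causal), perm)
    return close(lhs, rhs)


rng = random.Random(0)
n, d = 4, 3  # hypothetical sizes: 4 input elements, 3 features each
x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
weights = tuple([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
                for _ in range(3))
perm = [2, 0, 3, 1]

full_ok = is_equivariant(perm, x, weights, causal=False)
causal_ok = is_equivariant(perm, x, weights, causal=True)
identity_ok = is_equivariant(list(range(n)), x, weights, causal=True)
print(full_ok, causal_ok, identity_ok)
```

In this sketch the unmasked layer passes the check for the shuffled order, while the causally masked layer passes only for the identity permutation. That is one way to read the idea that an attention variant is characterized by the set of permutations it is equivariant to.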
