A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization

Muhammed Ustaomeroglu; Guannan Qu

arXiv:2506.06179 · cs.LG · June 9, 2025

TL;DR

This paper gives a theoretical analysis of self-attention, showing that a single-layer linear self-attention can represent, learn, and generalize pairwise interaction functions, including in out-of-distribution scenarios. It also introduces new modules for learning complex multi-entity dependencies.

Contribution

It offers a theoretical framework for understanding self-attention as an interaction learner and introduces HyperFeatureAttention and HyperAttention modules for complex interaction modeling.

Findings

01 Self-attention can represent and learn pairwise interaction functions.

02 Self-attention generalizes both within the training distribution and to out-of-distribution scenarios.

03 The proposed modules capture multi-entity interactions beyond pairwise ones.

Abstract

Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single-layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including in out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population…
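The representation claim in the abstract can be illustrated with a small numerical sketch. This is not the authors' code or construction. It assumes entities drawn from a finite set of types, one-hot embeddings, and a softmax-free attention of the form (X C X^T) X W_V. The names `linear_self_attention`, `pairwise_sum`, `f`, and `w` are hypothetical. Under these assumptions, setting the merged query-key matrix C to an interaction table f and the value matrix to per-type vectors w makes the attention output for entity i equal to the sum over j of f(s_i, s_j) · w(s_j).

```python
import numpy as np


def linear_self_attention(X, C, W_V):
    """Single-layer linear self-attention (no softmax): (X C X^T) X W_V.

    X   : (L, d) entity embeddings
    C   : (d, d) merged query-key matrix (assumed form, W_Q W_K^T)
    W_V : (d, d_out) value matrix
    """
    scores = X @ C @ X.T       # (L, L) pairwise scores x_i^T C x_j
    return scores @ (X @ W_V)  # (L, d_out)


def pairwise_sum(states, f, w):
    """Reference computation: out[i] = sum_j f[s_i, s_j] * w[s_j]."""
    out = np.zeros((len(states), w.shape[1]))
    for i, s_i in enumerate(states):
        for s_j in states:
            out[i] += f[s_i, s_j] * w[s_j]
    return out


rng = np.random.default_rng(0)
num_types, d_out, seq_len = 5, 3, 7

# Hypothetical interaction table and per-type value vectors.
f = rng.normal(size=(num_types, num_types))
w = rng.normal(size=(num_types, d_out))

# A random sequence of entity types and its one-hot embedding.
states = rng.integers(0, num_types, size=seq_len)
X = np.eye(num_types)[states]  # (seq_len, num_types)

attn_out = linear_self_attention(X, C=f, W_V=w)
ref_out = pairwise_sum(states, f, w)

print(np.allclose(attn_out, ref_out))
```

With one-hot rows, x_i^T C x_j simply reads off the entry C[s_i, s_j], so the two computations should agree up to floating-point tolerance. The sketch covers representation only and says nothing about training dynamics or generalization.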

Videos

A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)