A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
Muhammed Ustaomeroglu, Guannan Qu

TL;DR
This paper analyzes self-attention theoretically, showing that a single-layer linear self-attention can represent, learn, and generalize pairwise interaction functions, including in out-of-distribution scenarios. It also introduces new modules for learning multi-entity dependencies beyond pairwise ones.
Contribution
It offers a theoretical framework for understanding self-attention as an interaction learner and introduces HyperFeatureAttention and HyperAttention modules for complex interaction modeling.
Findings
Self-attention can represent and learn pairwise interaction functions.
Self-attention generalizes the learned interaction functions, including to out-of-distribution scenarios.
Proposed modules capture multi-entity interactions beyond pairwise.
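To make the first finding concrete, here is a minimal NumPy sketch of how a single-layer linear self-attention (no softmax) can compute a sum of pairwise interaction terms. It is an illustration under stated assumptions, not the authors' construction: entities are assumed to be one-hot encoded by type, and the names `C` (the merged query-key matrix), `W_V`, and `linear_self_attention` are hypothetical.

```python
import numpy as np

def linear_self_attention(X, C, W_V):
    """Single-layer linear self-attention without softmax: (X C X^T) (X W_V).

    X   : (n, num_types) entity embeddings (one-hot here, by assumption)
    C   : (num_types, num_types) merged query-key matrix, i.e. W_Q @ W_K.T
    W_V : (num_types, d_out) value matrix
    """
    scores = X @ C @ X.T          # (n, n): scores[i, j] = C[type_i, type_j]
    return scores @ (X @ W_V)     # (n, d_out): sum over j of score * value_j

rng = np.random.default_rng(0)
num_types, n, d_out = 5, 7, 3

# Hypothetical parameters: C[a, b] is the interaction strength between
# entity types a and b; W_V[b] is the contribution vector of type b.
C = rng.normal(size=(num_types, num_types))
W_V = rng.normal(size=(num_types, d_out))

# A random sequence of n entities, one-hot encoded by type.
types = rng.integers(0, num_types, size=n)
X = np.eye(num_types)[types]

attn_out = linear_self_attention(X, C, W_V)

# The same quantity written as an explicit sum over pairs (i, j).
direct = np.zeros((n, d_out))
for i in range(n):
    for j in range(n):
        direct[i] += C[types[i], types[j]] * W_V[types[j]]

matches = np.allclose(attn_out, direct)
print("attention output equals explicit pairwise sum:", matches)
```

With one-hot embeddings, entry (i, j) of the score matrix is just a table lookup of the interaction between the types of entities i and j, so the layer output for entity i is a sum over all partners j of an interaction weight times a per-type value vector.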
Abstract
Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
