Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, Mojtaba Nayyeri, Steffen Staab

TL;DR
This paper introduces MöbiusAttention, a novel attention mechanism that uses Möbius transformations in complex space to enhance the expressivity of Transformer models for NLP tasks.
Contribution
It proposes MöbiusAttention, integrating non-linear Möbius transformations into Transformers, and demonstrates improved performance with fewer parameters on NLP benchmarks.
Findings
MöbiusAttention outperforms baseline models on GLUE tasks.
Enhanced expressivity is achieved with fewer parameters.
The models capture more intricate geometric relationships between tokens.
Abstract
Attention mechanisms and Transformer architectures have revolutionized Natural Language Processing (NLP) by enabling exceptional modeling of long-range dependencies and capturing intricate linguistic patterns. However, their inherent reliance on linear operations in the form of matrix multiplications limits their ability to fully capture inter-token relationships on their own. We propose MöbiusAttention, a novel approach that integrates Möbius transformations within the attention mechanism of Transformer-based models. Möbius transformations are non-linear operations in spaces over complex numbers with the ability to map between various geometries. By incorporating these properties, MöbiusAttention empowers models to learn more intricate geometric relationships between tokens and capture a wider range of information through complex-valued weight vectors. We build and pre-train a…
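To make the abstract's notion of a non-linear operation over complex numbers concrete, here is a minimal sketch of the textbook Möbius transformation M(z) = (az + b) / (cz + d) with ad - bc ≠ 0. The function name `mobius` and the coefficient values are illustrative assumptions; this excerpt does not show how the paper parameterizes the transformation or where it is applied inside the attention computation.

```python
def mobius(z: complex, a: complex, b: complex, c: complex, d: complex) -> complex:
    """Textbook Mobius transformation M(z) = (a*z + b) / (c*z + d).

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    # The transformation is only invertible when the determinant is non-zero.
    if a * d - b * c == 0:
        raise ValueError("degenerate coefficients: a*d - b*c must be non-zero")
    # Note: z = -d/c is a pole; Python raises ZeroDivisionError there.
    return (a * z + b) / (c * z + d)


# With c = 0 and d = 1 the map reduces to the affine map a*z + b.
# Here a = 1j rotates the input by 90 degrees in the complex plane.
rotated = mobius(1 + 0j, a=1j, b=0, c=0, d=1)

# With a = 0, b = 1, c = 1, d = 0 the map is the inversion 1/z,
# which is not additive: M(z1 + z2) differs from M(z1) + M(z2).
inv_of_sum = mobius(2 + 0j, a=0, b=1, c=1, d=0)
sum_of_inv = mobius(1 + 0j, a=0, b=1, c=1, d=0) + mobius(1 + 0j, a=0, b=1, c=1, d=0)

print(rotated, inv_of_sum, sum_of_inv)
```

The inversion case is what separates a Möbius transformation from a purely linear or affine map: once c is non-zero, the output is no longer a weighted sum of the input.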
Taxonomy
Topics: Model Reduction and Neural Networks
