Expanding Expressivity in Transformer Models with MöbiusAttention

Anna-Maria Halacheva; Mojtaba Nayyeri; Steffen Staab

arXiv:2409.12175·cs.LG·September 19, 2024

TL;DR

This paper introduces MöbiusAttention, a novel attention mechanism that uses Möbius transformations in complex space to enhance the expressivity of Transformer models for NLP tasks.

Contribution

It proposes MöbiusAttention, which integrates non-linear Möbius transformations into Transformers, and demonstrates improved performance with fewer parameters on NLP benchmarks.

Findings

01

MöbiusAttention outperforms baseline models on GLUE tasks.

02

Enhanced expressivity is achieved with fewer parameters.

03

Models capture more intricate geometric relationships.

Abstract

Attention mechanisms and Transformer architectures have revolutionized Natural Language Processing (NLP) by enabling exceptional modeling of long-range dependencies and capturing intricate linguistic patterns. However, their inherent reliance on linear operations in the form of matrix multiplications limits their ability to fully capture inter-token relationships on their own. We propose MöbiusAttention, a novel approach that integrates Möbius transformations within the attention mechanism of Transformer-based models. Möbius transformations are non-linear operations in spaces over complex numbers with the ability to map between various geometries. By incorporating these properties, MöbiusAttention empowers models to learn more intricate geometric relationships between tokens and capture a wider range of information through complex-valued weight vectors. We build and pre-train a…
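
To make the abstract's description concrete, the sketch below applies the standard Möbius transformation f(z) = (az + b) / (cz + d), with ad - bc ≠ 0, to a few complex numbers and checks that the map is not additive. It illustrates only the general mathematical definition, not the paper's attention layer. The function name `mobius`, the sample values in `tokens`, and the choice of coefficients are assumptions made for this example.

```python
def mobius(z, a, b, c, d):
    """Apply the Möbius transformation f(z) = (a*z + b) / (c*z + d).

    The coefficients must satisfy a*d - b*c != 0; otherwise the map
    collapses to a constant. The pole at z = -d/c is not handled here
    and raises ZeroDivisionError.
    """
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate transformation: a*d - b*c must be non-zero")
    return (a * z + b) / (c * z + d)


# Hypothetical token features written as complex numbers (illustrative values).
tokens = [1 + 1j, 0.5 - 2j, -1 + 0j]

# Inversion z -> 1/z, obtained with a=0, b=1, c=1, d=0.
inverted = [mobius(z, 0, 1, 1, 0) for z in tokens]

# A linear map would satisfy f(z1 + z2) == f(z1) + f(z2); inversion does not.
z1, z2 = tokens[0], tokens[1]
additivity_gap = abs(mobius(z1 + z2, 0, 1, 1, 0)
                     - (mobius(z1, 0, 1, 1, 0) + mobius(z2, 0, 1, 1, 0)))

for z, w in zip(tokens, inverted):
    print(f"{z} -> {w}")
print("additivity gap:", additivity_gap)
```

Setting c = 0 and d = 1 reduces the map to the affine form az + b. The division term introduced by a non-zero c is what makes the transformation non-linear.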

Taxonomy

Topics: Model Reduction and Neural Networks