ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention

Jintian Shao; Hongyi Huang; Jiayi Wu; Beiwen Zhang; ZhiYu Wu; You Shan; MingKai Zheng

arXiv:2505.10222 · cs.LG · May 28, 2025



TL;DR

ComplexFormer introduces a complex-vector attention mechanism for Transformers that lets each head independently model semantic and positional interactions; the authors report improved performance and efficiency across several NLP tasks.

Contribution

The paper proposes Complex Multi-Head Attention (CMHA), which uses head-specific complex vectors and adaptive rotations to make Transformer attention more flexible and expressive.

Findings

01. Achieves lower perplexity in language modeling.
02. Improves long-context coherence in text generation.
03. Outperforms RoPE-based Transformers.

Abstract

Transformer models rely on self-attention to capture token dependencies but face challenges in effectively integrating positional information while preserving the flexibility of multi-head attention (MHA). Prior methods often model semantic and positional differences separately or apply uniform positional adjustments across heads, potentially limiting representational capacity. This paper introduces ComplexFormer, featuring Complex Multi-Head Attention (CMHA). CMHA empowers each head to independently model semantic and positional differences unified within the complex plane, representing interactions as rotations and scaling. ComplexFormer incorporates two key improvements: (1) a per-head Euler transformation, converting real-valued query/key projections into polar-form complex vectors for head-specific complex subspace operation; and (2) a per-head adaptive differential rotation mechanism,…
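The "rotations and scaling" view can be made concrete: writing a query component as r_q e^{iθ_q} and a key component as r_k e^{iθ_k}, their product q·conj(k) = r_q r_k e^{i(θ_q − θ_k)}, so magnitudes scale the score and phase differences rotate it. The NumPy sketch below illustrates that idea only under stated assumptions, not the authors' implementation: it assumes each head's real projection is split into two halves read as real and imaginary parts (the Euler step), and it stands in for the truncated "adaptive differential rotation" with a hypothetical per-head rotation rate `head_phase` applied to the relative offset m - n. The function names are likewise hypothetical.

```python
import numpy as np

def euler_transform(x):
    """Map a real projection (..., d) to polar form (r, theta), each (..., d/2).

    Assumption (illustrative): the first half of the last axis is the real part
    and the second half is the imaginary part of d/2 complex numbers.
    """
    half = x.shape[-1] // 2
    real, imag = x[..., :half], x[..., half:]
    r = np.hypot(real, imag)          # magnitude of each complex component
    theta = np.arctan2(imag, real)    # phase of each complex component
    return r, theta

def complex_attention_scores(q, k, head_phase):
    """Scores Re(sum_j q_j * conj(k_j) * exp(i * phi_h * (m - n))) / sqrt(d).

    q, k: (heads, seq, d) real query/key projections.
    head_phase: (heads,) hypothetical per-head rotation rate phi_h.
    """
    rq, tq = euler_transform(q)
    rk, tk = euler_transform(k)
    pos = np.arange(q.shape[1])
    # Relative offset m - n scaled per head: (heads, seq_q, seq_k)
    rel = head_phase[:, None, None] * (pos[:, None] - pos[None, :])
    # Phase difference per component plus the positional rotation
    dtheta = tq[:, :, None, :] - tk[:, None, :, :] + rel[..., None]
    # Scaling (r_q * r_k) times rotation (cos of total phase), summed over components
    scores = np.sum(rq[:, :, None, :] * rk[:, None, :, :] * np.cos(dtheta), axis=-1)
    return scores / np.sqrt(q.shape[-1])

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads, seq, d = 2, 4, 8
    q = rng.standard_normal((heads, seq, d))
    k = rng.standard_normal((heads, seq, d))
    v = rng.standard_normal((heads, seq, d))
    weights = softmax(complex_attention_scores(q, k, np.array([0.1, 0.5])))
    out = weights @ v                 # (heads, seq, d) attention output
    print(out.shape)
```

One property of this sketch worth noting: with `head_phase` set to zero, r_q r_k cos(θ_q − θ_k) expands to a_q a_k + b_q b_k, so the scores reduce exactly to the ordinary scaled dot product q·k / sqrt(d). In this toy setup, any difference from standard attention therefore comes entirely from the per-head rotation term.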


Code & Models

Repositories

shaojintian/power_law_decay (PyTorch)


Taxonomy

Topics: Neural Networks and Applications · Cognitive Science and Education Research

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention