ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
Jintian Shao, Hongyi Huang, Jiayi Wu, Beiwen Zhang, ZhiYu Wu, You Shan, MingKai Zheng

TL;DR
ComplexFormer introduces a novel complex vector attention mechanism for transformers, enabling each head to independently model semantic and positional interactions, resulting in improved performance and efficiency across various NLP tasks.
Contribution
The paper proposes Complex Multi-Head Attention (CMHA), which uses head-specific complex vectors and adaptive rotations to make transformer attention more flexible and expressive.
Findings
Achieves lower perplexity in language modeling.
Improves long-context coherence in text generation.
Demonstrates superior performance over RoPE-Transformers.
Abstract
Transformer models rely on self-attention to capture token dependencies but face challenges in effectively integrating positional information while allowing multi-head attention (MHA) flexibility. Prior methods often model semantic and positional differences disparately or apply uniform positional adjustments across heads, potentially limiting representational capacity. This paper introduces ComplexFormer, featuring Complex Multi-Head Attention (CMHA). CMHA empowers each head to independently model semantic and positional differences unified within the complex plane, representing interactions as rotations and scaling. ComplexFormer incorporates two key improvements: (1) a per-head Euler transformation, converting real-valued query/key projections into polar-form complex vectors for head-specific complex subspace operation; and (2) a per-head adaptive differential rotation mechanism,…
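The abstract describes the Euler transformation only at a high level, so the following is a minimal sketch under our own assumptions, not the authors' implementation. It assumes each head's real-valued query/key projection is split into consecutive feature pairs, each pair is read as one complex number stored in polar form r·e^(iθ), and the query/key interaction is the real part of q·conj(k). Under these assumptions every term factors into a scaling (r_q · r_k) and a rotation (θ_q − θ_k), which matches the "rotations and scaling" view in the abstract. The names `euler_transform`, `complex_score`, and `phase_shift` are hypothetical. `phase_shift` is a generic placeholder rotation and is not the paper's adaptive differential rotation, whose description is truncated in the abstract.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Polar = Tuple[float, float]  # (magnitude r, phase theta)


def euler_transform(x: Sequence[float]) -> List[Polar]:
    """Hypothetical Euler transform for one head's query/key projection.

    Assumption: consecutive feature pairs (x[2j], x[2j+1]) are read as the
    real and imaginary parts of one complex number, which is then stored in
    polar form r * exp(i * theta).
    """
    if len(x) % 2 != 0:
        raise ValueError("head dimension must be even")
    return [cmath.polar(complex(a, b)) for a, b in zip(x[0::2], x[1::2])]


def complex_score(q: Sequence[Polar], k: Sequence[Polar],
                  phase_shift: float = 0.0) -> float:
    """Real part of sum_j q_j * conj(k_j), optionally rotated by phase_shift.

    Each term r_q * r_k * cos(theta_q - theta_k + phase_shift) is a scaling
    (product of magnitudes) times a rotation (relative phase). phase_shift
    is a hypothetical stand-in for a positional rotation; it is NOT the
    paper's adaptive differential rotation.
    """
    return sum(rq * rk * math.cos(tq - tk + phase_shift)
               for (rq, tq), (rk, tk) in zip(q, k))


if __name__ == "__main__":
    q_real = [1.0, 0.0, 0.0, 2.0]
    k_real = [0.0, 1.0, 3.0, 4.0]
    q, k = euler_transform(q_real), euler_transform(k_real)
    # With no rotation, the score equals the ordinary real dot product
    # (up to floating-point rounding).
    print(complex_score(q, k))
    print(sum(a * b for a, b in zip(q_real, k_real)))
    # Rotating every relative phase by pi flips the sign of each term.
    print(complex_score(q, k, phase_shift=math.pi))
```

Without any rotation the polar score reduces to the usual real dot product, so in this sketch the polar form changes the parameterization rather than the value. Any extra expressiveness would have to come from how each head rotates phases, which this sketch deliberately does not try to reproduce.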
Taxonomy
Topics: Neural Networks and Applications · Cognitive Science and Education Research
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
