
TL;DR
The paper introduces the Phase-Coherent Transformer (PCT), a complex-valued attention mechanism designed to preserve phase information across layers. Under parameter-fair comparison, it outperforms both the standard softmax Transformer and its direct complex-valued counterpart.
Contribution
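It proposes phase-coherent attention: a real-valued, element-independent, smooth gate applied to L2-normalised complex query-key similarities. This replaces softmax's row-normalised token competition with token-non-competing attention, and the authors report strong generalisation across task categories.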
Findings
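PCT outperforms standard softmax Transformers on mid-scale benchmarks spanning long-range memory, hierarchical long-range reasoning, positional retrieval, phase-based memory and superposition, and image classification.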
Gates preserving phase coherence are crucial for long-range retrieval tasks.
PCT maintains accuracy across various depths without collapse.
Abstract
Complex-valued Transformers have largely inherited softmax attention from real-valued architectures. However, row-normalised token competition is not necessarily aligned with phase-preserving computation. In this paper, we introduce the Phase-Coherent Transformer (PCT), which applies a real-valued, element-independent, smooth gate to L2-normalised complex query-key similarities. PCT replaces token competition with token-non-competing attention and is designed to preserve phase information across layers. Across mid-scale benchmarks spanning long-range memory, hierarchical long-range reasoning, positional retrieval, phase-based memory and superposition, and image classification, PCT shows strong generalisation across task categories. Under parameter-fair comparison, PCT consistently outperforms both the standard softmax Transformer and its direct complex-valued counterpart. Moreover,…
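The contrast the abstract draws between row-normalised token competition and token-non-competing attention can be made concrete with a small NumPy sketch. This is one possible reading of the description, not the paper's implementation: the abstract does not give the gate's functional form, so `smooth_gate` (a sigmoid of the real part of the similarity), its `sharpness` parameter, and all function names here are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalise(x, eps=1e-8):
    """Scale each complex row vector to (approximately) unit L2 norm."""
    norm = np.sqrt(np.sum(np.abs(x) ** 2, axis=-1, keepdims=True))
    return x / (norm + eps)


def similarities(q, k):
    """Complex similarities between L2-normalised queries and keys."""
    return l2_normalise(q) @ l2_normalise(k).conj().T


def smooth_gate(s, sharpness=4.0):
    """ASSUMED gate (not from the paper): sigmoid of Re(s).

    It is real-valued, smooth, and computed independently per entry.
    """
    return 1.0 / (1.0 + np.exp(-sharpness * s.real))


def gated_attention(q, k, v):
    """Non-competing attention: no normalisation across keys."""
    s = similarities(q, k)
    # A positive real gate rescales |s| and leaves arg(s) unchanged.
    w = smooth_gate(s) * s
    return w @ v, w


def softmax_attention(q, k, v):
    """Baseline for contrast: row-normalised weights computed from Re(s)."""
    s = similarities(q, k).real
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    w = e / e.sum(axis=-1, keepdims=True)
    return w @ v, w


rng = np.random.default_rng(0)


def crandn(*shape):
    """Random complex array with standard-normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


q, k, v = crandn(2, 8), crandn(5, 8), crandn(5, 8)

# Compute weights with all five keys, then with only the first three.
_, w_gate_all = gated_attention(q, k, v)
_, w_gate_sub = gated_attention(q, k[:3], v[:3])
_, w_soft_all = softmax_attention(q, k, v)
_, w_soft_sub = softmax_attention(q, k[:3], v[:3])

# Do the weights on the first three keys depend on the other two?
print("gated weights unchanged:  ", np.allclose(w_gate_all[:, :3], w_gate_sub))
print("softmax weights unchanged:", np.allclose(w_soft_all[:, :3], w_soft_sub))
```

In this sketch the gated weight on a key depends only on that query-key pair, so adding or removing other keys leaves it unchanged, whereas the softmax weights shift because every key shares one row denominator. Because the assumed gate is real and positive, each weight also keeps the phase of its similarity.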
