The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle
Dibakar Sigdel

TL;DR
The paper introduces the Phasor Transformer, a phase-based model that avoids the quadratic self-attention bottleneck in long-sequence time-series modeling by representing sequence states on the unit circle and mixing tokens globally with a Discrete Fourier Transform (DFT).
Contribution
It proposes the Phasor Transformer block, which combines lightweight trainable phase shifts with a parameter-free DFT for global token mixing, so that time-series models can scale to long contexts without explicit attention maps.
Findings
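The Large Phasor Model (LPM), a stack of Phasor Transformer blocks, achieves forecasting behavior competitive with conventional self-attention baselines while using a compact parameter budget.
The model learns stable global dynamics on synthetic multi-frequency benchmarks.
Global token coupling is achieved through a parameter-free DFT over phase representations rather than through attention maps.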
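The last finding can be illustrated with a small NumPy sketch. It is not the paper's code: the sequence length, feature width, and the helper name `dft_token_mixing` are hypothetical, and the orthonormal scaling is an assumption chosen for convenience. The sketch shows that a DFT taken along the token axis has no trainable weights, yet a change to a single input token reaches every output position.

```python
import numpy as np

def dft_token_mixing(z):
    """Mix tokens with an orthonormal DFT along the sequence axis (axis 0).

    Hypothetical helper; the scaling convention is an assumption.
    """
    return np.fft.fft(z, axis=0, norm="ortho")

rng = np.random.default_rng(0)
seq_len, dim = 8, 4                              # illustrative sizes only

# Sequence states stored as phases, then placed on the unit circle.
theta = rng.uniform(-np.pi, np.pi, size=(seq_len, dim))
z = np.exp(1j * theta)
mixed = dft_token_mixing(z)

# Perturb the phases of one token (index 3) and mix again.
theta_perturbed = theta.copy()
theta_perturbed[3] += 0.5
mixed_perturbed = dft_token_mixing(np.exp(1j * theta_perturbed))

# For each output position, check whether any feature changed.
changed = np.abs(mixed_perturbed - mixed).max(axis=1) > 1e-9
print(changed)
```

Every entry of `changed` is true because the DFT of a single-position perturbation has non-zero magnitude at every frequency. Computed with an FFT, this mixing costs O(N log N) in the sequence length N, compared with the O(N^2) of a dense attention map.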
Abstract
Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time-series. We introduce the Phasor Transformer block, a phase-native alternative representing sequence states on the unit-circle manifold. Each block combines lightweight trainable phase-shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global mixing without explicit attention maps. Stacking these blocks defines the Large Phasor Model (LPM). We validate LPM on autoregressive time-series prediction over synthetic multi-frequency benchmarks. Operating with a highly compact parameter budget, LPM learns stable global dynamics and achieves competitive forecasting behavior compared to conventional self-attention baselines. Our results establish an explicit…
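One way to read the block structure described in the abstract is sketched below in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how states are returned to the unit circle after the DFT, so keeping only the phase with `np.angle` is an assumption, as are the per-feature shape of the phase shifts and the names `phasor_block` and `large_phasor_model`.

```python
import numpy as np

def phasor_block(theta, phi):
    """One hypothetical phasor block acting on phases.

    theta: (seq_len, dim) array of phases (sequence states on the unit circle).
    phi:   (dim,) array of trainable phase shifts (shape is an assumption).
    """
    # Trainable phase shift: rotate each feature by its own angle.
    z = np.exp(1j * (theta + phi))
    # Parameter-free token coupling: DFT along the sequence axis.
    mixed = np.fft.fft(z, axis=0, norm="ortho")
    # ASSUMPTION: return to the unit circle by discarding magnitude.
    return np.angle(mixed)

def large_phasor_model(theta, phis):
    """Stack blocks; phis holds one phase-shift vector per block."""
    for phi in phis:
        theta = phasor_block(theta, phi)
    return theta

rng = np.random.default_rng(0)
seq_len, dim, depth = 16, 4, 3                   # illustrative sizes only
theta_in = rng.uniform(-np.pi, np.pi, size=(seq_len, dim))
phis = [rng.uniform(-np.pi, np.pi, size=dim) for _ in range(depth)]

theta_out = large_phasor_model(theta_in, phis)
n_params = sum(p.size for p in phis)
print(theta_out.shape, n_params)
```

In this sketch the only trainable quantities are the phase shifts, so the parameter count is `depth * dim` and does not grow with sequence length. That is consistent with the compact parameter budget the abstract describes, although the actual parameterization in the paper may differ.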
Taxonomy
Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
