Three-Phase Transformer

Mohammad R. Abu Ayyash

arXiv:2604.14430 · cs.CL · April 17, 2026

TL;DR

The paper introduces the Three-Phase Transformer (3PT), a residual-stream structural prior for decoder-only Transformers. It partitions the hidden vector into N cyclic channels and injects a fixed absolute-position profile, which the authors report improves both stability and perplexity.

Contribution

3PT combines channel partitioning, phase-respecting operations (a per-channel RMSNorm and a Givens rotation between attention and FFN), and position injection. The authors report improved perplexity and faster convergence on language modeling tasks.
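
As an illustration of the channel partitioning and per-channel normalization, here is a minimal NumPy sketch. It is not taken from the paper's repository. The function name `per_channel_rmsnorm`, the contiguous-slice channel layout, and the absence of a learned gain are all assumptions made for the example.

```python
import numpy as np

def per_channel_rmsnorm(h, n_channels=3, eps=1e-6):
    """RMS-normalize each of n_channels equal slices of the last axis.

    Assumptions (not confirmed by the source): channels are contiguous
    slices of the hidden vector, and no learned gain is applied.
    """
    d = h.shape[-1]
    if d % n_channels != 0:
        raise ValueError("hidden size must be divisible by n_channels")
    # (..., d) -> (..., n_channels, d // n_channels)
    chans = h.reshape(*h.shape[:-1], n_channels, d // n_channels)
    # Root-mean-square of each channel, computed independently per token.
    rms = np.sqrt(np.mean(chans ** 2, axis=-1, keepdims=True) + eps)
    return (chans / rms).reshape(h.shape)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 12))          # (batch, seq, hidden)
out = per_channel_rmsnorm(h, n_channels=3)
```

Each channel is normalized on its own, so a large activation in one channel does not rescale the others. That is the property this sketch is meant to show.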

Findings

01

Achieves a 7.20% reduction in perplexity (-7.20%) on WikiText-103 at 123M parameters.

02

Provides a self-stabilizing equilibrium architecture without explicit constraints.

03

Shows N=3 as an effective parameter sharing choice, with stability across different N values.

Abstract

We present Three-Phase Transformer (3PT), a residual-stream structural prior for decoder-only Transformers on a standard SwiGLU + RMSNorm + RoPE + GQA backbone. The hidden vector is partitioned into N equally-sized cyclic channels, each maintained by phase-respecting ops: a per-channel RMSNorm, a 2D Givens rotation between attention and FFN that rotates each channel by theta + i*(2*pi/N), and a head-count constraint aligning GQA heads with the partition. The architecture is a self-stabilizing equilibrium between scrambling and re-imposition, not a bolted-on module. The partition carves out a one-dimensional DC subspace orthogonal to the channels, into which we inject a fixed Gabriel's horn profile r(p) = 1/(p+1) as an absolute-position side-channel composing orthogonally with RoPE's relative-position rotation. The canonical N=3 borrows its metaphor from balanced three-phase AC, where…
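
The two formulas in the abstract, the per-channel rotation angle theta + i*(2*pi/N) and the profile r(p) = 1/(p+1), can be made concrete with a short NumPy sketch. This is a hedged illustration, not the authors' implementation. It assumes that adjacent dimension pairs inside a channel share that channel's 2D rotation, that p is a zero-based token position, and that the DC direction is a given unit vector `u`. The abstract specifies none of these details, and the names `phase_rotate`, `horn_profile`, and `inject_position` are hypothetical.

```python
import numpy as np

def phase_rotate(h, theta, n_channels=3):
    """Rotate channel i by the angle theta + i * (2*pi / n_channels).

    Assumption: within a channel, dimension pairs (0,1), (2,3), ...
    each receive the same 2D Givens rotation.
    """
    d = h.shape[-1]
    if d % n_channels != 0 or (d // n_channels) % 2 != 0:
        raise ValueError("need an even channel width that divides hidden size")
    width = d // n_channels
    angles = theta + np.arange(n_channels) * (2 * np.pi / n_channels)
    cos = np.cos(angles)[:, None]        # (n_channels, 1)
    sin = np.sin(angles)[:, None]
    # (..., d) -> (..., n_channels, width // 2, 2)
    pairs = h.reshape(*h.shape[:-1], n_channels, width // 2, 2)
    x, y = pairs[..., 0], pairs[..., 1]
    rotated = np.stack([cos * x - sin * y, sin * x + cos * y], axis=-1)
    return rotated.reshape(h.shape)

def horn_profile(seq_len):
    """r(p) = 1 / (p + 1); p is assumed to be a zero-based position."""
    return 1.0 / (np.arange(seq_len) + 1.0)

def inject_position(h, u):
    """Add r(p) * u to every token; u is an assumed unit DC direction."""
    r = horn_profile(h.shape[-2])
    return h + r[:, None] * u

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4, 12))          # (batch, seq, hidden)
rot = phase_rotate(h, theta=0.1, n_channels=3)
u = np.ones(12) / np.sqrt(12.0)          # hypothetical DC direction
out = inject_position(rot, u)
```

Because each pair undergoes a pure rotation, `phase_rotate` preserves the norm of every token vector. The injected term decays as 1/(p+1), so it is largest at the first position and shrinks monotonically after that.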

Code & Models

Repositories

achelousace/three-phase-transformer
github
