Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

Joshua Nunley

arXiv:2602.18417·cs.LG·February 23, 2026

TL;DR

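This paper introduces a unified framework for sequence models such as RNNs and Transformers whose hidden states live on closed subgroups of U(d). It specializes the framework to O(d), evaluates orthogonal-state models on Tiny Shakespeare and Penn Treebank under parameter-matched settings, and reports that a tangent-space linear-mixing extension improves finite-budget performance in those experiments.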

Contribution

The paper develops a minimal axiomatic setup and derives RNN and Transformer templates from a shared skeleton, in which the subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map.

Findings

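01. Orthogonal-state RNN and Transformer models are evaluated on Tiny Shakespeare and Penn Treebank under parameter-matched settings.

02. A linear-mixing extension in tangent space applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.

03. Subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map, so the same skeleton yields different sequence models.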

Abstract

This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
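
To make the three ingredients named above concrete for O(d), here is a minimal sketch in dependency-free Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give formulas, so the skew-symmetric projection, the update rule `h_next = h @ exp(A)`, and the helper names (`skew_projection`, `expm_taylor`, `step`) are all hypothetical choices. The sketch only shows that projecting an arbitrary matrix to the tangent space and applying the exponential map keeps the state orthogonal.

```python
def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def skew_projection(m):
    """Assumed tangent projection: map a square matrix to its
    skew-symmetric part, an element of the Lie algebra so(d)."""
    d = len(m)
    return [[0.5 * (m[i][j] - m[j][i]) for j in range(d)] for i in range(d)]

def expm_taylor(a, terms=30):
    """Matrix exponential by truncated Taylor series.
    Adequate for the small-norm inputs used here."""
    d = len(a)
    result = identity(d)
    term = identity(d)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def step(h, raw_update):
    """Assumed update map: h_next = h @ exp(skew(raw_update)).
    If h is orthogonal, h_next is orthogonal up to rounding."""
    return matmul(h, expm_taylor(skew_projection(raw_update)))

def orthogonality_error(h):
    """Largest absolute entry of h^T h - I."""
    d = len(h)
    g = matmul(transpose(h), h)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0))
               for i in range(d) for j in range(d))

# Hypothetical unconstrained updates, standing in for a learned
# function of the input token and the previous state.
raw_updates = [
    [[0.1, 0.4, -0.2], [0.0, 0.3, 0.5], [0.7, -0.1, 0.2]],
    [[-0.3, 0.2, 0.1], [0.6, 0.0, -0.4], [0.2, 0.1, 0.5]],
]

h = identity(3)  # start at the group identity
for u in raw_updates:
    h = step(h, u)

err = orthogonality_error(h)
print("orthogonality error:", err)
```

Swapping the subgroup would, in this sketch, mean replacing `skew_projection` with the projection onto that subgroup's Lie algebra (for example skew-Hermitian matrices for U(d)) while leaving `step` unchanged.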
