Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures
Joshua Nunley

TL;DR
This paper introduces a unified framework for sequence models such as RNNs and Transformers whose hidden states live on closed subgroups of U(d). It specializes the framework to O(d) and evaluates orthogonal-state models on Tiny Shakespeare and Penn Treebank under parameter-matched settings.
Contribution
It develops a minimal axiomatic approach to derive RNN and Transformer architectures from subgroup choices, providing a versatile template for designing sequence models.
Findings
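Orthogonal-state RNNs and Transformers, obtained by specializing to O(d), are evaluated on Tiny Shakespeare and Penn Treebank under parameter-matched settings.
A linear-mixing extension in tangent space applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.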
The framework allows for flexible, subgroup-based sequence model design.
Abstract
This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
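To make the "state space, tangent projection, update map" skeleton concrete, here is a minimal sketch of one recurrent step for the O(d) case. It is an illustration under stated assumptions, not the paper's implementation: the weight tensor W, the function names, and the choice of the Cayley transform as the update map are all hypothetical, since the abstract does not specify them. The tangent projection used is the standard one for O(d): the skew-symmetric part of a matrix.

```python
import numpy as np

def project_skew(M):
    """Tangent projection for O(d): keep the skew-symmetric part of M."""
    return 0.5 * (M - M.T)

def cayley(A):
    """Map a skew-symmetric A to an orthogonal matrix (Cayley transform).

    Assumption: the Cayley transform stands in for the update map here;
    the paper may use a different map (for example the matrix exponential).
    """
    I = np.eye(A.shape[0])
    # Solves (I - A/2) Q = (I + A/2), i.e. Q = (I - A/2)^{-1} (I + A/2).
    return np.linalg.solve(I - 0.5 * A, I + 0.5 * A)

def rnn_step(H, x, W):
    """One step of a hypothetical orthogonal-state RNN.

    H: (d, d) orthogonal hidden state.
    x: (k,) input vector.
    W: (d, d, k) hypothetical weights mapping the input to a d x d matrix.
    """
    M = np.tensordot(W, x, axes=([2], [0]))  # (d, d) input-dependent matrix
    A = project_skew(M)                      # project into the tangent space
    return H @ cayley(A)                     # product of orthogonal matrices

def run_sequence(xs, W, d):
    """Fold rnn_step over a sequence, starting from the identity state."""
    H = np.eye(d)
    for x in xs:
        H = rnn_step(H, x, W)
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, T = 4, 3, 20
    W = rng.normal(size=(d, d, k))
    H = run_sequence(rng.normal(size=(T, k)), W, d)
    # Deviation from orthogonality; expected to be near machine precision.
    print(np.abs(H.T @ H - np.eye(d)).max())
```

Because each update is an orthogonal matrix multiplied into the state, the state stays on O(d) up to floating-point error without any re-normalization. Swapping in a different subgroup would, in this sketch, mean replacing project_skew and cayley with that subgroup's tangent projection and update map.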
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks
