
TL;DR
This paper introduces an attention-free neural architecture based on Grassmann flows, which models token interactions through geometric subspaces. It achieves competitive language modeling and natural language inference performance, with potentially more interpretable reasoning.
Contribution
The authors propose a novel Grassmann flow-based architecture that replaces self-attention with manipulations of geometric subspaces. This offers a more structured alternative to the tensor lifting that, they argue, underlies multi-head attention in sequence modeling.
Findings
Grassmann models achieve perplexities close to those of Transformer baselines on WikiText-2.
On SNLI, Grassmann models slightly outperform Transformer heads.
Complexity scales linearly with sequence length for a fixed rank.
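To make the scaling claim concrete, the short Python count below compares the number of pairwise interactions formed by causal self-attention with the number of Plücker features a local pairing scheme would produce. It assumes, hypothetically, that each token is paired with a fixed number of earlier tokens (`num_offsets`) and that each pair contributes C(r, 2) coordinates for reduced rank r. The function names and parameter values are illustrative and are not taken from the paper.

```python
from math import comb


def attention_pair_count(seq_len: int) -> int:
    # Causal self-attention scores each token against itself and every earlier
    # token: L * (L + 1) / 2 interactions, which grows as O(L^2).
    return seq_len * (seq_len + 1) // 2


def grassmann_feature_count(seq_len: int, rank: int, num_offsets: int) -> int:
    # Hypothetical local scheme: each token is paired with `num_offsets` earlier
    # tokens, and each pair yields C(rank, 2) Plücker coordinates. This is an
    # upper bound (the first few tokens have fewer predecessors), and it is
    # O(L) when rank and num_offsets are held fixed.
    return seq_len * num_offsets * comb(rank, 2)


for L in (512, 1024, 2048, 4096):
    print(L, attention_pair_count(L), grassmann_feature_count(L, rank=16, num_offsets=4))
```

Doubling the sequence length roughly quadruples the attention count but only doubles the Grassmann count. With r and the number of offsets fixed, the per-token cost is constant, which is the sense in which the cost is linear in L.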
Abstract
We revisit a basic question in sequence modeling: is explicit self-attention actually necessary for strong performance and reasoning? We argue that standard multi-head attention is best seen as a form of tensor lifting: hidden vectors are mapped into a high-dimensional space of pairwise interactions, and learning proceeds by constraining this lifted tensor through gradient descent. This mechanism is extremely expressive but mathematically opaque, because after many layers it becomes very hard to describe the model with a small family of explicit invariants. To explore an alternative, we propose an attention-free architecture based on Grassmann flows. Instead of forming an L × L attention matrix, our Causal Grassmann layer (i) linearly reduces token states, (ii) encodes local token pairs as two-dimensional subspaces on a Grassmann manifold via Plücker coordinates, and (iii) fuses…
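Steps (i) and (ii) of the Causal Grassmann layer can be sketched in a few lines of NumPy. For vectors u, v in R^r, the Plücker coordinates of the plane span{u, v} are p_ij = u_i v_j - u_j v_i for i < j. The sketch assumes a random reduction matrix `W_red`, a hypothetical set of look-back `offsets`, and unit-norm scaling of each Plücker vector (Plücker coordinates are only defined up to scale); none of these choices are confirmed by the abstract. Because the abstract is truncated at step (iii), the fusion step is deliberately not shown.

```python
import numpy as np


def plucker_coordinates(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plücker coordinates of the 2-D subspace span{u, v} of R^r.

    Returns the C(r, 2) values p_ij = u_i * v_j - u_j * v_i for i < j,
    in the order given by np.triu_indices.
    """
    wedge = np.outer(u, v) - np.outer(v, u)   # antisymmetric matrix u v^T - v u^T
    i, j = np.triu_indices(len(u), k=1)
    return wedge[i, j]


def causal_pair_features(h: np.ndarray, W_red: np.ndarray, offsets=(1, 2, 4)) -> np.ndarray:
    """Hypothetical sketch of steps (i) and (ii); step (iii), fusion, is omitted.

    h: (L, d) token states; W_red: (d, r) reduction matrix (assumed, not from the paper).
    Returns an (L, len(offsets), C(r, 2)) array; pairs that would reach before
    position 0 are left as zeros.
    """
    z = h @ W_red                                  # step (i): linear reduction to rank r
    L, r = z.shape
    feats = np.zeros((L, len(offsets), r * (r - 1) // 2))
    for t in range(L):
        for k, delta in enumerate(offsets):
            s = t - delta
            if s < 0:
                continue                           # causal: only pair with earlier tokens
            p = plucker_coordinates(z[t], z[s])    # step (ii): token pair -> point on Gr(2, r)
            norm = np.linalg.norm(p)
            # Assumed normalization: Plücker vectors are projective, so fix the scale.
            feats[t, k] = p / norm if norm > 1e-8 else p
    return feats


rng = np.random.default_rng(0)
h = rng.standard_normal((8, 32))
W_red = rng.standard_normal((32, 6)) / np.sqrt(32)
feats = causal_pair_features(h, W_red)
print(feats.shape)  # (8, 3, 15)

# The coordinates describe the plane, not the chosen basis: replacing (u, v)
# by (a u + b v, c u + d v) only rescales them by the determinant a d - b c.
u, v = rng.standard_normal(6), rng.standard_normal(6)
a, b, c, d = 2.0, 1.0, 0.5, 3.0
p = plucker_coordinates(u, v)
q = plucker_coordinates(a * u + b * v, c * u + d * v)
print(np.allclose(q, (a * d - b * c) * p))  # True
```

The basis-change check illustrates why a normalized Plücker vector can represent a token pair as a point on the Grassmannian: any two spanning vectors of the same plane give the same point up to sign and scale.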
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Generative Adversarial Networks and Image Synthesis
