Composing Linear Layers from Irreducibles
Travis Pence, Daisuke Yamada, Vikas Singh

TL;DR
This paper introduces a method to decompose the linear layers of large models into compositions of geometric primitives called rotors, yielding a parameter-efficient and interpretable algebraic structure that matches the performance of standard baselines.
Contribution
We propose an algebraic decomposition of linear layers based on Clifford algebra that reduces the parameter count and exposes the geometric primitives underlying the functions computed by deep models.
Findings
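Rotor-based layers match the performance of strong baselines (block-Hadamard and low-rank approximations) when used as the key, query, and value projections in LLM attention layers.
The decomposition uses only O(log^2 d) parameters, versus O(d^2) for a dense d × d matrix.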
Provides an algebraic perspective on the composition of functions in deep models.
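To make the parameter gap concrete, here is a small count. It rests on an assumption this summary does not spell out: the d input features are read as the 2^n coefficients of a multivector in the Clifford algebra Cl(n), so n = log2 d, and a rotor is generated by one bivector with a coefficient per plane e_i ∧ e_j, giving n(n-1)/2 = O(log^2 d) parameters. The helper names are hypothetical, and the count is per rotor; composing several rotors multiplies it accordingly.

```python
from math import comb, log2

def rotor_params(d: int) -> int:
    """Bivector coefficients for one rotor, assuming d = 2**n (hypothetical setup)."""
    n = int(log2(d))
    if 2 ** n != d:
        raise ValueError("this sketch assumes d is a power of two")
    return comb(n, 2)  # one coefficient per plane e_i ^ e_j with i < j

def dense_params(d: int) -> int:
    """Weights in a dense d x d matrix (bias ignored)."""
    return d * d

for d in (256, 1024, 4096):
    print(d, rotor_params(d), dense_params(d))
# prints:
# 256 28 65536
# 1024 45 1048576
# 4096 66 16777216
```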
Abstract
Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus the O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations.…
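As a hedged illustration of how bivectors turn into rotors and rotors into a linear map, the NumPy sketch below assumes that a d-dimensional feature vector is read as the 2^n coefficients of a multivector in the Euclidean Clifford algebra Cl(n) (so d = 2^n). It builds a rotor R = exp(B) from a bivector B, writes the sandwich product x ↦ R x R̃ as a d × d matrix, and multiplies a few such matrices into one linear map. This is not the authors' differentiable decomposition algorithm, which fits rotors to a given layer; it only shows the forward construction. All function names are hypothetical, and the matrix exponential is a plain scaling-and-squaring Taylor series chosen to avoid extra dependencies.

```python
import numpy as np

def blade_sign(a: int, b: int) -> int:
    """Sign of e_a * e_b for basis blades given as bitmasks (Euclidean metric, e_i^2 = +1)."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")  # transpositions needed to sort the factors
        a >>= 1
    return -1 if swaps & 1 else 1

def left_mul(x: np.ndarray) -> np.ndarray:
    """Matrix of y -> x * y (geometric product) acting on 2**n coefficients."""
    d = x.size
    L = np.zeros((d, d))
    for a in range(d):
        for b in range(d):
            L[a ^ b, b] += x[a] * blade_sign(a, b)
    return L

def right_mul(x: np.ndarray) -> np.ndarray:
    """Matrix of y -> y * x."""
    d = x.size
    M = np.zeros((d, d))
    for a in range(d):
        for b in range(d):
            M[a ^ b, a] += x[b] * blade_sign(a, b)
    return M

def reverse(x: np.ndarray) -> np.ndarray:
    """Reversion ~x: a grade-k blade is scaled by (-1)**(k*(k-1)/2)."""
    k = np.array([bin(i).count("1") for i in range(x.size)])
    return x * (-1.0) ** (k * (k - 1) // 2)

def expm(A: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=0).max()
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 1 else 0
    A = A / 2.0 ** s
    E = term = np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def rotor(n: int, coeffs: np.ndarray) -> np.ndarray:
    """R = exp(B) for the bivector B with one coefficient per plane e_i e_j, i < j."""
    d = 2 ** n
    planes = [m for m in range(d) if bin(m).count("1") == 2]
    B = np.zeros(d)
    B[planes] = coeffs
    one = np.zeros(d)
    one[0] = 1.0
    return expm(left_mul(B)) @ one  # L is a homomorphism, so exp(L_B) = L_exp(B)

def sandwich_matrix(R: np.ndarray) -> np.ndarray:
    """d x d matrix of the rotor action x -> R x ~R."""
    return left_mul(R) @ right_mul(reverse(R))

# Usage: compose three random rotors acting on d = 2**4 = 16 features.
rng = np.random.default_rng(0)
n = 4
rotors = [rotor(n, rng.normal(size=n * (n - 1) // 2)) for _ in range(3)]
W = np.eye(2 ** n)
for R in rotors:
    W = sandwich_matrix(R) @ W  # product of rotor actions is one linear map
print(W.shape, np.allclose(W.T @ W, np.eye(2 ** n)))  # prints (16, 16) True
```

With this convention a single-plane bivector θ e_0 e_1 rotates the (e_0, e_1) plane by 2θ, which is why many texts write R = exp(-θB/2). Because every sandwich matrix is orthogonal, this sketch only produces rotation-type maps; it does not show how a general dense layer is matched.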
Taxonomy
Topics: Topological and Geometric Data Analysis · Data Visualization and Analytics · Algebraic and Geometric Analysis
