
TL;DR
The paper introduces Graded Transformers, a new class of models embedding algebraic inductive biases through grading transformations, improving hierarchical learning and efficiency across diverse structured data applications.
Contribution
It proposes the Graded Transformer framework with novel architectures, theoretical guarantees, and adaptive grading, advancing hierarchical and neuro-symbolic reasoning in sequence models.
Findings
Universal approximation for continuous and Sobolev functions.
Reduced sample complexity via VC dimension bounds.
Enhanced robustness and gradient stability.
Abstract
We introduce the Graded Transformer framework, a new class of sequence models that embeds algebraic inductive biases through grading transformations on vector spaces. Extending Graded Neural Networks (GNNs), we propose two architectures: the Linearly Graded Transformer (LGT) and the Exponentially Graded Transformer (EGT). These models apply parameterized scaling operators, governed by fixed or learnable grading tuples and, in the case of EGT, exponential factors, to encode hierarchical structure in attention and representation layers and to improve efficiency for structured data. We establish rigorous guarantees, including universal approximation theorems for continuous and Sobolev functions, reduced sample complexity via effective VC dimension bounds, Lipschitz continuity of graded operations, and robustness to perturbations. A graded loss ensures gradient stability and alignment with…
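The abstract does not spell out the scaling operators, so the following is only an illustrative sketch, not the paper's definition. It assumes a grading tuple `grades = (q_1, ..., q_n)` acts coordinate-wise: the linear variant multiplies coordinate `i` by `q_i`, and the exponential variant multiplies it by `lam ** q_i` for a positive factor `lam`. The function names and both formulas are hypothetical. The one general fact used is that a diagonal scaling is Lipschitz in the Euclidean norm, with constant equal to the largest absolute scale.

```python
import math


def linear_scales(grades):
    """Assumed linear grading: coordinate i is scaled by q_i."""
    return [float(q) for q in grades]


def exponential_scales(grades, lam):
    """Assumed exponential grading: coordinate i is scaled by lam ** q_i."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [lam ** q for q in grades]


def apply_grading(x, scales):
    """Apply a diagonal scaling operator to the vector x."""
    if len(x) != len(scales):
        raise ValueError("x and scales must have the same length")
    return [s * xi for s, xi in zip(scales, x)]


def lipschitz_constant(scales):
    """Operator norm of a diagonal scaling: the largest absolute scale."""
    return max(abs(s) for s in scales)


def euclidean_norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    grades = [0, 1, 2]  # hypothetical grading tuple

    print(apply_grading(x, linear_scales(grades)))
    print(apply_grading(x, exponential_scales(grades, 2.0)))
```

Under these assumptions, higher-graded coordinates are amplified more strongly in the exponential variant, which is one way a grading tuple could bias a layer toward hierarchical features. The scales could equally be treated as learnable parameters, as the abstract mentions.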
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Graph Neural Networks · Ferroelectric and Negative Capacitance Devices
