
TL;DR
Rational Transductors extend Transformers with rational state information derived from weighted finite automata, enabling them to recognize all regular languages and solve NC^1-complete problems, and addressing known Transformer limitations in length generalization and state tracking.
Contribution
The paper introduces Rational Transductors, a novel dual-stream architecture that enhances Transformers with WFA-derived rational state information, significantly broadening their expressive power.
Findings
Strictly generalizes the expressive power of Transformers to capture all regular languages and NC^1-complete problems such as Boolean Formula Evaluation.
Maintains O(L + log T) parallel complexity, enabling efficient computation.
Demonstrates improved length generalization on algorithmic tasks.
Abstract
Standard Transformers excel at semantic modeling but struggle with rigid sequential logic and state tracking. Theoretical work establishes that self-attention is limited to AC^0 (under hard attention) or TC^0 (under soft attention), complexity classes that often fail to support robust length generalization on sequential problems without intermediate chain-of-thought. In this work, we introduce Rational Transductors, a dual-stream architecture that augments the Transformer with a matrix-valued recurrence derived from Weighted Finite Automata (WFA). By injecting rational state information into the attention mechanism via a Deep Rational Injection scheme, our framework strictly generalizes the expressive power of Transformers to capture all Regular Languages, NC^1-complete problems (such as Boolean Formula Evaluation), and fundamental…
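To make the idea of a matrix-valued WFA recurrence concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the two-state parity automaton, the names `TRANSITIONS`, `INITIAL`, `FINAL`, and the helper functions are all hypothetical, chosen only for illustration. Parity is a regular language that needs state tracking, and its WFA state is a product of per-token transition matrices. Because matrix multiplication is associative, the prefix products can be computed with a parallel scan in a logarithmic number of rounds, which is one plausible reading of the log T term in the stated complexity (an assumption, since the summary does not define it).

```python
# Hypothetical example: a 2-state WFA that accepts bit strings with an odd number of 1s.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

IDENTITY = [[1, 0], [0, 1]]
TRANSITIONS = {
    "0": [[1, 0], [0, 1]],  # reading 0 leaves the parity state unchanged
    "1": [[0, 1], [1, 0]],  # reading 1 swaps the "even" and "odd" states
}
INITIAL = [[1, 0]]   # row vector: start in the "even" state
FINAL = [[0], [1]]   # column vector: accept in the "odd" state

def sequential_states(tokens):
    """Prefix products A[x_1], A[x_1]A[x_2], ... computed left to right (T steps)."""
    out, acc = [], IDENTITY
    for tok in tokens:
        acc = matmul(acc, TRANSITIONS[tok])
        out.append(acc)
    return out

def scan_states(tokens):
    """Same prefix products via a Hillis-Steele inclusive scan.

    Each round is one batch of independent matrix products, so the number of
    rounds (returned alongside the prefixes) grows like log2(T), not T.
    """
    prefixes = [TRANSITIONS[tok] for tok in tokens]
    offset, rounds = 1, 0
    while offset < len(prefixes):
        prefixes = [
            matmul(prefixes[i - offset], prefixes[i]) if i >= offset else prefixes[i]
            for i in range(len(prefixes))
        ]
        offset *= 2
        rounds += 1
    return prefixes, rounds

def accepts(tokens):
    """Evaluate INITIAL * (product of transition matrices) * FINAL."""
    prefixes, _ = scan_states(tokens)
    total = prefixes[-1] if prefixes else IDENTITY
    return matmul(matmul(INITIAL, total), FINAL)[0][0] == 1
```

In this sketch, `scan_states` returns one state matrix per position. A dual-stream design could feed such per-position states into attention, but how Deep Rational Injection actually does this is not specified in the text above.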
Taxonomy
Topics: Machine Learning and Algorithms · Formal Methods in Verification · Complexity and Algorithms in Graphs
