Continuous-Depth Transformers with Learned Control Dynamics
Peter Jemley

TL;DR
This paper introduces a hybrid transformer whose middle layers are replaced by a continuous-depth Neural ODE block, allowing inference-time control over generation attributes through a learned steering signal. Experiments cover gradient stability, sentiment steering, solver consistency, and efficiency.
Contribution
It proposes a hybrid transformer with a continuous-depth ODE block and learned control signals, enabling steerable and efficient language generation.
Findings
Achieves stable gradient flow with no exploding/vanishing gradients.
Steers sentiment with 98% accuracy for positive targets and 88% for negative targets.
Maintains latency comparable to standard discrete transformers.
Abstract
We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation (ODE) block, enabling inference-time control over generation attributes via a learned steering signal. Unlike standard transformers that process representations through fixed discrete layers, our approach treats depth as a continuous variable governed by a learned vector field conditioned on a low-dimensional control signal, which is injected via explicit concatenation. We validate the architecture through four experiments: (1) gradient flow stability with zero exploding/vanishing gradient events, (2) semantic steering achieving 98%/88% accuracy for positive/negative sentiment control, (3) continuous interpolation validated by a negligible 0.068% trajectory divergence between fixed and adaptive solvers, and (4) efficiency…
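The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single hidden vector rather than a token sequence, random untrained weights in place of the learned vector field, no attention, and a fixed-step RK4 integrator. The names `make_field`, `rk4_integrate`, and `relative_divergence` are hypothetical. The coarse-versus-fine comparison at the end is only loosely analogous to the fixed-versus-adaptive solver comparison reported in the abstract.

```python
import math
import random


def make_field(dim, ctrl_dim, seed=0):
    """Toy vector field f(h, t, u) = tanh(W [h; u; t]) with random weights.

    Hypothetical stand-in for a learned field; the control signal u enters
    by explicit concatenation with the hidden state h and the depth t.
    """
    rng = random.Random(seed)
    in_dim = dim + ctrl_dim + 1
    scale = 1.0 / math.sqrt(in_dim)
    W = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(dim)]

    def field(h, t, u):
        x = list(h) + list(u) + [t]  # explicit concatenation of state, control, depth
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

    return field


def rk4_integrate(field, h0, u, steps, t0=0.0, t1=1.0):
    """Integrate dh/dt = field(h, t, u) from t0 to t1 with fixed-step RK4."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = field(h, t, u)
        k2 = field([a + 0.5 * dt * b for a, b in zip(h, k1)], t + 0.5 * dt, u)
        k3 = field([a + 0.5 * dt * b for a, b in zip(h, k2)], t + 0.5 * dt, u)
        k4 = field([a + dt * b for a, b in zip(h, k3)], t + dt, u)
        h = [
            a + dt / 6.0 * (b + 2.0 * c + 2.0 * d + e)
            for a, b, c, d, e in zip(h, k1, k2, k3, k4)
        ]
        t += dt
    return h


def relative_divergence(a, b):
    """Euclidean distance between a and b, relative to the norm of b."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(y ** 2 for y in b))
    return num / den


field = make_field(dim=8, ctrl_dim=2)
h0 = [0.1 * i for i in range(8)]

# Same initial state, two different control signals.
h_pos = rk4_integrate(field, h0, u=[1.0, 0.0], steps=8)
h_neg = rk4_integrate(field, h0, u=[-1.0, 0.0], steps=8)

# Same control signal, finer depth discretization.
h_fine = rk4_integrate(field, h0, u=[1.0, 0.0], steps=64)

print("control shift:", relative_divergence(h_neg, h_pos))
print("coarse vs fine:", relative_divergence(h_pos, h_fine))
```

In this sketch, changing only `u` moves the final state, which is the sense in which the control signal steers the trajectory. Changing only the step count should move it far less, since both runs approximate the same continuous trajectory.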
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Neural Networks and Reservoir Computing
