Continuous-Depth Transformers with Learned Control Dynamics

Peter Jemley

arXiv:2601.10007·cs.LG·January 16, 2026

TL;DR

This paper introduces a continuous-depth transformer architecture using Neural ODEs that allows for inference-time control over generated content, demonstrating stability, semantic steering, and efficiency.

Contribution

It proposes a hybrid transformer with a continuous-depth ODE block and learned control signals, enabling steerable and efficient language generation.

Findings

01

Achieves stable gradient flow with no exploding/vanishing gradients.

02

Achieves 98%/88% accuracy for positive/negative sentiment steering.

03

Maintains latency comparable to standard discrete transformers.

Abstract

We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation (ODE) block, enabling inference-time control over generation attributes via a learned steering signal. Unlike standard transformers that process representations through fixed discrete layers, our approach treats depth as a continuous variable governed by a learned vector field $F_{\theta}(H, \tau, u)$, where $u$ is a low-dimensional control signal injected via explicit concatenation. We validate the architecture through four experiments: (1) gradient flow stability with zero exploding/vanishing gradient events, (2) semantic steering achieving 98%/88% accuracy for positive/negative sentiment control, (3) continuous interpolation validated by a negligible 0.068% trajectory divergence between fixed and adaptive solvers, and (4) efficiency…
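To make the idea of a controlled vector field concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a toy two-layer MLP as a stand-in for $F_{\theta}$, a depth interval $\tau \in [0, 1]$, random untrained weights, and a fixed-step RK4 integrator; the names `make_vector_field` and `integrate_rk4` and all sizes are hypothetical. The control signal $u$ is tiled across token positions and concatenated to the hidden state, which is one plausible reading of "explicit concatenation".

```python
import numpy as np


def make_vector_field(d_model, d_control, seed=0):
    """Build a toy vector field F(H, tau, u) -> dH/dtau.

    Assumption: a small tanh MLP with random weights stands in for the
    learned F_theta; the paper's actual parameterization is not shown here.
    """
    rng = np.random.default_rng(seed)
    d_in = d_model + d_control + 1  # hidden state + control + depth scalar
    W1 = rng.normal(0.0, 0.1, size=(d_in, 4 * d_model))
    W2 = rng.normal(0.0, 0.1, size=(4 * d_model, d_model))

    def F(H, tau, u):
        # H: (seq_len, d_model), u: (d_control,), tau: scalar depth
        seq_len = H.shape[0]
        u_tiled = np.broadcast_to(u, (seq_len, u.shape[0]))
        tau_col = np.full((seq_len, 1), tau)
        # Inject the control signal by concatenation along the feature axis.
        X = np.concatenate([H, u_tiled, tau_col], axis=-1)
        return np.tanh(X @ W1) @ W2

    return F


def integrate_rk4(F, H0, u, n_steps=8, t0=0.0, t1=1.0):
    """Integrate dH/dtau = F(H, tau, u) from t0 to t1 with classic RK4."""
    H = H0.copy()
    h = (t1 - t0) / n_steps
    tau = t0
    for _ in range(n_steps):
        k1 = F(H, tau, u)
        k2 = F(H + 0.5 * h * k1, tau + 0.5 * h, u)
        k3 = F(H + 0.5 * h * k2, tau + 0.5 * h, u)
        k4 = F(H + h * k3, tau + h, u)
        H = H + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        tau += h
    return H


# Hypothetical sizes: 5 tokens, 16 hidden dims, 2-dim control signal.
F = make_vector_field(d_model=16, d_control=2)
H0 = np.random.default_rng(1).normal(size=(5, 16))
u_a = np.array([1.0, 0.0])   # illustrative control value
u_b = np.array([-1.0, 0.0])  # a different control value

H_a = integrate_rk4(F, H0, u_a)
H_b = integrate_rk4(F, H0, u_b)

# Relative gap between a coarse and a fine discretization of the same flow.
H_coarse = integrate_rk4(F, H0, u_a, n_steps=4)
H_fine = integrate_rk4(F, H0, u_a, n_steps=64)
rel_gap = np.linalg.norm(H_coarse - H_fine) / np.linalg.norm(H_fine)
print("outputs differ across controls:", not np.allclose(H_a, H_b))
print("coarse vs fine relative gap:", rel_gap)
```

Changing `u` at inference time changes the trajectory without touching the weights, which is the mechanism the abstract describes for steering. The coarse-versus-fine comparison here uses two fixed step counts and is only loosely analogous to the fixed-versus-adaptive solver comparison reported in the abstract.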

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Neural Networks and Reservoir Computing