Dynamic Latent Routing

Fangyuan Yu; Xin Su; Amir Abdullah

arXiv:2605.14323·cs.LG·May 15, 2026

TL;DR

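This paper introduces Dynamic Latent Routing (DLR), a post-training method that learns structured routing behaviors in language models. In low-data fine-tuning settings it matches or outperforms supervised fine-tuning (SFT) across four datasets and six models, while prior discrete-latent baselines underperform SFT.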

Contribution

DLR is a new language-model post-training approach that jointly learns discrete latent codes, routing policies, and model parameters via dynamic search in a single stage.

Findings

01

DLR matches or outperforms supervised fine-tuning in low-data settings.

02

DLR achieves a mean gain of +6.6 percentage points over supervised fine-tuning across four datasets and six models.

03

Mechanistic analyses show DLR learns structured routing behaviors.

Abstract

We investigate the temporal concatenation of sub-policies in Markov Decision Processes (MDP) with time-varying reward functions. We introduce General Dijkstra Search (GDS), and prove that globally optimal goal-reaching policies can be recovered through temporal composition of intermediate optimal sub-policies. Motivated by the "search, select, update" principle underlying GDS, we propose Dynamic Latent Routing (DLR), a language-model post-training method that jointly learns discrete latent codes, routing policies, and model parameters through dynamic search in a single training stage. In low-data fine-tuning settings, DLR matches or outperforms supervised fine-tuning across four datasets and six models, achieving a mean gain of +6.6 percentage points, while prior discrete-latent baselines consistently underperform SFT. Mechanistic analyses and targeted code ablations show that DLR…
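
The abstract's claim that optimal goal-reaching behavior can be assembled from optimal intermediate pieces has a familiar analogue in classical shortest-path search. The sketch below is plain Dijkstra on a small deterministic graph with fixed, non-negative costs. It is not the paper's General Dijkstra Search and does not model time-varying rewards. The graph, node names, and helper functions are hypothetical and chosen only for illustration. The comments mark where a "search, select, update" pattern appears in the classical algorithm.

```python
import heapq


def dijkstra(graph, source):
    """Classical Dijkstra: return (cost, predecessor) maps from `source`."""
    dist = {source: 0.0}
    prev = {}
    frontier = [(0.0, source)]
    while frontier:
        # Search: pop the cheapest node currently on the frontier.
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for nxt, cost in graph.get(node, {}).items():
            cand = d + cost
            # Select: keep the candidate only if it improves on the best known cost.
            if cand < dist.get(nxt, float("inf")):
                # Update: record the improved cost and its predecessor.
                dist[nxt] = cand
                prev[nxt] = node
                heapq.heappush(frontier, (cand, nxt))
    return dist, prev


def path_to(prev, source, goal):
    """Follow predecessor links back from `goal` to `source`."""
    path = [goal]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical toy graph: node -> {neighbor: step cost}.
GRAPH = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 2.0, "g": 6.0},
    "b": {"g": 1.0},
    "g": {},
}

dist_s, prev_s = dijkstra(GRAPH, "s")
best_path = path_to(prev_s, "s", "g")

# Composition check: for every node m on the optimal path, the optimal cost
# s -> g equals the optimal cost s -> m plus the optimal cost m -> g.
for m in best_path:
    dist_m, _ = dijkstra(GRAPH, m)
    assert dist_s[m] + dist_m["g"] == dist_s["g"]
```

In this toy setting the composition check is the usual optimal-substructure property of shortest paths. The paper's result concerns the more general MDP case with time-varying reward functions, which this sketch does not attempt to reproduce.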
