Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

Zhancun Mu; Guangyu Zhao; Yiwu Zhong; Chi Zhang

arXiv:2604.22229 · cs.LG · April 27, 2026

TL;DR

DROL is a dynamic routing method for offline reinforcement learning that lets one-step actors improve actions locally while keeping inference cheap. It performs competitively on the OGBench and D4RL benchmarks.

Contribution

The paper proposes DROL, a novel latent-conditioned one-step actor with top-1 dynamic routing, allowing local action improvements without sacrificing inference efficiency.

Findings

1. DROL performs competitively on the OGBench and D4RL benchmarks.

2. It improves over baseline methods on many OGBench task groups.

3. DROL remains effective on the AntMaze and Adroit environments.

Abstract

One-step offline RL actors are attractive because they avoid backpropagating through long iterative samplers and keep inference cheap, but they still have to improve under a critic without drifting away from actions that the dataset can support. In recent one-step extraction pipelines, a strong iterative teacher provides one target action for each latent draw, and the same student output is asked to do both jobs: move toward higher Q and stay near that paired endpoint. If those two directions disagree, the loss resolves them as a compromise on that same sample, even when a nearby better action remains locally supported by the data. We propose DROL, a latent-conditioned one-step actor trained with top-1 dynamic routing. For each state, the actor samples $K$ candidate actions from a bounded latent prior, assigns each dataset action to its nearest candidate, and updates only that winner…
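
The routing step described in the abstract (sample $K$ candidates, assign the dataset action to its nearest candidate, update only that winner) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the squared Euclidean distance, the uniform latent prior on $[-1, 1]$, the toy linear actor, and the names `top1_route` and `routed_loss_and_mask` are all assumptions made for illustration, and the part of the objective that the truncated abstract does not show (for example, any critic term) is left out.

```python
import numpy as np


def top1_route(candidates, dataset_action):
    """Pick the candidate nearest to the dataset action.

    candidates: array of shape (K, action_dim), one action per latent draw.
    dataset_action: array of shape (action_dim,).
    Returns (winner index, squared distance of the winner).
    Squared Euclidean distance is an assumption of this sketch.
    """
    sq_dists = np.sum((candidates - dataset_action) ** 2, axis=-1)
    winner = int(np.argmin(sq_dists))
    return winner, float(sq_dists[winner])


def routed_loss_and_mask(candidates, dataset_action):
    """Return the winner-only loss and a boolean mask over the K candidates.

    Only the winning candidate contributes to the loss, so only that
    candidate would receive a gradient from this term.
    """
    winner, loss = top1_route(candidates, dataset_action)
    mask = np.zeros(candidates.shape[0], dtype=bool)
    mask[winner] = True
    return loss, mask


# Toy usage with a hypothetical linear actor (not from the paper).
rng = np.random.default_rng(0)
K, latent_dim, action_dim = 4, 2, 3

# Bounded latent prior: assumed uniform on [-1, 1] for this sketch.
latents = rng.uniform(-1.0, 1.0, size=(K, latent_dim))

# Hypothetical actor: a fixed linear map followed by tanh. The state is
# omitted because a single fixed state is being illustrated.
W = rng.normal(size=(latent_dim, action_dim))
candidates = np.tanh(latents @ W)

dataset_action = rng.uniform(-1.0, 1.0, size=action_dim)
loss, mask = routed_loss_and_mask(candidates, dataset_action)
print("winner mask:", mask, "loss:", loss)
```

In this sketch the non-winning candidates are untouched by the dataset action, which is one way to read "preserve support, not correspondence": the data only requires that some candidate stay close to each dataset action, not that a specific latent draw map to a specific target.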

Code & Models

Repositories

https://muzhancun.github.io/preprints/DROL
