Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network

Jijia Liu; Feng Gao; Qingmin Liao; Chao Yu; Yu Wang

arXiv:2502.00288 · cs.LG · May 30, 2025

TL;DR

The paper introduces Auto-Regressive Soft Q-learning (ARSQ), a value-based reinforcement learning algorithm that models Q-values in a hierarchical, auto-regressive manner, improving sample efficiency and performance on continuous control tasks when learning from suboptimal demonstration data.

Contribution

ARSQ models Q-values in a hierarchical, auto-regressive manner, addressing interdependencies in action dimensions and enhancing learning from suboptimal data in continuous control.
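
To illustrate why interdependencies between action dimensions matter, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical tabular joint Q-function over two discretized action dimensions (`Q_JOINT`), uses a per-dimension mean as a stand-in for an independent per-dimension Q-function, and uses a hard max where ARSQ uses soft Q-learning. All names here are illustrative.

```python
import numpy as np

# Hypothetical joint Q-table for a single state (illustrative values only).
# Rows index the bins of action dimension 1, columns the bins of dimension 2.
Q_JOINT = np.array([
    [5.0, 0.0, 0.0],
    [3.0, 3.0, 3.0],
    [0.0, 0.0, 4.0],
])


def independent_argmax(q_joint):
    """Choose each dimension on its own, ignoring the other dimension.

    Assumption: each per-dimension value is the mean of the joint table
    over the other dimension.
    """
    q_dim1 = q_joint.mean(axis=1)  # value of each bin of dimension 1
    q_dim2 = q_joint.mean(axis=0)  # value of each bin of dimension 2
    return int(q_dim1.argmax()), int(q_dim2.argmax())


def autoregressive_argmax(q_joint):
    """Choose dimension 1 first, then dimension 2 conditioned on that choice."""
    # Value of a dimension-1 bin is the best value reachable after picking it.
    q_dim1 = q_joint.max(axis=1)
    a1 = int(q_dim1.argmax())
    # Dimension 2 is scored conditioned on the already chosen a1.
    a2 = int(q_joint[a1].argmax())
    return a1, a2


if __name__ == "__main__":
    for name, select in [("independent", independent_argmax),
                         ("auto-regressive", autoregressive_argmax)]:
        action = select(Q_JOINT)
        print(name, action, Q_JOINT[action])
```

In this toy table the independent choice settles on a jointly mediocre action, while the conditioned choice recovers the joint maximum. The table was constructed to show the effect and says nothing about how often it occurs in practice.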

Findings

01. ARSQ achieves a 1.62x performance improvement over SOTA on D4RL.

02. ARSQ surpasses baselines on RLBench with expert demonstrations.

03. ARSQ learns effectively from suboptimal demonstration data.

Abstract

Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to "kick-start" training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstration and online-collected data during the training process. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete…
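
One possible reading of "coarse-to-fine" for a single action dimension is sketched below. This is an assumption-laden illustration rather than the authors' method: the function name `coarse_to_fine_select`, the `score` callback (standing in for a learned Q-value head), and the bin and level counts are all hypothetical.

```python
from typing import Callable


def coarse_to_fine_select(
    score: Callable[[int, float], float],
    low: float = -1.0,
    high: float = 1.0,
    bins: int = 3,
    levels: int = 3,
) -> float:
    """Pick a continuous action by repeatedly zooming into the best bin.

    `score(level, center)` is a hypothetical stand-in for a Q-value that
    rates the bin centered at `center` at refinement level `level`.
    """
    for level in range(levels):
        width = (high - low) / bins
        # Centers of the equal-width bins covering the current interval.
        centers = [low + (i + 0.5) * width for i in range(bins)]
        best = max(range(bins), key=lambda i: score(level, centers[i]))
        # Narrow the interval to the chosen bin before the next level.
        low, high = low + best * width, low + (best + 1) * width
    # Return the center of the final, finest interval.
    return (low + high) / 2.0


if __name__ == "__main__":
    # Toy score that prefers actions close to 0.5 at every level.
    action = coarse_to_fine_select(lambda level, a: -(a - 0.5) ** 2)
    print(action)
```

With `bins` choices per level and `levels` levels, the final resolution is the original range divided by `bins ** levels`, while each step only compares `bins` candidates.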

Videos

Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network · slideslive

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM

Methods: Q-Learning