Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Jijia Liu, Feng Gao, Qingmin Liao, Chao Yu, Yu Wang

TL;DR
The paper introduces Auto-Regressive Soft Q-learning (ARSQ), a novel value-based reinforcement learning algorithm that models Q-values hierarchically and autoregressively, improving sample efficiency and performance in continuous control tasks using suboptimal demonstration data.
Contribution
ARSQ models Q-values in a hierarchical, auto-regressive manner, capturing interdependencies among action dimensions and improving learning from suboptimal data in continuous control.
Findings
ARSQ achieves 1.62x performance improvement over SOTA on D4RL.
ARSQ surpasses baselines on RLBench with expert demonstrations.
ARSQ learns effectively from suboptimal demonstration data.
Abstract
Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to "kick-start" training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstration and online-collected data during the training process. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete…
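The contrast the abstract draws, between scoring each action dimension independently and scoring dimensions auto-regressively, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: the 3x3 joint Q-table `Q`, the temperature `ALPHA`, the helper names, and the use of a per-dimension average as the "independent" baseline are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical joint Q-table over two discretized action dimensions (3 bins each).
# Rows index the bin of dimension 1, columns index the bin of dimension 2.
Q = np.array([
    [5.0, 0.0, 0.0],
    [0.0, 4.0, 4.0],
    [0.0, 4.0, 4.0],
])
ALPHA = 1.0  # assumed softmax temperature


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def logsumexp(x, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def independent_greedy(q):
    """Assumed baseline: score each dimension by averaging over the other one."""
    a1 = int(np.argmax(q.mean(axis=1)))
    a2 = int(np.argmax(q.mean(axis=0)))
    return a1, a2


def autoregressive_greedy(q):
    """Pick dimension 1 by its best achievable value, then dimension 2 given it."""
    a1 = int(np.argmax(q.max(axis=1)))
    a2 = int(np.argmax(q[a1]))
    return a1, a2


def autoregressive_soft_policy(q, alpha):
    """Factor a soft (Boltzmann) policy as pi(a1) * pi(a2 | a1)."""
    q1 = alpha * logsumexp(q / alpha, axis=1)  # soft value of each dimension-1 bin
    p1 = softmax(q1 / alpha)                   # pi(a1)
    p2_given_1 = softmax(q / alpha, axis=1)    # pi(a2 | a1), one row per a1
    return p1, p2_given_1


if __name__ == "__main__":
    print("independent:", independent_greedy(Q))
    print("autoregressive:", autoregressive_greedy(Q))
    p1, p2 = autoregressive_soft_policy(Q, ALPHA)
    print("joint policy:\n", p1[:, None] * p2)
```

On this table the independent baseline selects bins (1, 1), whose joint value is 4, while the auto-regressive selection reaches bins (0, 0), whose joint value is 5. The soft variant factors the joint Boltzmann distribution over both dimensions exactly, which is why conditioning later dimensions on earlier ones loses nothing relative to scoring the full joint action.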
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM
Methods: Q-Learning
