Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Reinforcement Learning
Younggyo Seo, Pieter Abbeel

TL;DR
This paper introduces CQN-AS, a value-based reinforcement learning algorithm whose critic outputs Q-values over a sequence of actions. It outperforms several baselines on sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
Contribution
The paper proposes CQN-AS, a value-based RL method that explicitly learns Q-values over action sequences, enhancing data efficiency in sparse-reward environments.
Findings
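CQN-AS outperforms several baselines on sparse-reward humanoid control tasks from BiGym.
CQN-AS is also more sample-efficient than these baselines on tabletop manipulation tasks from RLBench.
Incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss, which motivates learning Q-values over action sequences.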
Abstract
Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
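As a rough illustration of how a critic that outputs Q-values over a sequence of actions could be used to pick actions, the sketch below assumes that "coarse-to-fine" means repeatedly splitting each action dimension into bins and zooming into the highest-valued bin. That reading, the function names, the `q_fn` interface, the bin and level counts, and the toy critic are all assumptions made for illustration; none of them are taken from the abstract, and this is not the authors' implementation.

```python
import numpy as np


def coarse_to_fine_select(q_fn, seq_len, action_dim, levels=3, bins=5,
                          low=-1.0, high=1.0):
    """Greedily pick a whole action sequence by zooming into the best bin.

    q_fn(level, lows, highs) is a hypothetical critic interface. It must
    return an array of shape (seq_len, action_dim, bins) holding one Q-value
    per bin, for every step of the sequence and every action dimension.
    """
    lows = np.full((seq_len, action_dim), low, dtype=float)
    highs = np.full((seq_len, action_dim), high, dtype=float)
    for level in range(levels):
        q_values = q_fn(level, lows, highs)
        best_bin = q_values.argmax(axis=-1)   # shape (seq_len, action_dim)
        width = (highs - lows) / bins
        lows = lows + best_bin * width        # shrink interval to the chosen bin
        highs = lows + width
    return (lows + highs) / 2.0               # centre of the final interval


def make_toy_q_fn(target, bins=5):
    """Stand-in critic (assumption): Q is highest for the bin nearest `target`."""
    def q_fn(level, lows, highs):
        width = (highs - lows) / bins
        centers = lows[..., None] + (np.arange(bins) + 0.5) * width[..., None]
        return -(centers - target[..., None]) ** 2
    return q_fn


# Usage: a sequence of 4 actions, each with 2 dimensions.
rng = np.random.default_rng(0)
target_sequence = rng.uniform(-0.9, 0.9, size=(4, 2))
selected = coarse_to_fine_select(make_toy_q_fn(target_sequence),
                                 seq_len=4, action_dim=2)
```

With 3 levels of 5 bins over [-1, 1], the final interval is 2 / 125 wide, so under the toy critic each selected value lies within half that width of its target. A learned critic would replace `make_toy_q_fn`; how CQN-AS trains that critic is not described in the abstract and is left out here.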
Taxonomy
Topics: Reinforcement Learning in Robotics
