Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Reinforcement Learning

Younggyo Seo; Pieter Abbeel

arXiv:2411.12155 · cs.LG · November 18, 2025

TL;DR

This paper introduces CQN-AS, a value-based reinforcement learning algorithm whose critic outputs Q-values over sequences of actions to improve data efficiency. It outperforms several baselines on sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.

Contribution

The paper proposes CQN-AS, a value-based RL method that explicitly learns Q-values over action sequences, enhancing data efficiency in sparse-reward environments.

Findings

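01. CQN-AS outperforms several baselines on sparse-reward humanoid control tasks.

02. CQN-AS achieves better sample efficiency than the baselines on tabletop manipulation tasks.

03. Incorporating action sequences when predicting return-to-go lowers validation loss, which motivates learning Q-values over action sequences.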

Abstract

Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
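The abstract describes a critic that outputs Q-values over a sequence of actions but does not give implementation details. The following is a minimal sketch of how greedy selection over such an output could look. It assumes a coarse-to-fine discretization in which each action dimension at each sequence step is split into equal-width bins and the best bin is refined level by level. The names `bin_centers`, `make_toy_critic`, and `coarse_to_fine_sequence`, the action range [-1, 1], the bin and level counts, and the quadratic toy critic are all hypothetical and stand in for a learned network; none of them comes from the paper.

```python
import numpy as np

def bin_centers(low, high, num_bins):
    """Centers of equal-width bins. low, high: (K, D) -> centers: (K, D, B)."""
    width = (high - low) / num_bins
    offsets = np.arange(num_bins) + 0.5
    return low[..., None] + width[..., None] * offsets

def make_toy_critic(target):
    """Hypothetical stand-in for a learned critic (assumption, not the paper's model).

    Returns a function giving Q-values of shape (K, D, B): one value per
    sequence step, action dimension, and bin. Q is highest near `target`.
    """
    def critic(low, high, num_bins):
        centers = bin_centers(low, high, num_bins)
        return -(centers - target[..., None]) ** 2
    return critic

def coarse_to_fine_sequence(critic, seq_len, action_dim, levels=3, num_bins=5):
    """Greedily pick a whole action sequence by refining intervals level by level."""
    low = -np.ones((seq_len, action_dim))   # assumed action range [-1, 1]
    high = np.ones((seq_len, action_dim))
    for _ in range(levels):
        q = critic(low, high, num_bins)      # (K, D, B)
        best = q.argmax(axis=-1)             # best bin per step and dimension
        width = (high - low) / num_bins
        low = low + best * width             # zoom into the chosen bin
        high = low + width
    return (low + high) / 2                  # center of the final interval

# Usage: a 4-step sequence of 2-dimensional actions.
rng = np.random.default_rng(0)
target = rng.uniform(-0.9, 0.9, size=(4, 2))
seq = coarse_to_fine_sequence(make_toy_critic(target), seq_len=4, action_dim=2)
```

In this sketch the argmax is taken independently for every step and dimension, so the cost of selecting a sequence grows linearly with its length rather than exponentially. Whether CQN-AS factorizes its output this way is not stated in the abstract.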

Taxonomy

Topics: Reinforcement Learning in Robotics