Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward

Jiarui Yang; Bin Zhu; Jingjing Chen; Yu-Gang Jiang

arXiv:2508.11143 · cs.RO · March 2, 2026


TL;DR

AC3 is a reinforcement learning framework that learns continuous action chunks for long-horizon robotic tasks, improving stability and data efficiency through novel stabilization and reward mechanisms.

Contribution

Introduces AC3, a new RL method for stable, data-efficient learning of continuous action sequences in robotic manipulation with sparse rewards.

Findings

01 Achieves higher success rates on benchmark tasks.

02 Uses few demonstrations with simple models.

03 Effective stabilization mechanisms improve learning stability.

Abstract

Existing reinforcement learning (RL) methods struggle with long-horizon robotic manipulation tasks, particularly those involving sparse rewards. While action chunking is a promising paradigm for robotic manipulation, using RL to directly learn continuous action chunks in a stable and data-efficient manner remains a critical challenge. This paper introduces AC3 (Actor-Critic for Continuous Chunks), a novel RL framework that learns to generate high-dimensional, continuous action sequences. To make this learning process stable and data-efficient, AC3 incorporates targeted stabilization mechanisms for both the actor and the critic. First, to ensure reliable policy improvement, the actor is trained with an asymmetric update rule, learning exclusively from successful trajectories. Second, to enable effective value learning despite sparse rewards, the critic's update is stabilized using…
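
As a rough illustration of the asymmetric actor update mentioned in the abstract, the sketch below groups a trajectory into fixed-length action chunks and keeps only chunks from successful trajectories for the actor. The abstract does not say how AC3 implements this, so the non-overlapping chunking, the function names, and the success lookup are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Action = List[float]
Chunk = List[Action]


def split_into_chunks(actions: Sequence[Action], chunk_len: int) -> List[Chunk]:
    """Group a trajectory of actions into non-overlapping chunks (hypothetical helper).

    Trailing actions that do not fill a whole chunk are dropped.
    """
    n_chunks = len(actions) // chunk_len
    return [list(actions[i * chunk_len:(i + 1) * chunk_len]) for i in range(n_chunks)]


def select_actor_samples(
    chunks: Sequence[Chunk],
    trajectory_ids: Sequence[int],
    trajectory_success: Dict[int, bool],
) -> List[Chunk]:
    """Keep only chunks whose source trajectory ended in success (assumed filter)."""
    return [c for c, tid in zip(chunks, trajectory_ids) if trajectory_success[tid]]


# Toy data: 12 two-dimensional actions, grouped into chunks of 4 steps.
actions = [[float(t), float(-t)] for t in range(12)]
chunks = split_into_chunks(actions, chunk_len=4)

# Assumed bookkeeping: which trajectory each chunk came from, and whether it succeeded.
trajectory_ids = [0, 1, 0]
trajectory_success = {0: True, 1: False}

actor_chunks = select_actor_samples(chunks, trajectory_ids, trajectory_success)
critic_chunks = chunks  # assumption: the critic still sees every chunk
```

In this toy setup the actor would train on two of the three chunks, while the critic batch keeps all three. Whether AC3 filters samples this way or applies the asymmetry inside the loss is not stated in the text above.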



Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adaptive Dynamic Programming Control