Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning
Jiajun Fan, He Ba, Xian Guo, Jianye Hao

TL;DR
Critic PI2 is a model-based reinforcement learning approach that combines trajectory optimization, deep actor-critic learning, and planning for continuous control tasks such as the inverted pendulum, where the authors report state-of-the-art results.
Contribution
The paper presents Critic PI2, a new framework that integrates policy improvement with path integrals and deep actor-critic learning for continuous planning in control systems.
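To make the idea of "policy improvement with path integrals" concrete, the following is a minimal sketch of a generic path-integral-style update over an action sequence. It is not the authors' implementation: the 1-D dynamics, the cost terms, the hand-written terminal value standing in for a learned critic, and all names and hyperparameters (`pi2_update`, `toy_rollout_cost`, `plan_with_pi2`, `noise_std`, `temperature`) are assumptions chosen for illustration.

```python
import numpy as np


def pi2_update(mean_actions, rollout_cost, num_samples=64, noise_std=0.5,
               temperature=1.0, rng=None):
    """One path-integral-style update of a plan of shape (horizon, action_dim)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample perturbed action sequences around the current plan.
    noise = rng.normal(0.0, noise_std, size=(num_samples,) + mean_actions.shape)
    candidates = mean_actions[None] + noise
    costs = np.array([rollout_cost(a) for a in candidates])
    # Softmax over negative cost: lower-cost rollouts receive larger weights.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The new plan is the weight-averaged candidate sequence.
    return np.tensordot(weights, candidates, axes=1)


def toy_rollout_cost(actions, x0=2.0):
    """Hypothetical 1-D model rollout; the goal is to drive x toward 0."""
    x, cost = x0, 0.0
    for a in actions[:, 0]:
        x = x + 0.1 * a                 # assumed toy dynamics model
        cost += x ** 2 + 0.01 * a ** 2  # assumed running cost
    # Hand-written terminal value, standing in for a learned critic.
    return cost + 5.0 * x ** 2


def plan_with_pi2(horizon=10, iterations=30, seed=0):
    """Refine an all-zero plan and return (initial_cost, final_cost, plan)."""
    rng = np.random.default_rng(seed)
    plan = np.zeros((horizon, 1))
    initial_cost = toy_rollout_cost(plan)
    for _ in range(iterations):
        plan = pi2_update(plan, toy_rollout_cost, rng=rng)
    return initial_cost, toy_rollout_cost(plan), plan


if __name__ == "__main__":
    before, after, _ = plan_with_pi2()
    print(f"cost before: {before:.2f}, cost after: {after:.2f}")
```

In this sketch the terminal value only summarizes the cost beyond the planning horizon; in an actor-critic setting that role would be played by a learned critic, and the initial plan could come from the actor rather than from zeros.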
Findings
Achieved state-of-the-art performance in continuous control benchmarks.
Significantly improved sample efficiency and real-time performance.
Demonstrated effectiveness in inverted pendulum and similar systems.
Abstract
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods, from AlphaGo to MuZero, have enjoyed huge success in discrete domains such as chess and Go. Unfortunately, in real-world applications such as robot control and the inverted pendulum, where the action space is normally continuous, those tree-based planning techniques struggle. To address these limitations, in this paper we present a novel model-based reinforcement learning framework called Critic PI2, which combines the benefits of trajectory optimization, deep actor-critic learning, and model-based reinforcement learning. Our method is evaluated on inverted pendulum models and is applicable to many continuous control systems. Extensive experiments demonstrate that Critic PI2 achieved a new state of the art in a range of…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Robotic Path Planning Algorithms
Methods: Residual Connection · Convolution · Average Pooling · Monte-Carlo Tree Search · Batch Normalization · Prioritized Experience Replay · Residual Block · MuZero
