CACTO-BIC: Scalable Actor-Critic Learning via Biased Sampling and GPU-Accelerated Trajectory Optimization
Elisa Alboni, Pietro Noah Crestaz, Elias Fontanari, Andrea Del Prete

TL;DR
CACTO-BIC makes CACTO's actor-critic learning more scalable by biasing initial-state sampling and running trajectory optimization on the GPU, improving data efficiency and computation time on high-dimensional systems and in real-time control tasks.
Contribution
The paper introduces CACTO-BIC, an extension of CACTO that improves data efficiency through biased initial-state sampling and reduces computation time through GPU-accelerated trajectory optimization.
Findings
Improved sample efficiency over CACTO.
Faster computation compared to prior methods.
Effective on high-dimensional systems like AlienGO.
Abstract
Trajectory Optimization (TO) and Reinforcement Learning (RL) offer complementary strengths for solving optimal control problems. TO efficiently computes locally optimal solutions but can struggle with non-convexity, while RL is more robust to non-convexity at the cost of significantly higher computational demands. CACTO (Continuous Actor-Critic with Trajectory Optimization) was introduced to combine these advantages by learning a warm-start policy that guides the TO solver towards low-cost trajectories. However, scalability remains a key limitation, as increasing system complexity significantly raises the computational cost of TO. This work introduces CACTO-BIC to address these challenges. CACTO-BIC improves data efficiency by biasing initial-state sampling leveraging a property of the value function associated with locally optimal policies; moreover, it reduces computation time by…
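The warm-start idea described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and none of it is taken from the paper: the 1-D single-integrator dynamics, the double-well terminal cost, the plain gradient-descent "solver", and the hand-written `hypothetical_policy` that stands in for a learned actor. It only shows why the initial guess matters when the problem is non-convex: the same local solver can land in different minima depending on where it starts.

```python
# Toy illustration only: a 1-D single integrator x_{k+1} = x_k + DT * u_k
# with a non-convex (double-well) terminal cost. None of this is from the paper.
N, DT, W = 20, 0.05, 0.01  # horizon, time step, control-effort weight (all assumed)

def terminal_cost(x):
    # Double well: a local minimum near x = +1 and a lower one near x = -1.
    return (x * x - 1.0) ** 2 + 0.3 * x

def terminal_cost_grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def final_state(x0, controls):
    return x0 + DT * sum(controls)

def total_cost(x0, controls):
    return W * sum(u * u for u in controls) + terminal_cost(final_state(x0, controls))

def solve_to(x0, controls, step=0.5, iters=2000):
    """Plain gradient descent on the control sequence (stand-in for a real TO solver)."""
    u = list(controls)
    for _ in range(iters):
        g_term = DT * terminal_cost_grad(final_state(x0, u))
        u = [uk - step * (2.0 * W * uk + g_term) for uk in u]
    return u

def rollout_policy(x0, policy):
    """Roll out a policy from x0 to build an initial guess for the solver."""
    x, controls = x0, []
    for _ in range(N):
        uk = policy(x)
        controls.append(uk)
        x += DT * uk
    return controls

def hypothetical_policy(x):
    # Hand-written stand-in for a learned actor: steers the state towards x = -1.
    return -2.0 * (x + 1.0)

x0 = 0.2
cold = solve_to(x0, [0.0] * N)                                # zero initial guess
warm = solve_to(x0, rollout_policy(x0, hypothetical_policy))  # policy warm start
cost_cold, cost_warm = total_cost(x0, cold), total_cost(x0, warm)
print(f"cold start: x_N = {final_state(x0, cold):+.3f}, cost = {cost_cold:.3f}")
print(f"warm start: x_N = {final_state(x0, warm):+.3f}, cost = {cost_warm:.3f}")
```

In this toy setting the zero initial guess settles in the shallower well while the policy rollout lets the same solver reach the deeper one. The sketch does not cover the biased initial-state sampling or the GPU acceleration, since the text above does not give their details.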
Taxonomy
Topics: Robotic Path Planning Algorithms · Reinforcement Learning in Robotics · Spacecraft Dynamics and Control
