TL;DR
GCImOpt learns goal-conditioned policies by imitating optimal trajectories generated with fast trajectory optimization. The resulting policies are small and fast enough to deploy on resource-limited systems.
Contribution
The paper presents an efficient way to generate large, high-quality datasets of optimal trajectories, so that goal-conditioned policies can be learned without relying on demonstrations that are expensive to collect or suboptimal.
Findings
Dataset generation produces thousands of optimal trajectories in minutes on a laptop.
Trained policies achieve high success rates and near-optimal control.
Policies are small, fast, and suitable for onboard deployment.
Abstract
Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the…
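The augmentation step mentioned in the abstract (treating intermediate states as goals) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `augment_with_intermediate_goals`, the `goal_stride` parameter, and the array layout (one trajectory with `T + 1` states and `T` controls) are all assumptions made for the example. Each state-control pair that occurs before a chosen intermediate state is relabeled with that state as its goal.

```python
import numpy as np


def augment_with_intermediate_goals(states, controls, goal_stride=1):
    """Relabel one trajectory so that intermediate states act as goals.

    Hypothetical helper, assumed for illustration only.
    states:   array of shape (T + 1, n), the visited states x_0 .. x_T
    controls: array of shape (T, m), the applied controls u_0 .. u_{T-1}
    Returns three aligned arrays: current states, goal states, controls.
    """
    states = np.asarray(states, dtype=float)
    controls = np.asarray(controls, dtype=float)
    num_steps = controls.shape[0]
    if states.shape[0] != num_steps + 1:
        raise ValueError("expected one more state than controls")
    if goal_stride < 1:
        raise ValueError("goal_stride must be at least 1")

    # Every goal_stride-th intermediate state, plus the original final state.
    goal_indices = sorted(set(range(goal_stride, num_steps, goal_stride)) | {num_steps})

    obs, goals, acts = [], [], []
    for k in goal_indices:
        # All steps before index k become samples that aim at states[k].
        for t in range(k):
            obs.append(states[t])
            goals.append(states[k])
            acts.append(controls[t])
    return np.array(obs), np.array(goals), np.array(acts)
```

A policy network would then be trained to map each (state, goal) pair to the corresponding control. A larger `goal_stride` yields fewer relabeled samples, which gives one way to trade dataset size against redundancy.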
