Efficient Imitation Without Demonstrations via Value-Penalized Auxiliary Control from Examples
Trevor Ablett, Bryan Chan, Jayce Haoran Wang, Jonathan Kelly

TL;DR
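VPACE is a reinforcement learning algorithm for example-based control. It improves exploration and learning efficiency by adding examples of simple auxiliary tasks and an above-success-level value penalty, and it learns challenging simulated and real robotic tasks substantially faster than example-based control without these additions.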
Contribution
The paper introduces VPACE, a method that improves the sample efficiency of learning from examples of completed tasks, an approach that requires neither full-trajectory demonstrations nor hand-crafted rewards.
Findings
Substantially improves learning efficiency on challenging tasks in both simulated and real robotic environments
Maintains bounded value estimates during training
Preliminary results suggest it may be more efficient than using full-trajectory demonstrations or true sparse rewards
Abstract
Common approaches to providing feedback in reinforcement learning are the use of hand-crafted rewards or full-trajectory expert demonstrations. Alternatively, one can use examples of completed tasks, but such an approach can be extremely sample inefficient. We introduce value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks and an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards. Project site: https://papers.starslab.ca/vpace/.
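The abstract describes the above-success-level value penalty only at a high level. The snippet below is a minimal sketch of the idea, not the paper's implementation. It assumes that the "success level" for a task is the mean estimated Q-value of that task's success examples, that the penalty is a squared hinge on any policy-data Q-value above that level, and that the main and auxiliary task penalties are simply averaged. The function names, the `coef` weight, and the averaging scheme are all hypothetical.

```python
import numpy as np


def above_success_value_penalty(q_policy, q_success, coef=1.0):
    """Hypothetical sketch of an above-success-level value penalty.

    q_policy:  Q estimates for state-action pairs from the agent's own data.
    q_success: Q estimates for the task's success examples.
    coef:      penalty weight (hypothetical name and default).
    """
    q_policy = np.asarray(q_policy, dtype=float)
    q_success = np.asarray(q_success, dtype=float)
    # Assumption: the success level is the mean value of the success examples.
    success_level = q_success.mean()
    # Only the portion of each estimate above the success level is penalized.
    excess = np.maximum(q_policy - success_level, 0.0)
    return float(coef * np.mean(excess ** 2))


def multitask_penalty(q_policy_by_task, q_success_by_task, coef=1.0):
    """Average the penalty over the main task and auxiliary tasks (assumption).

    Both arguments map a task name to that task's Q estimates, so each task
    is compared against its own set of success examples.
    """
    penalties = [
        above_success_value_penalty(
            q_policy_by_task[task], q_success_by_task[task], coef
        )
        for task in q_policy_by_task
    ]
    return float(np.mean(penalties))


if __name__ == "__main__":
    # Success level is 2.0, so only the estimate 3.0 is penalized: (1.0 ** 2) / 3.
    print(above_success_value_penalty([0.5, 2.0, 3.0], [1.0, 2.0, 3.0]))
    # 0.3333333333333333
```

In a critic update, a term like this would be added to the usual temporal-difference loss so that value estimates are discouraged from rising above what the success examples themselves receive, which is one way to keep the estimates bounded during training.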
Taxonomy
Topics: Machine Learning and Algorithms · Logic, Reasoning, and Knowledge
