Learning from Pixels with Expert Observations
Minh-Huy Hoang, Long Dinh, Hai Nguyen

TL;DR
This paper introduces a novel reinforcement learning method that leverages expert observations as intermediate visual goals to improve learning efficiency and performance in robot manipulation tasks with sparse rewards, reducing the need for costly expert actions.
Contribution
The approach uses expert observations as visual goals in goal-conditioned RL, significantly enhancing performance and reducing expert action requirements in manipulation tasks.
Findings
Improves the performance of two state-of-the-art RL agents on five simulated block construction tasks.
Requires 4-20 times fewer expert actions during training.
Outperforms hierarchical baseline methods.
Abstract
In reinforcement learning (RL), sparse rewards can present a significant challenge. Fortunately, expert actions can be utilized to overcome this issue. However, acquiring explicit expert actions can be costly, and expert observations are often more readily available. This paper presents a new approach that uses expert observations for learning in robot manipulation tasks with sparse rewards from pixel observations. Specifically, our technique involves using expert observations as intermediate visual goals for a goal-conditioned RL agent, enabling it to complete a task by successively reaching a series of goals. We demonstrate the efficacy of our method in five challenging block construction tasks in simulation and show that when combined with two state-of-the-art agents, our approach can significantly improve their performance while requiring 4-20 times fewer expert actions during…
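The goal-sequencing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy grid world in which a 2D position stands in for a pixel observation, and a hand-written greedy policy stands in for the learned goal-conditioned agent. All names here (`greedy_policy`, `step`, `reached`, `follow_expert_goals`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Assumption: a 2D grid position stands in for a pixel observation.
Obs = Tuple[int, int]
Action = Tuple[int, int]


def greedy_policy(obs: Obs, goal: Obs) -> Action:
    """Hypothetical stand-in for a learned goal-conditioned policy.

    Moves one cell toward the goal along each axis.
    """
    dx = (goal[0] > obs[0]) - (goal[0] < obs[0])
    dy = (goal[1] > obs[1]) - (goal[1] < obs[1])
    return (dx, dy)


def step(obs: Obs, action: Action) -> Obs:
    """Toy deterministic dynamics (assumption, not the paper's simulator)."""
    return (obs[0] + action[0], obs[1] + action[1])


def reached(obs: Obs, goal: Obs) -> bool:
    """Goal test; with real images this would be a learned or distance-based check."""
    return obs == goal


def follow_expert_goals(
    start: Obs,
    expert_observations: List[Obs],
    policy: Callable[[Obs, Obs], Action],
    max_steps_per_goal: int = 50,
) -> Tuple[List[Obs], bool]:
    """Reach each expert observation in order, using it as the current goal.

    Only expert observations are consumed; no expert actions are needed.
    Returns the visited observations and whether every goal was reached.
    """
    obs = start
    trajectory = [obs]
    for goal in expert_observations:
        for _ in range(max_steps_per_goal):
            if reached(obs, goal):
                break
            obs = step(obs, policy(obs, goal))
            trajectory.append(obs)
        if not reached(obs, goal):
            return trajectory, False  # gave up on this intermediate goal
    return trajectory, True
```

For example, calling `follow_expert_goals((0, 0), [(2, 0), (2, 3), (0, 3)], greedy_policy)` walks the agent through the three intermediate goals in order. In the setting the abstract describes, the policy would instead be trained with RL and the goals would be expert images rather than grid positions.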
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications
