TL;DR
This paper applies soft Q-learning to real-world robotic manipulation. It emphasizes two properties of the method: it learns expressive, multimodal policies, and policies it has already learned can be composed into new skills. It reports better sample efficiency than prior methods.
Contribution
It demonstrates that maximum entropy policies trained with soft Q-learning can be used for real-world robotic tasks, shows that such policies can be composed into new policies with a bound on the optimality of the result, and reports improved sample efficiency over previous approaches.
Findings
Soft Q-learning learns expressive, multimodal policies.
Policies can be composed to create new skills.
The method is more sample-efficient than prior methods on real-world tasks.
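To make "multimodal" concrete, here is a minimal sketch of a maximum entropy (Boltzmann) policy derived from Q-values. It assumes a toy discrete action set and a temperature parameter alpha; the paper's setting is continuous control with energy-based models, so the function names and numbers below are illustrative only and are not taken from the paper.

```python
import math

def soft_value(q_values, alpha=1.0):
    """Soft value V = alpha * log(sum_a exp(Q(a) / alpha)), computed stably."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def boltzmann_policy(q_values, alpha=1.0):
    """Maximum entropy policy pi(a) = exp((Q(a) - V) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

# Hypothetical Q-values: actions 0 and 2 are equally good, action 1 is worse.
q = [2.0, 0.0, 2.0]
pi = boltzmann_policy(q)
```

In this toy case the policy keeps equal probability on both good actions, whereas a greedy argmax policy would commit to one of them. That retained second mode is what the summary means by multimodal exploration.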
Abstract
Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the…
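The composition idea in the abstract can be sketched on a toy problem. The example below assumes one simple composition rule, averaging the Q-values of the constituent tasks and acting with the softmax of the result. The discrete actions, task names, and Q-values are hypothetical stand-ins for the continuous, learned Q-functions a real system would use, and the averaged Q-function is only an approximation whose quality the paper bounds.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def compose_q(q_tables):
    """Average per-action Q-values across tasks (assumed composition rule)."""
    n = len(q_tables)
    return [sum(qs) / n for qs in zip(*q_tables)]

# Hypothetical Q-values over four actions for two separately trained skills.
q_task_a = [3.0, 3.0, 0.0, 0.0]  # task A is served by action 0 or 1
q_task_b = [0.0, 3.0, 3.0, 0.0]  # task B is served by action 1 or 2

q_both = compose_q([q_task_a, q_task_b])
pi_both = softmax(q_both)
best_action = max(range(len(pi_both)), key=pi_both.__getitem__)
```

The composed policy concentrates on action 1, the only action that both constituent policies rate highly. Composition works here because each maximum entropy policy keeps probability on every good action instead of collapsing to a single one.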
Taxonomy
Methods: Q-Learning
