High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Kai Ploeger; Michael Lutter; Jan Peters

arXiv:2010.13483·cs.RO·November 3, 2020

TL;DR

This paper presents a reinforcement learning system that lets a Barrett WAM manipulator learn to juggle two balls, a high-acceleration task, from only 56 minutes of real-world experience and a binary reward signal, with safety and sample efficiency built into the system design.

Contribution

Rather than focusing on the learning algorithm alone, the work builds the requirements of safety and sample efficiency directly into the policy representation, initialization, and optimization, and demonstrates the resulting system on real-world juggling with binary rewards.

Findings

01

Robot learned to juggle two balls from 56 minutes of experience

02

Final policy juggles continuously for up to 33 minutes

03

Demonstrated safety and efficiency in real-world learning

Abstract

Robots that can learn in the physical world will be important to enable robots to escape their stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging as one must push the limits of the robot and its actuation without harming the system, amplifying the necessity of sample efficiency and safety for robot learning algorithms. In contrast to prior work, which mainly focuses on the learning algorithm, we propose a learning system that directly incorporates these requirements in the design of the policy representation, initialization, and optimization. We demonstrate that this system enables the high-speed Barrett WAM manipulator to learn juggling two balls from 56 minutes of experience with a binary reward signal. The final policy juggles continuously for up to 33 minutes or about 4500 repeated…
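To make the idea of learning from a binary reward concrete, here is a minimal toy sketch. It is not the paper's algorithm or policy: the two-parameter "policy", the `target` and `tol` values, and the success-averaging update are all assumptions chosen for illustration. It shows only that an episodic search can improve from 0/1 outcomes, and that the initial parameters must already succeed occasionally, since a binary reward gives no gradient toward the first success.

```python
import random


def binary_reward(params, target=(0.3, -0.2), tol=0.25):
    """Toy stand-in for a rollout: 1.0 on success, 0.0 otherwise (no shaping).

    `target` and `tol` are hypothetical; a real system would run the robot
    and report whether the episode succeeded.
    """
    dist = sum((p - t) ** 2 for p, t in zip(params, target)) ** 0.5
    return 1.0 if dist < tol else 0.0


def search(mean, sigma=0.2, episodes=20, iterations=10, seed=0):
    """Episodic search: sample parameters, keep successes, re-centre the mean."""
    rng = random.Random(seed)
    mean = list(mean)
    history = []  # success rate per iteration
    for _ in range(iterations):
        samples = [[m + rng.gauss(0.0, sigma) for m in mean]
                   for _ in range(episodes)]
        rewards = [binary_reward(s) for s in samples]
        successes = [s for s, r in zip(samples, rewards) if r == 1.0]
        if successes:  # with no successes there is no learning signal at all
            mean = [sum(col) / len(successes) for col in zip(*successes)]
        history.append(sum(rewards) / episodes)
    return mean, history


if __name__ == "__main__":
    final_mean, success_rates = search([0.2, -0.1])
    print(final_mean, success_rates)
```

Starting the search far from any successful parameters (for example `search([5.0, 5.0])`) leaves the mean unchanged, which is one way to see why initialization matters when the reward is binary.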
