Improving Interactive Reinforcement Agent Planning with Human Demonstration
Guangliang Li, Randy Gomez, Keisuke Nakamura, Jinying Lin, Qilei Zhang, Bo He

TL;DR
This paper enhances interactive reinforcement learning by initializing agents' reward functions through demonstration, enabling more efficient exploration and faster convergence, especially in physical robot applications.
Contribution
It introduces a method combining inverse reinforcement learning from demonstration with TAMER to improve exploration efficiency and reduce learning costs in reinforcement learning agents.
Findings
Learning from demonstration enables near-optimal policy acquisition.
The method reduces total human feedback required.
Agents explore along the optimal path more effectively.
Abstract
TAMER has proven to be a powerful interactive reinforcement learning method that allows ordinary people to teach and personalize autonomous agents' behavior by providing evaluative feedback. However, a TAMER agent planning with UCT, a Monte Carlo Tree Search strategy, can only update states along its path and might incur a high learning cost, especially for a physical robot. In this paper, we propose to drive the agent's exploration along the optimal path and reduce the learning cost by initializing the agent's reward function via inverse reinforcement learning from demonstration. We test our proposed method in the RL benchmark domain Grid World with different discounts on human reward. Our results show that learning from demonstration can allow a TAMER agent to learn a roughly optimal policy up to the deepest search and encourage the agent to explore along the optimal path. In…
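The idea of seeding the reward function from a demonstration and then refining it with evaluative feedback can be illustrated with a minimal sketch. This is not the paper's implementation: the tabular reward model, the function names, the 3x3 grid, and the learning rate are all assumptions made for illustration. In particular, the inverse reinforcement learning step is replaced by a crude stand-in (demonstrated state-action pairs simply receive a positive initial reward), and action selection is myopic rather than planned with UCT.

```python
from collections import defaultdict

# Hypothetical grid-world action set: (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def init_reward_from_demo(demo, bonus=1.0):
    """Seed a tabular human-reward model from (state, action) demo pairs.

    Stand-in for IRL (assumption): demonstrated pairs get `bonus`, others 0.
    """
    reward = defaultdict(float)
    for state, action in demo:
        reward[(state, action)] = bonus
    return reward

def tamer_update(reward, state, action, feedback, lr=0.5):
    """Move the predicted human reward toward the observed feedback."""
    key = (state, action)
    reward[key] += lr * (feedback - reward[key])

def greedy_action(reward, state):
    """Myopic choice: the action with the highest predicted human reward."""
    return max(ACTIONS, key=lambda a: reward[(state, a)])

def rollout(reward, start, goal, size=3, max_steps=20):
    """Follow the greedy policy on a size x size grid; return visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        dr, dc = ACTIONS[greedy_action(reward, state)]
        # Clamp to the grid so moves into a wall leave the agent in place.
        row = min(max(state[0] + dr, 0), size - 1)
        col = min(max(state[1] + dc, 0), size - 1)
        state = (row, col)
        path.append(state)
    return path

# A demonstrated path from the top-left to the bottom-right corner.
demo = [((0, 0), "right"), ((0, 1), "right"), ((0, 2), "down"), ((1, 2), "down")]
reward = init_reward_from_demo(demo)
path = rollout(reward, start=(0, 0), goal=(2, 2))
```

With the reward model seeded this way, the greedy rollout retraces the demonstrated path before any human feedback is given; later calls to `tamer_update` would then adjust individual state-action estimates as feedback arrives.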
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
