Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter

arXiv:2009.14108 · cs.LG · June 30, 2022

TL;DR

Align-RUDDER enhances reinforcement learning from limited demonstrations by using sequence alignment for reward redistribution, significantly improving learning efficiency in complex tasks with sparse rewards.
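
The redistribution step can be illustrated with a minimal Python sketch. It assumes each prefix of an episode already has an alignment score S_t; how that score is computed is left open here. Following the return-equivalence idea behind RUDDER, each step receives a share proportional to its score gain S_t - S_{t-1}, rescaled so that the rewards sum to the original episode return. The function name and the rescaling rule are illustrative assumptions, not the paper's exact normalization.

```python
from typing import List


def redistribute_reward(prefix_scores: List[float], episode_return: float) -> List[float]:
    """Turn alignment scores of growing episode prefixes into per-step rewards.

    prefix_scores[t] is the (assumed, precomputed) score S_t of the prefix
    tau_{0:t}; the score of the empty prefix, S_{-1}, is taken to be 0.
    """
    if not prefix_scores:
        return []
    # Score gain contributed by each step: S_t - S_{t-1}.
    diffs = [prefix_scores[0]] + [
        prefix_scores[t] - prefix_scores[t - 1] for t in range(1, len(prefix_scores))
    ]
    total = sum(diffs)  # telescopes to prefix_scores[-1]
    # Hypothetical rescaling: make the gains add up to the episode return.
    scale = episode_return / total if total != 0 else 0.0
    rewards = [scale * d for d in diffs]
    # Correction term: put any remainder on the last step so the redistributed
    # rewards are return-equivalent (they sum to the original return).
    rewards[-1] += episode_return - sum(rewards)
    return rewards
```

For example, `redistribute_reward([0, 2, 2, 5], 10.0)` gives `[0.0, 4.0, 0.0, 6.0]`: the steps that raised the alignment score receive the delayed reward early. If no step raises the score, all reward stays on the last step, which recovers the original delayed reward.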

Contribution

It introduces a novel reward redistribution method based on multiple sequence alignment, enabling effective learning from few demonstrations in complex hierarchical tasks.
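
The profile model can be illustrated with a standard position-specific scoring matrix (PSSM). The sketch below assumes the demonstrations are already multiple-sequence-aligned, with each time step mapped to a single-character event symbol and `-` marking a gap. Each column scores a symbol by the log-odds of its column frequency against its background frequency, with a pseudocount. The symbol encoding, `build_pssm`, and the pseudocount value are hypothetical choices for illustration, not necessarily what Align-RUDDER uses.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

GAP = "-"  # assumed gap character in the aligned demonstrations


def build_pssm(
    aligned: Sequence[str], alphabet: str, pseudocount: float = 1.0
) -> List[Dict[str, float]]:
    """Log-odds PSSM (one {symbol: score} dict per column) from aligned sequences."""
    if not aligned:
        raise ValueError("need at least one aligned demonstration")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    k = len(alphabet)
    # Background frequencies over all non-gap symbols, smoothed by the pseudocount.
    background = Counter(c for s in aligned for c in s if c != GAP)
    n_bg = sum(background.values())
    bg = {a: (background[a] + pseudocount) / (n_bg + pseudocount * k) for a in alphabet}
    profile: List[Dict[str, float]] = []
    for col in range(length):
        # Column frequencies ignore gaps; the pseudocount avoids log(0).
        counts = Counter(s[col] for s in aligned if s[col] != GAP)
        n = sum(counts.values())
        column = {}
        for a in alphabet:
            p = (counts[a] + pseudocount) / (n + pseudocount * k)
            column[a] = math.log(p / bg[a])  # > 0 if enriched in this column
        profile.append(column)
    return profile
```

With `aligned = ["ABC", "AB-", "A-C"]` and `alphabet = "ABC"`, the conserved `A` in column 0 gets a positive score and `B` in that column a negative one, while column 1 scores `B` at log 2.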

Findings

1. Outperforms existing methods on artificial tasks with delayed rewards.
2. Successfully mines a diamond in Minecraft with few demonstrations.
3. Drastically improves learning efficiency in sparse reward scenarios.

Abstract

Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond…
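
To turn a profile into per-step signals, a new episode can be scored prefix by prefix. The sketch below assumes the profile is stored as one `{symbol: score}` dict per aligned column and uses a Smith-Waterman-style local alignment with a linear gap penalty. The function name, the gap value, and the choice of local alignment are assumptions for illustration, not the exact procedure from the paper. The score of prefix τ_{0:t} is the best DP cell seen in rows 0..t, so it never decreases as the episode grows.

```python
from typing import Dict, List, Sequence


def prefix_alignment_scores(
    seq: Sequence[str], profile: List[Dict[str, float]], gap: float = -1.0
) -> List[float]:
    """Best local alignment score of each prefix seq[:t+1] against a profile.

    Rows of the DP table index the episode, columns index profile positions.
    Symbols missing from a profile column score 0 (a hypothetical choice).
    """
    m = len(profile)
    prev = [0.0] * (m + 1)  # DP row for the empty prefix
    best = 0.0
    scores: List[float] = []
    for symbol in seq:
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            match = prev[j - 1] + profile[j - 1].get(symbol, 0.0)
            insertion = prev[j] + gap      # symbol consumed without a profile column
            deletion = cur[j - 1] + gap    # profile column j skipped
            cur[j] = max(0.0, match, insertion, deletion)  # local alignment floor at 0
            best = max(best, cur[j])
        scores.append(best)  # best score over all prefixes up to this step
        prev = cur
    return scores
```

With the hypothetical two-column profile `[{"A": 2.0, "B": -1.0}, {"A": -1.0, "B": 2.0}]`, the episode `"AB"` yields `[2.0, 4.0]` while `"BA"` yields `[2.0, 2.0]`. Only the episode that follows the demonstrated order gains score at its second step, so a difference-based redistribution would reward exactly that step.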

Code & Models

Repositories

ml-jku/align-rudder (official)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory