Learning to Generalize from Sparse and Underspecified Rewards

Rishabh Agarwal; Chen Liang; Dale Schuurmans; Mohammad Norouzi

arXiv:1902.07198 · cs.LG · June 23, 2020 · 46 cites

TL;DR

This paper introduces Meta Reward Learning (MeRL), a method that constructs auxiliary reward functions to improve learning from sparse, underspecified success-failure feedback, achieving state-of-the-art results in weakly-supervised semantic parsing.

Contribution

The paper proposes MeRL, a novel approach that optimizes auxiliary reward functions to enhance generalization and exploration in sparse reward settings, outperforming an alternative reward learning technique based on Bayesian Optimization.

Findings

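01. MeRL outperforms an alternative reward learning technique based on Bayesian Optimization.

02. Achieves state-of-the-art results in weakly-supervised semantic parsing on WikiTableQuestions and WikiSQL.

03. Improves on previous work by 1.2% on WikiTableQuestions and 2.4% on WikiSQL.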

Abstract

We consider the problem of learning from sparse and underspecified rewards, where an agent receives a complex input, such as a natural language instruction, and needs to generate a complex response, such as an action sequence, while only receiving binary success-failure feedback. Such success-failure rewards are often underspecified: they do not distinguish between purposeful and accidental success. Generalization from underspecified rewards hinges on discounting spurious trajectories that attain accidental success, while learning from sparse feedback requires effective exploration. We address exploration by using a mode covering direction of KL divergence to collect a diverse set of successful trajectories, followed by a mode seeking KL divergence to train a robust policy. We propose Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined…
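
The two KL directions mentioned in the abstract can be illustrated with a small numeric sketch. It is not taken from the paper or its repository: the four-trajectory distributions and the names `kl`, `target`, `spread`, and `peaked` are hypothetical, chosen only to show the standard behavior that KL(target || policy) tends to favor a policy covering every mode, while KL(policy || target) tends to favor a policy that commits to one mode.

```python
import math


def kl(p, q):
    """Return KL(p || q) for two discrete distributions of equal length."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


# Hypothetical target: two equally rewarded trajectories (indices 0 and 3)
# and almost no mass on the other two.
target = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical candidate policies over the same four trajectories.
spread = [0.25, 0.25, 0.25, 0.25]  # spreads mass over every trajectory
peaked = [0.97, 0.01, 0.01, 0.01]  # commits to a single mode

# Mode-covering direction: KL(target || policy).
forward = {"spread": kl(target, spread), "peaked": kl(target, peaked)}

# Mode-seeking direction: KL(policy || target).
reverse = {"spread": kl(spread, target), "peaked": kl(peaked, target)}

print("KL(target || policy):", forward)
print("KL(policy || target):", reverse)
```

In this toy setting the mode-covering direction scores `spread` lower (better) than `peaked`, and the mode-seeking direction reverses the ranking. That matches the roles the abstract assigns to them: the first to collect a diverse set of successful trajectories, the second to train a policy that concentrates on a subset of them.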

Code & Models

Repositories

google-research/google-research/tree/master/meta_reward_learning (TensorFlow, official)

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning

Methods: Model-Agnostic Meta-Learning · Meta Reward Learning