Learning Rewards from Linguistic Feedback

Theodore R. Sumers; Mark K. Ho; Robert D. Hawkins; Karthik Narasimhan; Thomas L. Griffiths

arXiv:2009.14715 · cs.AI · July 6, 2021

TL;DR

This paper introduces a framework for learning from unconstrained natural language feedback, using aspect-based sentiment analysis to infer the teacher's reward function. Artificial learners built on this framework learn successfully from human teachers, and the sentiment-based models outperform an end-to-end neural inference network.

Contribution

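It presents a novel method that decomposes natural language feedback into sentiment about the features of a Markov decision process, then regresses that sentiment on the features to infer a reward function. This moves beyond approaches that assume command-style input.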

Findings

01 Sentiment-based models outperform neural inference networks.

02 Pragmatic sentiment model approaches human-level performance.

03 All models successfully learn from human linguistic feedback.

Abstract

We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, using aspect-based sentiment analysis to decompose feedback into sentiment about the features of a Markov decision process. We then perform an analogue of inverse reinforcement learning, regressing the sentiment on the features to infer the teacher's latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards. We…
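
The regression step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature names, the sentiment scores, and the `infer_reward` helper are all hypothetical, and the sketch assumes that a sentiment score and the feature counts it refers to have already been extracted from each utterance. It fits a linear reward by ridge regression, using plain gradient descent so that no external libraries are needed.

```python
# Illustrative sketch only: regress feedback sentiment on MDP feature counts
# to estimate a linear reward function. All names and numbers are assumptions.

FEATURES = ["red", "blue", "green"]  # hypothetical MDP features


def infer_reward(feature_counts, sentiments, ridge=0.1, lr=0.05, steps=2000):
    """Ridge regression by gradient descent: sentiment ~ w . features."""
    n = len(feature_counts[0])
    m = len(sentiments)
    w = [0.0] * n
    for _ in range(steps):
        # Start from the gradient of the L2 penalty term.
        grad = [ridge * wi for wi in w]
        for phi, s in zip(feature_counts, sentiments):
            # Prediction error for this piece of feedback.
            err = sum(wi * fi for wi, fi in zip(w, phi)) - s
            for j in range(n):
                grad[j] += err * phi[j] / m
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Each row holds the feature counts one piece of feedback refers to
# (for example, objects the learner collected before the teacher spoke).
episodes = [
    [2, 0, 1],
    [0, 3, 0],
    [1, 1, 0],
    [0, 0, 2],
]
# Assumed sentiment scores in [-1, 1], one per piece of feedback.
sentiments = [0.8, -0.9, 0.1, 0.3]

weights = infer_reward(episodes, sentiments)
reward = dict(zip(FEATURES, weights))
print(reward)
```

In this toy data the teacher speaks positively when red and green features are present and negatively when blue ones are, so the fitted weights come out positive for red and green and negative for blue. The ridge penalty stands in for a prior over rewards; the choice of penalty and learning rate here is arbitrary.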

Code & Models

Repositories

tsumers/rewards (official)

Videos

Learning Rewards from Linguistic Feedback