Linguistic communication as (inverse) reward design

Theodore R. Sumers; Robert D. Hawkins; Mark K. Ho; Thomas L. Griffiths; Dylan Hadfield-Menell

arXiv:2204.05091·cs.AI·April 12, 2022·1 cites

Open Access

TL;DR

This paper models natural language communication as reward design: speakers choose utterances to maximize the reward earned by the listener's future behavior. Autonomous agents can then interpret both instructions and descriptions by inferring the speaker's underlying reward function and reasoning horizon, which improves alignment and robustness.

Contribution

It introduces a reward design framework for linguistic communication, incorporating reasoning about unknown future states and inferring speaker intent and horizon from language.

Findings

01 Short-horizon speakers use instructions.
02 Long-horizon speakers describe reward functions.
03 Inverse reward design improves alignment robustness.

Abstract

Natural language is an intuitive and expressive way to communicate reward information to autonomous agents. It encompasses everything from concrete instructions to abstract descriptions of the world. Despite this, natural language is often challenging to learn from: it is difficult for machine learning methods to make appropriate inferences from such a wide range of input. This paper proposes a generalization of reward design as a unifying principle to ground linguistic communication: speakers choose utterances to maximize expected rewards from the listener's future behaviors. We first extend reward design to incorporate reasoning about unknown future states in a linear bandit setting. We then define a speaker model which chooses utterances according to this objective. Simulations show that short-horizon speakers (reasoning primarily about a single, known state) tend to use…
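The speaker objective described in the abstract can be illustrated with a toy linear bandit. This is a minimal sketch under stated assumptions, not the paper's implementation: the action features, the reward weights, the literal listener, the two utterance types, and the `present + (horizon - 1) * future` weighting are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical toy bandit: each action is a binary feature vector and the
# true reward is linear in those features. All values here are assumptions.
ACTIONS = {"a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1), "d": (1, 1, 0)}
TRUE_WEIGHTS = (2.0, -1.0, 0.5)


def reward(action, weights=TRUE_WEIGHTS):
    """Linear reward of an action under the given feature weights."""
    return sum(f * w for f, w in zip(ACTIONS[action], weights))


def listener_value(utterance, state):
    """Expected true reward of a literal listener's response in `state`.

    `state` is the tuple of actions available. An instruction names one
    action; a description reveals the true weight of a single feature.
    """
    kind, payload = utterance
    if kind == "instruct":
        if payload in state:
            return reward(payload)
        # The instructed action is unavailable: assume a uniform choice.
        return sum(reward(a) for a in state) / len(state)
    # Description: the listener knows only the described weight, treats
    # the others as zero, and breaks ties among best actions uniformly.
    believed = tuple(w if i == payload else 0.0
                     for i, w in enumerate(TRUE_WEIGHTS))
    best = max(reward(a, believed) for a in state)
    ties = [a for a in state if reward(a, believed) == best]
    return sum(reward(a) for a in ties) / len(ties)


def speaker_choice(current_state, future_states, horizon):
    """Pick the utterance maximizing reward now plus expected future reward."""
    utterances = [("instruct", a) for a in current_state]
    utterances += [("describe", i) for i in range(len(TRUE_WEIGHTS))]

    def objective(u):
        present = listener_value(u, current_state)
        future = sum(listener_value(u, s) for s in future_states)
        future /= len(future_states)
        return present + (horizon - 1) * future

    return max(utterances, key=objective)


CURRENT = ("a", "c", "d")
# Assumed distribution over unknown future states: every pair of actions.
FUTURES = list(combinations(sorted(ACTIONS), 2))

print(speaker_choice(CURRENT, FUTURES, horizon=1))
print(speaker_choice(CURRENT, FUTURES, horizon=10))
```

In this toy setup the horizon-1 speaker instructs the best available action, while the horizon-10 speaker switches to describing the weight of the first feature, because a description still guides the listener in future states where the instructed action is unavailable.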

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques