How to talk so AI will learn: Instructions, descriptions, and autonomy
Theodore R Sumers, Robert D Hawkins, Mark K Ho, Thomas L Griffiths, Dylan Hadfield-Menell

TL;DR
This paper formalizes how humans communicate preferences to AI through instructions and descriptions, showing how agent autonomy influences optimal communication strategies and introducing pragmatic models that improve reward inference and learning.
Contribution
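It introduces a formal framework for learning from language in a contextual bandit setting, distinguishes instructions (information about the desired policy) from descriptions (information about the reward function), and shows that the agent's degree of autonomy determines which form is more effective.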
Findings
Instructions are optimal in low-autonomy settings.
Descriptions are better when agents act independently.
Pragmatic listener models improve reward inference from human language.
Abstract
From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such language use. To address this challenge, we formalize learning from language in a contextual bandit setting and ask how a human might communicate preferences over behaviors. We study two distinct types of language: instructions, which provide information about the desired policy, and descriptions, which provide information about the reward function. We show that the agent's degree of autonomy determines which form of language is optimal: instructions are better in low-autonomy settings, but descriptions are better when the agent will need to act independently. We then define a pragmatic listener agent…
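The contrast between instructions and descriptions can be illustrated with a small toy example. The sketch below is not the paper's formal model: the feature names, reward weights, contexts, and helper functions are all hypothetical, and the listener is a simple literal one rather than the pragmatic listener mentioned in the abstract. It assumes that an action's reward is the sum of its feature weights, that an instruction names one action, and that a description reveals the weight of one feature.

```python
# Hypothetical toy world: an action is a tuple of features, and its true
# reward is the sum of the (assumed) weights of those features.
TRUE_WEIGHTS = {"red": 2.0, "blue": -1.0, "square": 0.5, "circle": 0.0}


def reward(action, weights):
    """Sum the weights of an action's features; unknown features count as 0."""
    return sum(weights.get(feature, 0.0) for feature in action)


def value_after_instruction(context, instructed_action):
    """Expected true reward for a literal listener who was told which action to take."""
    if instructed_action in context:
        return reward(instructed_action, TRUE_WEIGHTS)
    # The instructed action is unavailable, so the instruction carries no
    # usable information here: fall back to a uniform random choice.
    return sum(reward(a, TRUE_WEIGHTS) for a in context) / len(context)


def value_after_description(context, described_feature):
    """Expected true reward for a literal listener who was told one feature's weight."""
    believed = {described_feature: TRUE_WEIGHTS[described_feature]}
    best = max(reward(a, believed) for a in context)
    # Act greedily under the believed weights, breaking ties uniformly.
    greedy = [a for a in context if reward(a, believed) == best]
    return sum(reward(a, TRUE_WEIGHTS) for a in greedy) / len(greedy)


# Context the speaker can see (low autonomy: the agent acts only here).
seen_context = (("red", "square"), ("red", "circle"), ("blue", "circle"))
# Context the agent meets later on its own (high autonomy).
new_context = (("red", "circle"), ("blue", "square"), ("blue", "circle"))

instruction = ("red", "square")  # the best action in seen_context
description = "red"              # "red is worth 2.0"

for name, context in [("seen", seen_context), ("new", new_context)]:
    print(
        name,
        "instruction:", round(value_after_instruction(context, instruction), 3),
        "description:", round(value_after_description(context, description), 3),
    )
```

In this toy setting the instruction does better in the context the speaker can see (2.5 versus 2.25, because the description leaves a tie between the two red actions), while the description does better in the new context (2.0 versus roughly 0.17), where the instructed action is no longer available. This mirrors the qualitative claim in the abstract under the stated assumptions only.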
Taxonomy
Topics: Decision-Making and Behavioral Economics · Misinformation and Its Impacts · Topic Modeling
